Sunojlab / Transfer_Learning_in_Catalysis

Issue with Tokenizer #1

Open arupmondal835 opened 3 months ago

arupmondal835 commented 3 months ago

Hi, I am trying to reproduce this transfer learning code, but I am getting an error that I think is caused by the fastai version. When I run tok = Tokenizer(partial(MolTokenizer, special_tokens=special_tokens), n_cpus=6, pre_rules=[], post_rules=[]), I get the following error:

AttributeError                            Traceback (most recent call last)
File ~/.conda-envs/transfer/lib/python3.12/site-packages/IPython/core/, in, obj)
    704 stream = StringIO()
    705 printer = pretty.RepresentationPrinter(stream, self.verbose,
    706     self.max_width, self.newline,
    707     max_seq_length=self.max_seq_length,
    708     singleton_pprinters=self.singleton_printers,
    709     type_pprinters=self.type_printers,
    710     deferred_pprinters=self.deferred_printers)
--> 711 printer.pretty(obj)
    712 printer.flush()
    713 return stream.getvalue()

File ~/.conda-envs/transfer/lib/python3.12/site-packages/IPython/lib/, in RepresentationPrinter.pretty(self, obj)
    408     return meth(obj, self, cycle)
    409 if cls is not object \
    410         and callable(cls.__dict__.get('__repr__')):
--> 411     return _repr_pprint(obj, self, cycle)
    413 return _default_pprint(obj, self, cycle)
    414 finally:

File ~/.conda-envs/transfer/lib/python3.12/site-packages/IPython/lib/, in _repr_pprint(obj, p, cycle)
    777 """A pprint that just redirects to the normal repr function."""
    778 # Find newlines and replace them with p.break()
--> 779 output = repr(obj)
    780 lines = output.splitlines()
    781 with

File ~/.conda-envs/transfer/lib/python3.12/site-packages/fastai/text/, in Tokenizer.__repr__(self)
     97 def __repr__(self) -> str:
---> 98     res = f'Tokenizer {} in {self.lang} with the following rules:\n'
     99     for rule in self.pre_rules: res += f' - {}\n'
    100     for rule in self.post_rules: res += f' - {}\n'

AttributeError: 'functools.partial' object has no attribute '__name__'

What version of fastai did you specifically use? I am using 1.0.61.

Thanks, Arup
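
For context on the AttributeError above: a functools.partial object does not carry a __name__ attribute, and the repr in the traceback appears to look that attribute up on the tokenizer function. The sketch below reproduces this with a stand-in class and shows one possible workaround, setting __name__ on the partial by hand. The MolTokenizer defined here is a hypothetical placeholder, not the repository's class, and the special_tokens values are made up; whether this is the right fix for this notebook is an assumption.

```python
from functools import partial


class MolTokenizer:
    """Hypothetical stand-in for the repository's MolTokenizer."""

    def __init__(self, lang="en", special_tokens=None):
        self.lang = lang
        self.special_tokens = special_tokens or []


# Placeholder values; the real list comes from the notebook.
special_tokens = ["[BOS]", "[PAD]"]

tok_func = partial(MolTokenizer, special_tokens=special_tokens)

# A partial object has no __name__ of its own, which is what the
# AttributeError in the traceback reports.
had_name_before = hasattr(tok_func, "__name__")

# Workaround (assumption): copy the wrapped class's name onto the partial.
# partial objects accept attribute assignment.
tok_func.__name__ = MolTokenizer.__name__

print(had_name_before, tok_func.__name__)
```

With a name attached, code that reads __name__ from the partial should no longer fail, and the installed fastai source does not need to be edited.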

arupmondal835 commented 3 months ago

I have bypassed this by changing {} to {}. Now my tok is "Tokenizer MolTokenizer in en with the following rules:". I am not sure whether this is correct or what it should be.

However, when I use this tok in the next stage, as data = TextLMDataBunch.from_df(path, train_aug, valid_aug, bs=bs, tokenizer=tok, chunksize=50, text_cols=0, max_vocab=60000, include_bos=False), I get the following error: PicklingError: Can't pickle <class 'fastai.text.transform.Tokenizer'>: it's not the same object as fastai.text.transform.Tokenizer
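
A note on the PicklingError: pickle raises a "not the same object" error when the class of the object being pickled is no longer the object found under that name in its module. One common way to get there, and only a guess for this issue, is that the module was edited and re-imported after tok was created, so tok still points at the old Tokenizer class. The sketch below simulates that with a throwaway module; demo_transform and define_tokenizer are hypothetical names used purely for illustration and have nothing to do with fastai's internals.

```python
import pickle
import sys
import types

# Throwaway module standing in for an importable library module.
mod = types.ModuleType("demo_transform")
sys.modules["demo_transform"] = mod


def define_tokenizer():
    """Create a fresh class object that claims to live in demo_transform."""
    class Tokenizer:
        pass
    Tokenizer.__module__ = "demo_transform"
    Tokenizer.__qualname__ = "Tokenizer"
    return Tokenizer


mod.Tokenizer = define_tokenizer()
tok = mod.Tokenizer()

# Pickling works while tok's class is the one registered in the module.
first_pickle = pickle.dumps(tok)

# Simulate a reload: the module now holds a new class with the same name,
# while tok still refers to the old one.
mod.Tokenizer = define_tokenizer()

caught = None
try:
    pickle.dumps(tok)
except pickle.PicklingError as err:
    caught = err

print(type(caught).__name__)
```

If this is what happened here, restarting the kernel after editing the fastai source and re-creating tok before building the data bunch would be the first thing to try.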