datamol-io / molfeat

molfeat - the hub for all your molecular featurizers
https://molfeat.datamol.io
Apache License 2.0

Can't retrieve model ChemGPT-1.2B from the store! #109

Open hisplan opened 1 month ago

hisplan commented 1 month ago

Is there an existing issue for this?

Bug description

I've been trying to use ChemGPT-1.2B, but I'm getting this error: `Can't retrieve model ChemGPT-1.2B from the store !`.

Just FYI, I have successfully used several other models; the issue appears to be specific to ChemGPT-1.2B.

How to reproduce the bug

from molfeat.trans.pretrained import PretrainedHFTransformer

transformer = PretrainedHFTransformer(kind='ChemGPT-1.2B', notation='selfies', dtype=float)
features = transformer(smiles)  # `smiles` is a list of SMILES strings
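
As a sanity check, you can query the model store directly for the model card before featurizing. A minimal sketch; it uses `ModelStore.search`, the same call the loader makes internally (visible in the traceback below):

```python
from molfeat.store.modelstore import ModelStore

store = ModelStore()
# Same lookup the loader performs in loader.py (see traceback below)
cards = store.search(name="ChemGPT-1.2B")
print(cards[0] if cards else "No model card found for ChemGPT-1.2B")
```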

Error messages and logs

  0%|          | 0.00/736 [00:00<?, ?B/s]
  0%|          | 0/7 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ModelStoreError                           Traceback (most recent call last)
File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/store/loader.py:100, in PretrainedStoreModel._load_or_raise(cls, name, download_path, store, **kwargs)
     99     modelcard = store.search(name=name)[0]
--> 100     artifact_dir = store.download(modelcard, download_path, **kwargs)
    101 except Exception:

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/store/modelstore.py:239, in ModelStore.download(self, modelcard, output_dir, chunk_size, force)
    238     mapper.fs.delete(output_dir, recursive=True)
--> 239     raise ModelStoreError(
    240         f"""The destination artifact at {model_dest_path} has a different sha256sum ({cache_sha256sum}) """
    241         f"""than the Remote artifact sha256sum ({modelcard.sha256sum}). The destination artifact has been removed !"""
    242     )
    244 return output_dir

ModelStoreError: The destination artifact at /Users/chunj/Library/Caches/molfeat/ChemGPT-1.2B/model.save has a different sha256sum (4d8819f7c8c91ba94ba446d32f29342360d62971a9fa37c8cab2e31f9c3fc4c5) than the Remote artifact sha256sum (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855). The destination artifact has been removed !

During handling of the above exception, another exception occurred:

ModelStoreError                           Traceback (most recent call last)
Cell In[6], line 1
----> 1 features = transformer(smiles)

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/trans/base.py:384, in MoleculeTransformer.__call__(self, mols, enforce_dtype, ignore_errors, **kwargs)
    359 def __call__(
    360     self,
    361     mols: List[Union[dm.Mol, str]],
   (...)
    364     **kwargs,
    365 ):
    366     r"""
    367     Calculate features for molecules. Using __call__, instead of transform.
    368     If ignore_error is True, a list of features and valid ids are returned.
   (...)
    382 
    383     """
--> 384     features = self.transform(mols, ignore_errors=ignore_errors, enforce_dtype=False, **kwargs)
    385     ids = np.arange(len(features))
    386     if ignore_errors:

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/sklearn/utils/_set_output.py:316, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    314 @wraps(f)
    315 def wrapped(self, X, *args, **kwargs):
--> 316     data_to_wrap = f(self, X, *args, **kwargs)
    317     if isinstance(data_to_wrap, tuple):
    318         # only wrap the first output for cross decomposition
    319         return_tuple = (
    320             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    321             *data_to_wrap[1:],
    322         )

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/trans/pretrained/base.py:207, in PretrainedMolTransformer.transform(self, smiles, **kwargs)
    204 mols = [mols[i] for i in ind_to_compute]
    206 if len(mols) > 0:
--> 207     converted_mols = self._convert(mols, **kwargs)
    208     out = self._embed(converted_mols, **kwargs)
    210     if not isinstance(out, list):

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/trans/pretrained/hf_transformers.py:367, in PretrainedHFTransformer._convert(self, inputs, **kwargs)
    358 def _convert(self, inputs: list, **kwargs):
    359     """Convert the list of molecules to the right format for embedding
    360 
    361     Args:
   (...)
    365         processed: pre-processed input list
    366     """
--> 367     self._preload()
    369     if isinstance(inputs, (str, dm.Mol)):
    370         inputs = [inputs]

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/trans/pretrained/hf_transformers.py:326, in PretrainedHFTransformer._preload(self)
    324 def _preload(self):
    325     """Perform preloading of the model from the store"""
--> 326     super()._preload()
    327     self.featurizer.model.to(self.device)
    328     self.featurizer.max_length = self.max_length

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/trans/pretrained/base.py:90, in PretrainedMolTransformer._preload(self)
     88 """Preload the pretrained model for later queries"""
     89 if self.featurizer is not None and isinstance(self.featurizer, PretrainedModel):
---> 90     self.featurizer = self.featurizer.load()
     91     self.preload = True

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/trans/pretrained/hf_transformers.py:209, in HFModel.load(self)
    207 if self._model is not None:
    208     return self._model
--> 209 download_output_dir = self._artifact_load(
    210     name=self.name, download_path=self.cache_path, store=self.store
    211 )
    212 model_path = dm.fs.join(download_output_dir, self.store.MODEL_PATH_NAME)
    213 self._model = HFExperiment.load(model_path)

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/store/loader.py:81, in PretrainedStoreModel._artifact_load(cls, name, download_path, **kwargs)
     79 if not dm.fs.exists(download_path):
     80     cls._load_or_raise.cache_clear()
---> 81 return cls._load_or_raise(name, download_path, **kwargs)

File ~/miniconda3/envs/datamol/lib/python3.11/site-packages/molfeat/store/loader.py:103, in PretrainedStoreModel._load_or_raise(cls, name, download_path, store, **kwargs)
    101 except Exception:
    102     mess = f"Can't retrieve model {name} from the store !"
--> 103     raise ModelStoreError(mess)
    104 return artifact_dir

ModelStoreError: Can't retrieve model ChemGPT-1.2B from the store !

Environment

Current environment

```
molfeat       0.10.1
pytorch       2.4.0
rdkit         2024.03.5
scikit-learn  1.5.2
macOS Ventura 13.6.7
molfeat installed via conda
```

Additional context

I'm using my local laptop + Jupyter Lab.

maclandrol commented 1 month ago

Hi @hisplan, sorry for the late response.

It's likely that you lost internet connectivity while downloading a previous version. Clearing the molfeat cache (/Users/chunj/Library/Caches/molfeat/ in your case) should help. Try removing the ChemGPT folders.
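
For reference, a minimal sketch of that cleanup step, assuming the default macOS cache location shown in the traceback (adjust the path on other platforms):

```python
import shutil
from pathlib import Path

# Default molfeat cache on macOS, as reported in the traceback above
cache_dir = Path.home() / "Library" / "Caches" / "molfeat" / "ChemGPT-1.2B"
if cache_dir.exists():
    shutil.rmtree(cache_dir)  # forces a fresh download on the next load
```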

Otherwise, can you check whether the instructions in https://github.com/datamol-io/molfeat/issues/29 or https://github.com/datamol-io/molfeat/issues/84 solve your issue?
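
If clearing the cache is not enough, the traceback shows that `ModelStore.download` takes a `force` flag, so a manual re-download can also be attempted. A hedged sketch; the output directory here is an illustrative choice, not a molfeat default:

```python
from molfeat.store.modelstore import ModelStore

store = ModelStore()
card = store.search(name="ChemGPT-1.2B")[0]
# `force=True` re-fetches the artifact even if a cached copy exists
# (parameter visible in ModelStore.download in the traceback above)
store.download(card, "/tmp/molfeat/ChemGPT-1.2B", force=True)
```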

hisplan commented 1 month ago

I cleared the molfeat cache and tried again as follows, but it still didn't work. The error message was pretty much the same as before.

transformer = PretrainedHFTransformer(kind='ChemGPT-1.2B', notation='selfies', dtype=float)
features = transformer(smiles)

Looking at the error message more carefully, it looks like the SHA256 checksum didn't match. Here's the part of the error message I noticed:

ModelStoreError: The destination artifact at /Users/chunj/Library/Caches/molfeat/ChemGPT-1.2B/model.save
has a different sha256sum (4d8819f7c8c91ba94ba446d32f29342360d62971a9fa37c8cab2e31f9c3fc4c5)
than the Remote artifact sha256sum (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855).
The destination artifact has been removed !
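
Notably, the expected remote checksum (e3b0c442...855) is the well-known SHA-256 digest of empty input, which would mean the store's model card records the checksum of a zero-byte artifact rather than of the actual weights. A quick standard-library check confirms the digest:

```python
import hashlib

# SHA-256 of zero bytes -- matches the "Remote artifact sha256sum" above
print(hashlib.sha256(b"").hexdigest())
# -> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```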

Any idea?