facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

download the model weights to local #174

Closed lzhangUT closed 2 years ago

lzhangUT commented 2 years ago

Hi @rmrao, I am interested in using the ESM-1v models (1-5). It takes a long time to download the model weights every time I run one sequence. Can I download the model weights (or model) into my local workspace, as in Azure Databricks, so the model loads quickly when I need to run a lot of sequences? If so, what would be the code to do that? Thank you.

rmrao commented 2 years ago

By default, the checkpoints are downloaded to and cached in the directory defined by f"{torch.hub.get_dir()}/checkpoints". If you simply modify the torch hub cache directory (see the documentation at https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved) before the model is downloaded, it should download it to the new cache directory, and future runs should check this cache directory.
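A minimal sketch of redirecting the cache, assuming a Databricks-style persistent mount; the path /dbfs/torch_hub is a placeholder, not something from this thread:

```python
import os

# Point the torch hub cache at persistent storage *before* the model is
# downloaded. "/dbfs/torch_hub" is a placeholder -- use any directory that
# survives cluster restarts (e.g. a DBFS mount on Databricks).
os.environ["TORCH_HOME"] = "/dbfs/torch_hub"
# With TORCH_HOME set, the hub dir is f"{TORCH_HOME}/hub" and checkpoints
# land in f"{TORCH_HOME}/hub/checkpoints".

# Equivalent, after torch is imported (sets the hub dir directly, so
# checkpoints land in "/dbfs/torch_hub/checkpoints"):
#   import torch
#   torch.hub.set_dir("/dbfs/torch_hub")
```

Either mechanism must run before the first model load in the session; an already-populated cache in the old location is not moved.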

If it's downloading the model every time, then probably the default cache points to some ephemeral directory only associated with a particular Azure instance. If you change this to point to a persistent storage directory, that should solve the issue.

If for any reason that doesn't work, the URLs for all the models are of the form f"https://dl.fbaipublicfiles.com/fair-esm/models/{model_name}.pt". You can manually download the model and load it with esm.pretrained.load_model_and_alphabet_local(<path/to/file>).
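A sketch of the manual route described above; the model name used in the comments is one of the five ESM-1v checkpoints and the target path is a made-up example:

```python
def checkpoint_url(model_name: str) -> str:
    """Build the public download URL for a named ESM checkpoint,
    following the f-string pattern given in this thread."""
    return f"https://dl.fbaipublicfiles.com/fair-esm/models/{model_name}.pt"

# Hypothetical one-off download of the first ESM-1v model, then local loading
# (commented out so nothing is actually fetched here):
#   from urllib.request import urlretrieve
#   path, _ = urlretrieve(checkpoint_url("esm1v_t33_650M_UR90S_1"),
#                         "/dbfs/models/esm1v_t33_650M_UR90S_1.pt")
#   import esm
#   model, alphabet = esm.pretrained.load_model_and_alphabet_local(path)
```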

lzhangUT commented 2 years ago

Hi Roshan, thanks for the suggestion. I changed the cache directory before downloading, and it seems a little bit quicker. When loading the model, though, I always get this warning:

/databricks/python/lib/python3.8/site-packages/esm/pretrained.py:134: UserWarning: Regression weights not found, predicting contacts will not produce correct results.

Should I be concerned about this warning? Thanks.


rmrao commented 2 years ago

You can ignore the warning. (No regression weights are provided for ESM-1v because it's not designed for contact prediction. We should probably just throw an error if you try to do contact prediction with ESM-1v models.)

As for the speed - after the first time it downloads the model, it should never download it again. If you're still seeing slowdowns you could change the loading code to the local loading version (esm.pretrained.load_model_and_alphabet_local(<path/to/file>)) to ensure that it's not downloading anything.

It's possible when using a cloud setup that the transfer of the model weights from storage actually takes quite a while (the weights for each individual model are ~10GB). I'm not familiar enough with Azure to suggest solutions if this is the issue.

lzhangUT commented 2 years ago

Thank you, @rmrao, everything works now. I do have another question: when using the ESM-1v models (1-5) for validation/test predictions, the paper mentions that the average of the five model predictions and/or the ensemble of the five models was compared with the experimental data. The average of the five models is easy to understand, but what is the ensemble value? Is there a separate model accounting for the ensemble, or a different calculation over the five models? Thanks.

lzhangUT commented 2 years ago

Another question is about the bootstrap. The paper says: 'To compute bootstraps for the pointplots (figure 3), we randomly resample each deep mutational scan (with replacement) and compute the Spearman ρ between the experimental data and model predictions.'

Figure 3 also says 'Points are mean ± std of 20 bootstrapped samples.' I am a little confused here: is 20 the sample size drawn from each dataset (that would be too small, wouldn't it?), or the number of bootstrap iterations? If the latter, what is the sample size for each bootstrap (the number of observations in each dataset itself)? Thanks.
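Reading the quoted procedure as 20 bootstrap iterations, each resampling at the full dataset size, it can be sketched as follows. This is my own numpy-only sketch (Spearman ρ via Pearson correlation of ranks, without tie correction), not code from the paper, and the data shape is assumed:

```python
import numpy as np

def spearman(x, y):
    """Spearman rho as Pearson correlation of the ranks.

    Ties are broken arbitrarily (no average-rank correction), which is a
    simplification versus scipy.stats.spearmanr."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def bootstrap_spearman(experimental, predicted, n_boot=20, seed=0):
    """Resample the scan with replacement at its own size, n_boot times,
    and return mean and std of the per-resample Spearman rho."""
    rng = np.random.default_rng(seed)
    n = len(experimental)
    rhos = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample size = dataset size
        rhos.append(spearman(experimental[idx], predicted[idx]))
    return np.mean(rhos), np.std(rhos)
```

Under this reading, "20 bootstrapped samples" is the number of resamples, and each resample keeps the original number of observations.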

rmrao commented 2 years ago

As far as I'm aware, the ensemble prediction is the average of the five models. If there's a part of the paper that implies something different, let me know and I can take a look.
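If the ensemble is just the mean of the five per-model scores, it reduces to an element-wise average; a sketch with made-up scores (two variants, five models):

```python
import numpy as np

# Made-up per-variant scores from the five ESM-1v models (rows = models,
# columns = variants); these numbers are illustrative only.
scores = np.array([
    [-1.2, 0.3],
    [-1.0, 0.5],
    [-1.4, 0.2],
    [-1.1, 0.4],
    [-1.3, 0.1],
])

ensemble = scores.mean(axis=0)  # average over the five models, per variant
```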

I actually didn't run the bootstrapping experiment so I'm not sure of the details right now. @robert-verkuil, do you happen to know? If not I'll try to figure it out but it may take a week or so.

tomsercu commented 2 years ago

Hi @lzhangUT, thanks for your interest in our models! The follow-up questions would better belong in the discussion forums, but let me quickly answer:

Hope that helps - If anything is unclear you can open a gh discussion and we can follow up there!