facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

re-PR for pretrained ESM weights issue. #371

Closed YaoYinYing closed 1 year ago

YaoYinYing commented 2 years ago

This is a re-PR of #329, following kind hints from @tomsercu.

  1. Scripts for downloading/updating all pretrained models and regression weights (if available) listed in README.md. The .pt files are placed in a subdirectory called checkpoints.
  2. Add a --model_pth option to the ESMFold inference script to read pretrained ESM weights. This may be useful when all users share one HPC cluster or workstation.

tomsercu commented 1 year ago

I tried to run the script, but it fails for me on zsh; I think it goes wrong when you try to call system() inside awk. Maybe the script is salvageable if you just print all the commands to stdout and then pipe them into bash. Here's a possible patch; I haven't tested it too thoroughly. You should be able to run git apply on the patch, test, and commit to update this PR: https://gist.github.com/tomsercu/46321dbca8ded930900cfbbf21483f56
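The print-and-pipe idea can be sketched as follows. This is only an illustration, not the contents of the linked patch: `emit_commands` is a hypothetical helper, the example.com URLs are placeholders, and `echo` stands in for the real download tool so the snippet runs without network access.

```shell
# Sketch: awk only prints shell commands; it never calls system().
# emit_commands is a hypothetical helper name, not part of the PR.
emit_commands() {
  # Read one URL per line and print one command per URL.
  # The file name is the last "/"-separated field of the URL.
  awk '{ n = split($0, parts, "/"); printf "echo would download %s\n", parts[n] }'
}

# Generate the commands, then let a separate bash process execute them.
printf '%s\n' \
  "https://example.com/models/model_a.pt" \
  "https://example.com/models/model_b.pt" |
  emit_commands | bash
```

Dropping the final `| bash` prints the generated commands instead of running them, which makes the script easy to inspect before executing anything.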

YaoYinYing commented 1 year ago

Well, it looks like it works on both my MacBook (awk version 20200816) and my Ubuntu workstation (GNU Awk 5.0.1), so I have no idea why it runs into an error. What does the error message look like? Does this happen with a different version of awk?

Your patch looks great if one runs a command like bash scripts/download_weights.sh /path/to/weights/esm | bash, which should generate a series of download commands and run all of them in another bash.

The problem is that I simply guess the URLs of the *-regression.pt files from those of the weights files listed in README.md, yet some of the inferred URLs do not exist at all, so aria2c reports a download failure. As I had planned, this failure can be ignored inside system() in the awk script. With the piped approach, however, set -e will stop the entire script instead.
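A minimal sketch of that guess, assuming the weights live under a /models/ path and the regression files under a sibling /regression/ path (the layout suggested by the URLs in this thread). `regression_url` is a hypothetical helper, not the function used in the PR script.

```shell
# Sketch: derive the guessed regression URL from a weights URL.
# Assumption: .../models/NAME.pt maps to .../regression/NAME-contact-regression.pt
regression_url() {
  # Strip the directory and the .pt suffix to get the bare model name.
  name="$(basename "$1" .pt)"
  # Drop everything from /models/ onward to get the common URL prefix.
  base="${1%/models/*}"
  printf '%s/regression/%s-contact-regression.pt\n' "$base" "$name"
}

regression_url "https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt"
# prints https://dl.fbaipublicfiles.com/fair-esm/regression/esmfold_3B_v1-contact-regression.pt
```

The guess is purely textual, so it says nothing about whether the resulting URL exists on the server.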

bash scripts/download_weights.sh ../../db/weights/esm/ | bash
/path/to/db/weights/esm/checkpoints /path/to/repo/esm
Download complete: esm2_t48_15B_UR50D.pt
Download complete: esm2_t48_15B_UR50D-contact-regression.pt
Download complete: esm2_t36_3B_UR50D.pt
Download complete: esm2_t36_3B_UR50D-contact-regression.pt
Download complete: esm2_t33_650M_UR50D.pt
Download complete: esm2_t33_650M_UR50D-contact-regression.pt
Download complete: esm2_t30_150M_UR50D.pt
Download complete: esm2_t30_150M_UR50D-contact-regression.pt
Download complete: esm2_t12_35M_UR50D.pt
Download complete: esm2_t12_35M_UR50D-contact-regression.pt
Download complete: esm2_t6_8M_UR50D.pt
Download complete: esm2_t6_8M_UR50D-contact-regression.pt
Download complete: esmfold_3B_v1.pt
Download not complete: esmfold_3B_v1-contact-regression.pt

11/22 17:09:32 [NOTICE] Downloading 1 item(s)
p11-kit: softhsm: module failed to initialize, skipping: Internal error
[#94d511 0B/0B CN:1 DL:0B]
11/22 17:09:35 [ERROR] CUID#7 - Download aborted. URI=https://dl.fbaipublicfiles.com/fair-esm/regression/esmfold_3B_v1-contact-regression.pt
Exception: [AbstractCommand.cc:351] errorCode=22 URI=https://dl.fbaipublicfiles.com/fair-esm/regression/esmfold_3B_v1-contact-regression.pt
  -> [HttpSkipResponseCommand.cc:240] errorCode=22 The response status is not successful. status=403

11/22 17:09:35 [NOTICE] Download GID#94d511c5c8bff1da not complete:

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
94d511|ERR |       0B/s|https://dl.fbaipublicfiles.com/fair-esm/regression/esmfold_3B_v1-contact-regression.pt

Status Legend:
(ERR):error occurred.

aria2 will resume download if the transfer is restarted.
If there are any errors, then see the log file. See '-l' option in help/man page for details.

For your patch, I suggest skipping this error as follows:

# download regression
print "if [[ ! -f $(basename "url_regression") || -f $(basename "url_regression").aria2 ]];then echo Download not complete: $(basename "url_regression");aria2c -x 10 "url_regression" || echo Nevermind; else echo Download complete: $(basename "url_regression");fi"
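
The reason the `|| echo ...` fallback helps is that, under set -e, a failing command on the left side of `||` does not abort the shell. A small sketch of that behaviour, where `fetch` is a hypothetical stand-in for the download command (it fails for any name containing "missing", returning 22 to mirror the error code in the log above):

```shell
# Sketch: a failure guarded by `||` does not trigger set -e.
# fetch is a hypothetical stand-in for the real download command.
fetch() {
  case "$1" in
    *missing*) return 22 ;;               # simulate a file that does not exist
    *) echo "Download complete: $1" ;;
  esac
}

fetch_optional() {
  # The fallback message runs only when fetch fails.
  fetch "$1" || echo "Skipped (not available): $1"
}

# The function body runs in a subshell so set -e stays local to it.
run_all() (
  set -e
  fetch_optional "weights.pt"
  fetch_optional "missing-contact-regression.pt"
  echo "Script reached the end"
)

run_all
```

Without the `||` fallback, the second call would end the subshell before the final message is printed.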
tomsercu commented 1 year ago

Yes, the regression weights for contact prediction are only there for some models, so you can skip the missing ones silently. Let me know once the script is updated on this PR and I'll merge!

YaoYinYing commented 1 year ago

@tomsercu Thanks! The script is now updated to skip download failures and has been tested on both zsh and bash.

tomsercu commented 1 year ago

Thanks for your contribution!