Rostlab / SeqVec

Modelling the Language of Life - Deep Learning Protein Sequences
http://embed.protein.properties
MIT License

model resources cannot be downloaded #27

Open Jmpax404 opened 1 month ago

Jmpax404 commented 1 month ago

It seems that the model files on rostlab.org have recently become inaccessible; none of the links below work:

- https://rostlab.org/~deepppi/seqvec.zip
- https://rostlab.org/~deepppi/seqvec_checkpoint.tar.gz
- http://rostlab.org/~deepppi/embedding_repo/embedding_models/seqvec/options.json
- http://rostlab.org/~deepppi/embedding_repo/embedding_models/seqvec/weights.hdf5

As a result, both automatic and manual downloads fail. Could you fix this or provide alternative download links? :)
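A quick way to confirm which of these links respond is to try opening each one. The snippet below is a minimal sketch using only the Python standard library; `is_reachable` and `MODEL_URLS` are hypothetical names chosen for illustration and are not part of SeqVec. The response body is never read, so checking the weights link does not download the file.

```python
import urllib.request

# The four model URLs listed in this issue
MODEL_URLS = [
    "https://rostlab.org/~deepppi/seqvec.zip",
    "https://rostlab.org/~deepppi/seqvec_checkpoint.tar.gz",
    "http://rostlab.org/~deepppi/embedding_repo/embedding_models/seqvec/options.json",
    "http://rostlab.org/~deepppi/embedding_repo/embedding_models/seqvec/weights.hdf5",
]


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL can be opened, False otherwise (hypothetical helper)."""
    try:
        # Opening the URL reads only the status line and headers, not the body
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False
```

Calling `is_reachable(url)` for each entry in `MODEL_URLS` and printing the result gives a quick status report; any DNS, connection, HTTP or timeout error counts as unreachable.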

mheinzinger commented 2 weeks ago

Sorry for the delayed response. I will look into this, but the root cause is an ongoing problem with our internal FTP server. I am trying to recover the files; in the meantime, could you try this approach for installing SeqVec? https://github.com/Rostlab/SeqVec/issues/26#issuecomment-2267991933

Jmpax404 commented 1 week ago

Thank you for looking into this issue and for maintaining this project. Installing the SeqVec Python package works fine for me; the problem is that when SeqVec computes embeddings, it downloads the pre-trained SeqVec model files from the rostlab.org server (see the code below).

Luckily, I got these files from a co-worker. Still, I suggest keeping backup copies on a free and stable cloud storage service, such as Google Drive.

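```python
import logging
import urllib.request
from pathlib import Path

from allennlp.commands.elmo import ElmoEmbedder

logger = logging.getLogger(__name__)

def get_elmo_model(model_dir: Path, cpu: bool) -> ElmoEmbedder:
    weights_path = model_dir / "weights.hdf5"
    options_path = model_dir / "options.json"

    # if no pre-trained model is available yet, download it
    if not (weights_path.exists() and options_path.exists()):
        logger.info(
            "No existing model found. Start downloading pre-trained SeqVec (~360MB)..."
        )
        model_dir.mkdir(parents=True, exist_ok=True)
        repo_link = "http://rostlab.org/~deepppi/embedding_repo/embedding_models/seqvec"
        options_link = repo_link + "/options.json"
        weights_link = repo_link + "/weights.hdf5"
        # these two requests fail while the rostlab.org server is unreachable
        urllib.request.urlretrieve(options_link, str(options_path))
        urllib.request.urlretrieve(weights_link, str(weights_path))

    # Assumed completion (the quoted excerpt stops above): -1 runs ELMo on CPU, 0 on the first GPU
    cuda_device = -1 if cpu else 0
    return ElmoEmbedder(
        options_file=str(options_path), weight_file=str(weights_path), cuda_device=cuda_device
    )
```
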
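Because `get_elmo_model` only downloads when `options.json` or `weights.hdf5` is missing from `model_dir`, copying files obtained elsewhere (for example from a co-worker, as described above) into that directory should bypass the broken download entirely. Below is a minimal sketch of that step; `install_model_files` is a hypothetical helper, and you need to pass it the same `model_dir` that your SeqVec setup hands to `get_elmo_model`.

```python
import shutil
from pathlib import Path
from typing import List

# File names that get_elmo_model() checks for before deciding to download
MODEL_FILES = ("options.json", "weights.hdf5")


def install_model_files(source_dir: Path, model_dir: Path) -> List[Path]:
    """Copy manually obtained SeqVec files into model_dir (hypothetical helper).

    Once both files exist in model_dir, the existence check in get_elmo_model()
    passes and no download from rostlab.org is attempted.
    """
    sources = [source_dir / name for name in MODEL_FILES]
    # Validate everything first so a missing file never leaves a half-installed model_dir
    missing = [str(p) for p in sources if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing model file(s): " + ", ".join(missing))

    model_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in sources:
        dst = model_dir / src.name
        shutil.copyfile(src, dst)
        installed.append(dst)
    return installed
```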
mheinzinger commented 1 week ago

Oh well, I am so sorry, I misread your issue. We keep having problems with our FTP server, and it is so bad that at the moment I cannot even access those weights myself... Lesson learned: always put such files on Zenodo or something similar. If you are willing to share the weights, I can upload them somewhere else; otherwise, I will do so once our server is back up.
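One way to act on that lesson on the client side is to let the download step try several hosts in order, so that a single server outage does not block installation. This is a minimal sketch assuming the files were mirrored to additional locations; `download_with_fallback` is a hypothetical helper, not part of SeqVec, and any mirror URL you pass to it is your own assumption.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import List, Sequence


def download_with_fallback(urls: Sequence[str], dest: Path, timeout: float = 60.0) -> str:
    """Save dest from the first URL in `urls` that works (hypothetical helper).

    Returns the URL that succeeded; raises RuntimeError if every URL fails.
    """
    part = dest.with_name(dest.name + ".part")  # partial download target
    errors: List[str] = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(part, "wb") as out:
                shutil.copyfileobj(resp, out)
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSError subclasses
            errors.append(f"{url}: {exc}")
            continue
        part.replace(dest)  # move into place only after a complete download
        return url
    if part.exists():
        part.unlink()  # do not leave a truncated file behind
    raise RuntimeError("all download sources failed:\n" + "\n".join(errors))
```

For instance, `download_with_fallback([weights_link, mirror_link], weights_path)` would try the rostlab.org link first and a (hypothetical) `mirror_link` second. Writing to a `.part` file and renaming it afterwards means an interrupted download never leaves a truncated `weights.hdf5` that would pass the existence check in `get_elmo_model`.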

Jmpax404 commented 1 week ago

I uploaded the weights to a temporary file-sharing platform; they will be removed after 7 days. The URL is https://filebin.net/l553d2tiek3r8wzq

You can verify the files against these MD5 checksums:

| File name | MD5 |
| --- | --- |
| SeqVec.zip | f9664ab720a7d7cd5ea48a7d8b0574e2 |
| options.json | 05637bed5b38e68ee17e107648a5f597 |
| weights.hdf5 | 5f9d3f5fcac5e6bfadc88aebf147ac02 |
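To check downloaded files against the checksums above, Python's `hashlib` is enough. In this sketch `md5sum`, `verify` and `EXPECTED_MD5` are hypothetical names; each file is hashed in 1 MiB chunks so the large weights file is never loaded into memory at once. Files that are not present in the directory are left out of the result.

```python
import hashlib
from pathlib import Path
from typing import Dict

# MD5 checksums posted in this issue
EXPECTED_MD5 = {
    "SeqVec.zip": "f9664ab720a7d7cd5ea48a7d8b0574e2",
    "options.json": "05637bed5b38e68ee17e107648a5f597",
    "weights.hdf5": "5f9d3f5fcac5e6bfadc88aebf147ac02",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory: Path, expected: Dict[str, str] = EXPECTED_MD5) -> Dict[str, bool]:
    """Map each expected file found in `directory` to True if its MD5 matches."""
    results = {}
    for name, digest in expected.items():
        path = directory / name
        if path.is_file():
            results[name] = md5sum(path) == digest
    return results
```

Any entry that maps to `False` indicates a corrupted or different file and should be downloaded again before use.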