Closed by CryoSky 1 year ago
Hi, thank you for your interest in our work. In our experience, downloading the downstream datasets should be fast (less than 1 hour) if you are downloading to a remote cluster. If you are downloading to a personal laptop, it may be worth checking your Wi-Fi speed. Could you also specify which downstream dataset you are trying to download, so that we can replicate the process and help you better?
There is also a more direct way to control the downloading process: use the wget command in a shell. Manually download the dataset from its URL into the directory given by dataset.path, then try running the script again. The URL of each dataset can be found either in our repo (./ProtST-dev/protst/dataset.py) or in torchdrug's repo (https://github.com/DeepGraphLearning/torchdrug/tree/master/torchdrug/datasets). The dataset.path is specified in the downstream configs (e.g., "./ProtST-dev/config/downstream_task/PretrainESM/annotation_tune.yaml").
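For anyone who prefers to script the manual download rather than call wget by hand, here is a minimal sketch. It uses Python's standard urllib in place of wget. The helper name download_if_missing, the example URL, and the example directory are hypothetical placeholders, not values from the ProtST or torchdrug repos. Take the real URL from the dataset definitions mentioned above and the real directory from dataset.path in your config. The sketch also assumes the download step is skipped on the next run when the file is already present, which is what the advice above relies on.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_if_missing(url, directory):
    """Fetch `url` into `directory` unless a file with that name is already there.

    Returns the local path of the file in either case.
    """
    directory = os.path.expanduser(directory)
    os.makedirs(directory, exist_ok=True)

    # Keep the file name from the URL so the saved file matches the remote name.
    file_name = os.path.basename(urlparse(url).path)
    target = os.path.join(directory, file_name)

    if os.path.exists(target):
        # Already downloaded: leave the existing file untouched.
        return target

    urllib.request.urlretrieve(url, target)
    return target


# Example call with placeholder values (replace both before running):
# download_if_missing("https://example.com/some_dataset.zip", "~/scratch/protein-datasets")
```

After the file is in place, running the downstream script again should pick it up from dataset.path, provided the saved file name matches the one the dataset class expects.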
Hello, thank you for the reply. I don't mean that downloading or unzipping the database file takes too long. Instead, I think there is a step that builds a pkl file from the pdb files, and that step takes a very long time. It runs very quickly on my server, but on my personal computer with an NVIDIA 2080Ti it is estimated to take more than 10 days. If you happen to have seen a similar issue, please let me know.
Hi, it is good news that it runs quickly on servers. Given this, the issue may be specific to your local environment.
I'm sorry that I don't have a matching local environment with an NVIDIA 2080Ti, so it may be hard for me to assist further. But please do let me know if there is anything else I can help with.
Hi @KatarinaYuan, thank you for the reply. I tested ProtST on the server and this issue is no longer a roadblock, so I will close this ticket. Thanks again for your help.
Hello, it's very nice to see this work. I'm trying to download the scripts and run the function annotation task. However, after I ran run_downstream.py, it started building the dataset, but the progress for "Constructing proteins from pdbs" got stuck at around 92%. The progress estimate says it will take more than 10 days to build. Could you share how to construct the proteins quickly? Thank you very much!