Urinx / alphafold_pytorch

An implementation of DeepMind's AlphaFold based on PyTorch for research
Apache License 2.0

uniclust_30_2018_08_hhsuite.tar.gz size #28

Closed nicolasfredesfranco closed 3 years ago

nicolasfredesfranco commented 3 years ago

I have a problem with the uniclust30_2018_08_hhsuite.tar.gz file, which is required to preprocess a FASTA file and produce the features. I downloaded it twice and got tar files of different sizes, which also extract to directories of different sizes. Maybe one of the tar files was corrupted. Oddly, the smaller tar file produces the larger directory, and vice versa, so I can't figure out which file is the wrong one. One tar file is 25 GB and produces a folder of 87 GB; the other is 24 GB and produces a directory of 165 GB.

What is the right size of the tar file? Inside the uniclust30_2018_08 folder, what sizes are uniclust30_2018_08_a3m_db and uniclust30_2018_08_hhm_db supposed to have? I downloaded the tar file again while monitoring the process and got the one that produces the smaller folder, and in it these two files are practically zero bytes, which seems a little odd.

I've been using the link given in the "setup HHBlits from the HH-suite3" section of the readme (wget http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz). Is there another trustworthy site to download the same dataset from? I couldn't find one. Thank you for your help.

Geraldene commented 3 years ago

@nicolasfredesfranco I used the same link. The downloaded tar file is 24.8 GB, and once extracted it is around 86.3 GB.
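
If it helps to tell which of two downloads is intact without a second mirror, one option is to read the gzip stream from start to end and see whether it finishes cleanly. The following is only a minimal sketch using Python's standard `gzip` module; the function name `check_gzip` is made up for illustration and is not part of this repository or of HH-suite. A truncated or corrupted download should fail before the end of the stream, and the byte count it returns is the size of the uncompressed tar stream, which should be roughly comparable to the extracted size reported above. Reading a 24 GB archive this way can take a while.

```python
import gzip
import sys
import zlib


def check_gzip(path, chunk_size=1 << 20):
    """Read a .gz file to the end; return (is_intact, uncompressed_bytes).

    Hypothetical helper for illustration only. It decompresses in chunks
    and keeps nothing in memory beyond the current chunk.
    """
    total = 0
    try:
        with gzip.open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (EOFError, OSError, zlib.error):
        # EOFError: the file ends before the gzip end-of-stream marker
        # (typical of an interrupted download).
        # OSError / zlib.error: bad header, CRC mismatch, or corrupt data.
        return False, total
    return True, total


if __name__ == "__main__":
    # Usage: python check_gzip.py first.tar.gz second.tar.gz
    for archive_path in sys.argv[1:]:
        intact, size = check_gzip(archive_path)
        status = "OK" if intact else "TRUNCATED OR CORRUPT"
        print(f"{archive_path}: {status}, {size / 1e9:.1f} GB uncompressed")
```

An archive that passes this check is at least a complete gzip stream; it does not prove the contents match what the server published, so comparing against a checksum from the download site, if one is available, would be a stronger test.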