baker-laboratory / RoseTTAFold-All-Atom

Follow the steps to install RFAA, but an error occurs when trying to test 7u7w_protein #33

Open knight-qs opened 6 months ago

knight-qs commented 6 months ago

Thank you for your work, but I had some trouble trying to install and use it.

My error message is at the end of this post. Two existing issues look very similar to it:

https://github.com/baker-laboratory/RoseTTAFold-All-Atom/issues/17 https://github.com/baker-laboratory/RoseTTAFold-All-Atom/issues/24

Following the answers there, I tried changing the UniRef30 version name in make_msa.sh, changing the HHLIB path on line 33, and renaming the "UniRef30_2020_06" folder to "uniclust". None of these worked; the same error message still appears, and I don't know what went wrong.

Error:

(RFAA) [admin@cluster RoseTTAFold-All-Atom]$ python -m rf2aa.run_inference --config-name protein
/home/admin/mambaforge/envs/RFAA/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'protein': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
Using the cif atom ordering for TRP.
./make_msa.sh examples/protein/7u7w_A.fasta 7u7w_protein/A 4 64 pdb100_2021Mar03/pdb100_2021Mar03
Predicting: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.14sequences/s]
Running HHblits against UniRef30 with E-value cutoff 1e-10
grep: 7u7w_protein/A/hhblits/t000_.1e-10.id90cov75.a3m: No such file or directory
grep: 7u7w_protein/A/hhblits/t000_.1e-10.id90cov50.a3m: No such file or directory
Running HHblits against UniRef30 with E-value cutoff 1e-6
grep: 7u7w_protein/A/hhblits/t000_.1e-6.id90cov75.a3m: No such file or directory
grep: 7u7w_protein/A/hhblits/t000_.1e-6.id90cov50.a3m: No such file or directory
Running HHblits against UniRef30 with E-value cutoff 1e-3
grep: 7u7w_protein/A/hhblits/t000_.1e-3.id90cov75.a3m: No such file or directory
grep: 7u7w_protein/A/hhblits/t000_.1e-3.id90cov50.a3m: No such file or directory
Running HHblits against BFD with E-value cutoff 1e-3
grep: 7u7w_protein/A/hhblits/t000_.1e-3.bfd.id90cov75.a3m: No such file or directory
grep: 7u7w_protein/A/hhblits/t000_.1e-3.bfd.id90cov50.a3m: No such file or directory
cp: cannot stat ‘7u7w_protein/A/hhblits/t000_.1e-3.bfd.id90cov50.a3m’: No such file or directory
Running PSIPRED
Running hhsearch
cat: 7u7w_protein/A/t000_.ss2: No such file or directory
cat: 7u7w_protein/A/t000_.msa0.a3m: No such file or directory
Error executing job with overrides: []
Traceback (most recent call last):
  File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main
    runner.infer()
  File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer
    self.parse_inference_config()
  File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config
    protein_input = generate_msa_and_load_protein(
  File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein
    return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner)
  File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 56, in load_protein
    msa, ins, taxIDs = parse_a3m(msa_file)
  File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 415, in parse_a3m
    fstream = open(filename, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '7u7w_protein/A/t000_.msa0.a3m'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
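The last line of the log suggests setting HYDRA_FULL_ERROR=1. A minimal sketch of rerunning the same command with that variable set, assuming it is launched from the repository root inside the RFAA environment; the helper names `hydra_debug_env`, `inference_command`, and `rerun_with_full_error` are hypothetical and not part of RFAA:

```python
import os
import subprocess
import sys


def hydra_debug_env(base_env=None):
    """Return a copy of the environment with Hydra's full stack traces enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["HYDRA_FULL_ERROR"] = "1"
    return env


def inference_command(config_name):
    """Build the inference command used in this thread for a given config name."""
    return [sys.executable, "-m", "rf2aa.run_inference", "--config-name", config_name]


def rerun_with_full_error(config_name="protein"):
    """Rerun inference with HYDRA_FULL_ERROR=1 and return the process exit code."""
    result = subprocess.run(inference_command(config_name), env=hydra_debug_env())
    return result.returncode
```

Calling `rerun_with_full_error("protein")` only changes how much of the Python traceback Hydra prints; the earlier `grep` and `cat` messages come from make_msa.sh and are unaffected.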

xie-yun-ai commented 6 months ago

Hello, you need to check the database path and name in make_msa.sh, because the path in the script may differ from your actual path: '/data_20T/home_data/admin/RoseTTAFold-All-Atom/uniclust/UniRef30_2021_06_cs219.ffdata'.
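A quick way to test this advice is to check that every file of the database exists under the prefix that make_msa.sh uses. The sketch below assumes the standard HH-suite layout, in which one database prefix expands to `_a3m`, `_hhm`, and `_cs219` files, each with an `.ffdata` and an `.ffindex` part; the function name `missing_db_files` is hypothetical:

```python
from pathlib import Path

# File suffixes that together make up one HH-suite (hhblits) database.
HHSUITE_SUFFIXES = (
    "_a3m.ffdata", "_a3m.ffindex",
    "_hhm.ffdata", "_hhm.ffindex",
    "_cs219.ffdata", "_cs219.ffindex",
)


def missing_db_files(db_prefix):
    """Return the database files that do not exist for the given prefix."""
    # Work on the string form so dots inside the prefix are left untouched.
    prefix = str(db_prefix)
    return [prefix + s for s in HHSUITE_SUFFIXES if not Path(prefix + s).is_file()]
```

For the path quoted above, the prefix would be `/data_20T/home_data/admin/RoseTTAFold-All-Atom/uniclust/UniRef30_2021_06`. A non-empty result means the folder name or version in make_msa.sh does not match what is on disk.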

knight-qs commented 6 months ago

Thank you for pointing out the problem. I followed the method described at the beginning of issue https://github.com/baker-laboratory/RoseTTAFold-All-Atom/issues/32 (renaming the UniRef30 folder to uniclust and changing UniRef30_2021_06 to UniRef30_2020_06 in make_msa.sh). In addition, I added execute permission to input_prep/make_ss.sh (chmod +x input_prep/make_ss.sh), as discussed in that same issue.

I don't know yet whether the problem is solved. It has been running for about ten minutes and has not finished, and I don't know whether I will get the same error as in https://github.com/baker-laboratory/RoseTTAFold-All-Atom/issues/32.

hhblits has low CPU utilization; I don't know if this is normal.

xie-yun-ai commented 6 months ago

Hello, I'm a beginner and don't know much about this CPU problem; I just followed the installation tutorial strictly. Sorry. I can only guess: does your computer have a graphics card, and is its driver installed? I don't know much about anything else.

knight-qs commented 6 months ago

Thank you for your help. I am also a beginner and trying to figure out how to use RFAA. There is a 4090 graphics card on our lab server, and the driver should be fine, as both Gromacs and Alphafold2 work normally.

knight-qs commented 6 months ago

hhblits CPU utilization is now higher.

xie-yun-ai commented 6 months ago

Hello, your hardware is better than mine. I don't think you need to worry about it.

knight-qs commented 6 months ago

Unfortunately, it eventually failed with an error. It may be necessary to wait until more people have tested and used the tool, so that potential problems in the installation process can gradually be solved.

(RFAA) [admin@cluster RoseTTAFold-All-Atom]$ python -m rf2aa.run_inference --config-name protein /home/admin/mambaforge/envs/RFAA/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'protein': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information warnings.warn(msg, UserWarning) Using the cif atom ordering for TRP. ./make_msa.sh examples/protein/7u7w_A.fasta 7u7w_protein/A 4 64 pdb100_2021Mar03/pdb100_2021Mar03 Predicting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.03sequences/s] Running HHblits against UniRef30 with E-value cutoff 1e-10

Running PSIPRED buffer overflow detected : psipred terminated ======= Backtrace: ========= /lib64/libc.so.6(fortify_fail+0x37)[0x7f90a7af47a7] /lib64/libc.so.6(+0x116922)[0x7f90a7af2922] /lib64/libc.so.6(__fgets_chk+0x129)[0x7f90a7af2c49] psipred[0x401390] psipred[0x40158a] /lib64/libc.so.6(libc_start_main+0xf5)[0x7f90a79fe555] psipred[0x400a89] ======= Memory map: ======== 00400000-00402000 r-xp 00000000 fd:00 36471512 /home/admin/mambaforge/envs/RFAA/bin/psipred 00601000-00602000 r--p 00001000 fd:00 36471512 /home/admin/mambaforge/envs/RFAA/bin/psipred 00602000-00603000 rw-p 00002000 fd:00 36471512 /home/admin/mambaforge/envs/RFAA/bin/psipred 00603000-006d3000 rw-p 00000000 00:00 0 00c2f000-00c50000 rw-p 00000000 00:00 0 [heap] 7f90a79dc000-7f90a7ba0000 r-xp 00000000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7ba0000-7f90a7d9f000 ---p 001c4000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7d9f000-7f90a7da3000 r--p 001c3000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7da3000-7f90a7da5000 rw-p 001c7000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7da5000-7f90a7daa000 rw-p 00000000 00:00 0 7f90a7daa000-7f90a7eab000 r-xp 00000000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a7eab000-7f90a80aa000 ---p 00101000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a80aa000-7f90a80ab000 r--p 00100000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a80ab000-7f90a80ac000 rw-p 00101000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a80ac000-7f90a80ce000 r-xp 00000000 fd:00 81317 /usr/lib64/ld-2.17.so 7f90a82a4000-7f90a82a7000 rw-p 00000000 00:00 0 7f90a82af000-7f90a82b3000 r--p 00000000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82b3000-7f90a82c5000 r-xp 00004000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82c5000-7f90a82c8000 r--p 00016000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82c8000-7f90a82c9000 r--p 00019000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82c9000-7f90a82ca000 rw-p 0001a000 fd:00 3890837 
/home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82ca000-7f90a82cd000 rw-p 00000000 00:00 0 7f90a82cd000-7f90a82ce000 r--p 00021000 fd:00 81317 /usr/lib64/ld-2.17.so 7f90a82ce000-7f90a82cf000 rw-p 00022000 fd:00 81317 /usr/lib64/ld-2.17.so 7f90a82cf000-7f90a82d0000 rw-p 00000000 00:00 0 7ffe0f198000-7ffe0f1ba000 rw-p 00000000 00:00 0 [stack] 7ffe0f1fa000-7ffe0f1fc000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Running hhsearch

Error executing job with overrides: [] Traceback (most recent call last): File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main runner.infer() File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer self.parse_inference_config() File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config protein_input = generate_msa_and_load_protein( File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner) File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 66, in load_protein xyz_t, t1d, maskt, = get_templates( File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 30, in get_templates ) = parse_templates_raw(ffdb, hhr_fn=hhr_fn, atab_fn=atab_fn) File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 628, in parse_templates_raw for l in open(atab_fn, "r").readlines(): FileNotFoundError: [Errno 2] No such file or directory: '7u7wprotein/A/t000.atab'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. (RFAA) [admin@cluster RoseTTAFold-All-Atom]$

xie-yun-ai commented 6 months ago

Ha ha ha. My errors are always one step ahead of yours. I'm also working on them now. Here is the link to the answer: https://github.com/baker-laboratory/RoseTTAFold-All-Atom/issues/5#issuecomment-1990991606

knight-qs commented 6 months ago

Thank you very much. I will try to follow your solution to solve this problem.

xie-yun-ai commented 6 months ago

It's so sad. I have run into errors several times. I really hope you can solve it. Once you have, could you please show me exactly how you solved the problem?

knight-qs commented 6 months ago

I will continue to try, but now I think the better solution may be to wait for more people to test and use it. A new tool or solution is always imperfect when it is first released, and there may be various problems that need to be solved. I'm a beginner, and may not be able to solve all the problems. It's really frustrating.

knight-qs commented 6 months ago

The 7u7w monomer can now be predicted. The solution, again, is to install BLAST locally and then modify the input_prep/make_ss.sh file. See https://github.com/baker-laboratory/RoseTTAFold-All-Atom/issues/5#issuecomment-1990991606, #31 and #34.

wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz
mkdir -p blast-2.2.26
tar -xf blast-2.2.26-x64-linux.tar.gz -C blast-2.2.26
cp -r blast-2.2.26/blast-2.2.26/ blast-2.2.26_bk
rm -r blast-2.2.26
mv blast-2.2.26_bk/ blast-2.2.26
vi input_prep/make_ss.sh

Add export BLASTMAT=$PIPE_DIR/blast-2.2.26/data/ to input_prep/make_ss.sh.
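The cp, rm, and mv steps above exist only to flatten the nested blast-2.2.26/blast-2.2.26 directory that the tarball unpacks into, so that blast-2.2.26/data sits where BLASTMAT points. As a sketch of the same flattening done in one step, assuming the archive has a single top-level directory (the function name `extract_flat` is hypothetical):

```python
import tarfile
from pathlib import Path


def extract_flat(archive, dest):
    """Extract a tarball into dest, dropping its single top-level directory."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version offers it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        members = []
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if len(parts) < 2:
                continue  # skip the top-level directory entry itself
            # Re-root the member one level up, e.g. top/data/x -> data/x.
            member.name = str(Path(*parts[1:]))
            members.append(member)
        tar.extractall(dest, members=members, **kwargs)
    return dest
```

After `extract_flat("blast-2.2.26-x64-linux.tar.gz", "blast-2.2.26")`, the directory `blast-2.2.26/data` should exist; if it does not, the BLASTMAT line in make_ss.sh will point at nothing.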

Then try again, and it predicts normally. The result appears directly in the RoseTTAFold-All-Atom directory. The prediction was faster than I expected: it finished in a few minutes and did not seem to use the GPU, so I don't know whether this is a problem.

I also wonder why the log file is empty.

The RMSD between the predicted 7u7w monomer and the actual crystal structure is 1.225.

xie-yun-ai commented 6 months ago

Hello, may I ask whether your results directory looks like mine? I am not familiar with how to interpret these results; so far I have only looked at 7u7w_protein.pdb. How did you obtain your results? Is there a tutorial you could point me to?

knight-qs commented 6 months ago

Yes, our result directory structure seems to be the same. I have actually been following your solution to solve the problem, thank you very much.

The PDB file I got is actually the same as yours; I just took one extra step. Running fetch 7u7w in PyMOL automatically downloads the crystal structure from the PDB database; you can then compare the predicted structure with the downloaded crystal structure to obtain the RMSD value. Note that the 7u7w crystal structure in the PDB database contains nucleic acid. You can easily look this up online.

xie-yun-ai commented 6 months ago

Thank you very much

knight-qs commented 6 months ago

I tested python -m rf2aa.run_inference --config-name nucleic_acid. It seems to work well.

knight-qs commented 6 months ago

I tested python -m rf2aa.run_inference --config-name protein_sm. It seems to work well.

knight-qs commented 6 months ago

I tested python -m rf2aa.run_inference --config-name protein_na_sm. It seems to work well.

knight-qs commented 6 months ago

I tested python -m rf2aa.run_inference --config-name covalent. It seems to work well, but it was slower than the previous tests.

knight-qs commented 6 months ago

The RMSD between the 7u7w monomer predicted by RoseTTAFold-All-Atom and 7u7w downloaded from the PDB database is 1.225.

The RMSD between the 7u7w monomer predicted by AlphaFold 2.3 and 7u7w downloaded from the PDB database is 0.696.

The RMSD between the 7u7w monomer predicted by RoseTTAFold-All-Atom and the one predicted by AlphaFold 2.3 is 1.308.

RoseTTAFold-All-Atom seems to be much faster than AlphaFold 2.3. Except for the covalent test, all of the tests completed within 10 minutes. I didn't even notice an increase in GPU utilization.
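A short run can finish between two glances at a monitoring tool, so one way to check whether the GPU is used at all is to sample utilization repeatedly while inference is running. The sketch below assumes an NVIDIA driver with `nvidia-smi` on the PATH; the helper names `parse_gpu_stats` and `sample_gpu` are hypothetical:

```python
import shutil
import subprocess

# Ask nvidia-smi for plain CSV: GPU utilization (%) and memory in use (MiB).
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]


def _to_int(field):
    """Convert a CSV field to int, or None if the driver reports no number."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(text):
    """Parse 'utilization, memory' lines into one tuple per GPU."""
    stats = []
    for line in text.strip().splitlines():
        util, mem = line.split(",")
        stats.append((_to_int(util), _to_int(mem)))
    return stats


def sample_gpu():
    """Return one sample per GPU, or None if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `sample_gpu()` once per second from a second terminal during inference would show whether utilization or memory use rises; this is a diagnostic sketch and says nothing about how RFAA itself schedules work.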

xie-yun-ai commented 6 months ago

Hello, I encountered an error while running. Have you encountered this error? I'm not sure if I can solve it.

knight-qs commented 6 months ago

I'm sorry, but all of my tests passed on the first try, so I did not encounter this problem. Our lab will buy another server later. If I encounter this problem on the new server, I will contact you to discuss it.

xie-yun-ai commented 6 months ago

I am really happy for you. I am currently recreating the environment, but I am not very familiar with it, so I have not been able to solve the problem. I really envy your laboratory's hardware. Please forgive me for using translation software.

knight-qs commented 6 months ago

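Don't worry about it, friend; I use translation software too. I write to you in English only to leave a record, so that more people can take in the information. The lab I work in is still mainly experimental; I am the only person working full-time on analysis, and teaching myself all of this is hard work. We are buying another server not because the lab has strong analysis capacity, but because the single-core performance of the first server's CPU is not strong enough, so the 4090 cannot reach its full performance during molecular dynamics simulations. We also have our servers assembled by a shop on Taobao, so the cost is actually not very high. If you really cannot find a good solution, I think you could set this aside for now, do some other work, and learn something new. This tool has only just been open-sourced and has not been thoroughly tested, so there are bound to be all kinds of problems; that is simply how these things go, and it is not the installer's fault. In fact, I am also surprised that mine ran without errors while yours failed, since I followed your solution exactly. After things settle for a while, the developers may streamline the installation and there will be more tutorials online, so trying again then will be much easier. It seems the email address bound to a GitHub account can be used to contact you; I'm not sure whether you can see the message above. If you urgently need a prediction, we can get in touch by email.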

Deng98 commented 6 months ago

But unfortunately, it finally reported the wrong. It may be necessary to wait for more people to test and use, and some potential problems in the installation process can be slowly solved.

(RFAA) [admin@cluster RoseTTAFold-All-Atom]$ python -m rf2aa.run_inference --config-name protein /home/admin/mambaforge/envs/RFAA/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'protein': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information warnings.warn(msg, UserWarning) Using the cif atom ordering for TRP. ./make_msa.sh examples/protein/7u7w_A.fasta 7u7w_protein/A 4 64 pdb100_2021Mar03/pdb100_2021Mar03 Predicting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.03sequences/s] Running HHblits against UniRef30 with E-value cutoff 1e-10

  • 11:50:37.332 INFO: Input file = 7u7wprotein/A/hhblits/t000.1e-10.a3m
  • 11:50:37.332 INFO: Output file = 7u7wprotein/A/hhblits/t000.1e-10.id90cov75.a3m
  • 11:50:37.591 WARNING: Maximum number 100000 of sequences exceeded in file 7u7wprotein/A/hhblits/t000.1e-10.a3m
  • 11:51:04.060 INFO: Input file = 7u7wprotein/A/hhblits/t000.1e-10.a3m
  • 11:51:04.060 INFO: Output file = 7u7wprotein/A/hhblits/t000.1e-10.id90cov50.a3m
  • 11:51:04.283 WARNING: Maximum number 100000 of sequences exceeded in file 7u7wprotein/A/hhblits/t000.1e-10.a3m

Running PSIPRED buffer overflow detected : psipred terminated ======= Backtrace: ========= /lib64/libc.so.6(fortify_fail+0x37)[0x7f90a7af47a7] /lib64/libc.so.6(+0x116922)[0x7f90a7af2922] /lib64/libc.so.6(__fgets_chk+0x129)[0x7f90a7af2c49] psipred[0x401390] psipred[0x40158a] /lib64/libc.so.6(libc_start_main+0xf5)[0x7f90a79fe555] psipred[0x400a89] ======= Memory map: ======== 00400000-00402000 r-xp 00000000 fd:00 36471512 /home/admin/mambaforge/envs/RFAA/bin/psipred 00601000-00602000 r--p 00001000 fd:00 36471512 /home/admin/mambaforge/envs/RFAA/bin/psipred 00602000-00603000 rw-p 00002000 fd:00 36471512 /home/admin/mambaforge/envs/RFAA/bin/psipred 00603000-006d3000 rw-p 00000000 00:00 0 00c2f000-00c50000 rw-p 00000000 00:00 0 [heap] 7f90a79dc000-7f90a7ba0000 r-xp 00000000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7ba0000-7f90a7d9f000 ---p 001c4000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7d9f000-7f90a7da3000 r--p 001c3000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7da3000-7f90a7da5000 rw-p 001c7000 fd:00 81325 /usr/lib64/libc-2.17.so 7f90a7da5000-7f90a7daa000 rw-p 00000000 00:00 0 7f90a7daa000-7f90a7eab000 r-xp 00000000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a7eab000-7f90a80aa000 ---p 00101000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a80aa000-7f90a80ab000 r--p 00100000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a80ab000-7f90a80ac000 rw-p 00101000 fd:00 81335 /usr/lib64/libm-2.17.so 7f90a80ac000-7f90a80ce000 r-xp 00000000 fd:00 81317 /usr/lib64/ld-2.17.so 7f90a82a4000-7f90a82a7000 rw-p 00000000 00:00 0 7f90a82af000-7f90a82b3000 r--p 00000000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82b3000-7f90a82c5000 r-xp 00004000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82c5000-7f90a82c8000 r--p 00016000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82c8000-7f90a82c9000 r--p 00019000 fd:00 3890837 /home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82c9000-7f90a82ca000 rw-p 0001a000 fd:00 3890837 
/home/admin/mambaforge/envs/RFAA/lib/libgcc_s.so.1 7f90a82ca000-7f90a82cd000 rw-p 00000000 00:00 0 7f90a82cd000-7f90a82ce000 r--p 00021000 fd:00 81317 /usr/lib64/ld-2.17.so 7f90a82ce000-7f90a82cf000 rw-p 00022000 fd:00 81317 /usr/lib64/ld-2.17.so 7f90a82cf000-7f90a82d0000 rw-p 00000000 00:00 0 7ffe0f198000-7ffe0f1ba000 rw-p 00000000 00:00 0 [stack] 7ffe0f1fa000-7ffe0f1fc000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Running hhsearch

  • 11:53:10.611 ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/hhalignment.cpp:223: Read:
  • 11:53:10.611 ERROR: sequence ss_pred contains no residues.

Error executing job with overrides: [] Traceback (most recent call last): File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main runner.infer() File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer self.parse_inference_config() File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config protein_input = generate_msa_and_load_protein( File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner) File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 66, in load_protein xyz_t, t1d, maskt, = get_templates( File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 30, in get_templates ) = parse_templates_raw(ffdb, hhr_fn=hhr_fn, atab_fn=atab_fn) File "/data_20T/home_data/admin/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 628, in parse_templates_raw for l in open(atab_fn, "r").readlines(): FileNotFoundError: [Errno 2] No such file or directory: '7u7wprotein/A/t000.atab'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. (RFAA) [admin@cluster RoseTTAFold-All-Atom]$

I encountered the same problem : ...... Running PSIPRED buffer overflow detected : psipred terminated ======= Backtrace: ========= /lib64/libc.so.6(fortify_fail+0x37)[0x2b1d8731d7a7] /lib64/libc.so.6(+0x116922)[0x2b1d8731b922] /lib64/libc.so.6(__fgets_chk+0x129)[0x2b1d8731bc49] psipred[0x401390] psipred[0x40158a] /lib64/libc.so.6(libc_start_main+0xf5)[0x2b1d87227555] psipred[0x400a89] ======= Memory map: ======== ......

Can you tell me how to solve this problem?

Thanks very much!

knight-qs commented 6 months ago

The 7u7w monomer can now be predicted. The solution, again, is to install BLAST locally and then modify the input_prep/make_ss.sh file. See #5 (comment), #31 and #34.

wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz
mkdir -p blast-2.2.26
tar -xf blast-2.2.26-x64-linux.tar.gz -C blast-2.2.26
cp -r blast-2.2.26/blast-2.2.26/ blast-2.2.26_bk
rm -r blast-2.2.26
mv blast-2.2.26_bk/ blast-2.2.26
vi input_prep/make_ss.sh

Add export BLASTMAT=$PIPE_DIR/blast-2.2.26/data/ to input_prep/make_ss.sh.

Then try again, and it predicts normally. The result appears directly in the RoseTTAFold-All-Atom directory. The prediction was faster than I expected: it finished in a few minutes and did not seem to use the GPU, so I don't know whether this is a problem.

I also wonder why the log file is empty.

The RMSD between the predicted 7u7w monomer and the actual crystal structure is 1.225.

Here are my solutions and sources of information for your reference.

Deng98 commented 6 months ago

I solved the problem based on your comment. Now, all predictions are terminated normally. Thanks very much!

knight-qs commented 6 months ago

Happy to have been of help, best of luck!

noyoume commented 6 months ago

Hello! I'm so glad you wrote about troubleshooting errors. I have the same problem.

1) Chmod the make_ss.sh file. (O)
2) Rename the file UniRef30 to uniclust. (O)
3) Downloaded blast, executed the commands in order, and added export BLASTMAT=$PIPE_DIR/blast-2.2.26/data/ to make_ss.sh. (O)
4) Modified the database path in make_msa.sh. (O)

All done

Error -----------------------------------------

(RFAA) ymnoh@ubuntu:~/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom$ python -m rf2aa.run_inference --config-name protein_sm /home/ymnoh/anaconda3/envs/RFAA/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'protein_sm': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information warnings.warn(msg, UserWarning) Using the cif atom ordering for TRP. ./make_msa.sh examples/protein/7qxr.fasta 7qxr/A 4 64 /data/resource/RFAA/pdb100_2021Mar03/pdb100_2021Mar03 Predicting: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.80sequences/s] Running HHblits against UniRef30 with E-value cutoff 1e-10

grep: 7qxr/A/hhblits/t000.1e-10.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-10.id90cov50.a3m: No such file or directory Running HHblits against UniRef30 with E-value cutoff 1e-6

grep: 7qxr/A/hhblits/t000.1e-6.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-6.id90cov50.a3m: No such file or directory Running HHblits against UniRef30 with E-value cutoff 1e-3

grep: 7qxr/A/hhblits/t000.1e-3.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-3.id90cov50.a3m: No such file or directory Running HHblits against BFD with E-value cutoff 1e-3

grep: 7qxr/A/hhblits/t000.1e-3.bfd.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-3.bfd.id90cov50.a3m: No such file or directory cp: cannot stat '7qxr/A/hhblits/t000.1e-3.bfd.id90cov50.a3m': No such file or directory Running PSIPRED Running hhsearch cat: 7qxr/A/t000.msa0.a3m: No such file or directory

Error executing job with overrides: [] Traceback (most recent call last): File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main runner.infer() File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer self.parse_inference_config() File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config protein_input = generate_msa_and_load_protein( File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner) File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 56, in load_protein msa, ins, taxIDs = parse_a3m(msa_file) File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 415, in parsea3m fstream = open(filename, 'r') FileNotFoundError: [Errno 2] No such file or directory: '7qxr/A/t000.msa0.a3m'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. (RFAA) ymnoh@ubuntu:~/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom$ vim make_msa.sh (RFAA) ymnoh@ubuntu:~/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom$ python -m rf2aa.run_inference --config-name protein_sm /home/ymnoh/anaconda3/envs/RFAA/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'protein_sm': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information warnings.warn(msg, UserWarning) Using the cif atom ordering for TRP. ./make_msa.sh examples/protein/7qxr.fasta 7qxr/A 4 64 /data/resource/RFAA/pdb100_2021Mar03/pdb100_2021Mar03 Predicting: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.84sequences/s] Running HHblits against UniRef30 with E-value cutoff 1e-10

grep: 7qxr/A/hhblits/t000.1e-10.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-10.id90cov50.a3m: No such file or directory Running HHblits against UniRef30 with E-value cutoff 1e-6

grep: 7qxr/A/hhblits/t000.1e-6.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-6.id90cov50.a3m: No such file or directory Running HHblits against UniRef30 with E-value cutoff 1e-3

grep: 7qxr/A/hhblits/t000.1e-3.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-3.id90cov50.a3m: No such file or directory Running HHblits against BFD with E-value cutoff 1e-3

grep: 7qxr/A/hhblits/t000.1e-3.bfd.id90cov75.a3m: No such file or directory grep: 7qxr/A/hhblits/t000.1e-3.bfd.id90cov50.a3m: No such file or directory cp: cannot stat '7qxr/A/hhblits/t000.1e-3.bfd.id90cov50.a3m': No such file or directory Running PSIPRED Running hhsearch cat: 7qxr/A/t000.msa0.a3m: No such file or directory

Error executing job with overrides: [] Traceback (most recent call last): File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main runner.infer() File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer self.parse_inference_config() File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config protein_input = generate_msa_and_load_protein( File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner) File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 56, in load_protein msa, ins, taxIDs = parse_a3m(msa_file) File "/data/home/ymnoh/RoseTTAFold-All-Atom/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 415, in parsea3m fstream = open(filename, 'r') FileNotFoundError: [Errno 2] No such file or directory: '7qxr/A/t000.msa0.a3m'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.


Note: the large files are located under the /data/resource/RFAA path.

4) Modified the database path in make_msa.sh.

I'm sure I've missed a lot since I'm a beginner. If you have a moment, could you take a look?

Thank you

noyoume commented 6 months ago

It was a path issue, fixed!

narubi2 commented 6 months ago

@knight-qs @xie-yun-ai Thank you! I followed your comments, and now it works well. Thanks a lot.

smilenaderi commented 6 months ago

@knight-qs @xie-yun-ai Thank you, your solution helped me run the examples. But I'm still getting that error for my own sequences. Can you run it with other sequences?

noyoume commented 6 months ago

@smilenaderi Hello, the protein-small molecule prediction also worked with other data for me.

smilenaderi commented 6 months ago

@noyoume I'm trying different protein-only inputs, and inference usually fails with this error: sequence ss_pred contains no residues. I haven't tried other kinds of inference.
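Earlier in this thread the same message appeared right after PSIPRED crashed, which suggests the secondary-structure record handed to hhsearch was empty. A minimal sketch for checking that, assuming the file in question is A3M/FASTA-style text with a `>ss_pred` header, as the error message implies; the function name `ss_pred_length` is hypothetical:

```python
def ss_pred_length(path):
    """Return the residue count under the '>ss_pred' header, or None if absent."""
    length = None
    in_record = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts; track it only if it is named ss_pred.
                in_record = line[1:].split()[:1] == ["ss_pred"]
                if in_record:
                    length = 0
            elif in_record:
                length += len(line)
    return length
```

A result of 0 means the header is present but PSIPRED produced no prediction, so the failure is in the make_ss.sh step rather than in hhsearch; None means the record is missing entirely.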

smilenaderi commented 6 months ago


- 00:00:35.518 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-10.id90cov75.a3m

- 00:00:35.526 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-10.a3m

- 00:00:35.526 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-10.id90cov50.a3m

- 00:03:02.448 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-6.a3m

- 00:03:02.448 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-6.id90cov75.a3m

- 00:03:02.458 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-6.a3m

- 00:03:02.458 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-6.id90cov50.a3m

- 00:05:50.208 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.a3m

- 00:05:50.208 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.id90cov75.a3m

- 00:05:50.508 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.a3m

- 00:05:50.508 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.id90cov50.a3m

- 00:06:06.852 ERROR: Could find neither hhm_db nor a3m_db!

- 00:06:07.043 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.a3m

- 00:06:07.043 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.id90cov75.a3m

- 00:06:07.043 ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/hhfilter.cpp:177: main:

- 00:06:07.043 ERROR:   could not open file '/home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.a3m'

- 00:06:07.044 INFO: Input file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.a3m

- 00:06:07.045 INFO: Output file = /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.id90cov50.a3m

- 00:06:07.045 ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/hhfilter.cpp:177: main:

- 00:06:07.045 ERROR:   could not open file '/home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.a3m'

grep: /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.id90cov75.a3m: No such file or directory
grep: /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.id90cov50.a3m: No such file or directory
cp: cannot stat '/home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/hhblits/t000_.1e-3.bfd.id90cov50.a3m': No such file or directory
cat: /home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/t000_.msa0.a3m: No such file or directory
- 00:06:07.203 ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/hhalignment.cpp:223: Read:

- 00:06:07.203 ERROR:   sequence ss_pred contains no residues.

Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/ubuntu/RFAA/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main
    runner.infer()
  File "/home/ubuntu/RFAA/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer
    self.parse_inference_config()
  File "/home/ubuntu/RFAA/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config
    protein_input = generate_msa_and_load_protein(
  File "/home/ubuntu/RFAA/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein
    return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner)
  File "/home/ubuntu/RFAA/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 56, in load_protein
    msa, ins, taxIDs = parse_a3m(msa_file)
  File "/home/ubuntu/RFAA/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 415, in parse_a3m
    fstream = open(filename, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/RFAA/outputs/tunlorlk235803/tunlorlk235803/A/t000_.msa0.a3m'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

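
In a log like the one above, the first ERROR line is "Could find neither hhm_db nor a3m_db!". The later messages (the unreadable .bfd.a3m file, "sequence ss_pred contains no residues", and the final FileNotFoundError) all come after it and may simply be consequences of it, so the first error is the place to start. Below is a minimal sketch for pulling that line out of a saved log, assuming the "- HH:MM:SS.mmm ERROR: ..." format shown above. `first_error` is a hypothetical helper name, and the sample log is abbreviated.

```python
import re

# Matches HH-suite style log lines such as "- 00:06:06.852 ERROR: message".
ERROR_LINE = re.compile(r"^-\s*\d{2}:\d{2}:\d{2}\.\d{3}\s+ERROR:\s*(.*)$")


def first_error(log_text):
    """Return the message of the first ERROR line, or None if there is none."""
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Abbreviated sample in the same format as the log above.
    sample = """\
- 00:05:50.508 INFO: Input file = t000_.1e-3.a3m
- 00:06:06.852 ERROR: Could find neither hhm_db nor a3m_db!
- 00:06:07.203 ERROR:   sequence ss_pred contains no residues.
"""
    print(first_error(sample))
```
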
ChengQin0 commented 5 months ago

Hello, I have a question: does RFAA rely on the GPU or the CPU? Do I have to configure a GPU for it?

YaoYinYing commented 4 months ago

> Hello, I have a question: does RFAA rely on the GPU or the CPU? Do I have to configure a GPU for it?

You can slightly modify the inference script to force rf2aa to run on the CPU. See this commit: https://github.com/YaoYinYing/RoseTTAFold-All-Atom/commit/b7871606318735c96a0a9cd5c8dc4280d5516707
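
The linked commit is the reference for the actual change. As a general illustration only, PyTorch code usually picks its device along the lines below. `select_device` and the `force_cpu` flag are hypothetical names and are not part of rf2aa. The sketch falls back to the CPU when PyTorch or CUDA is unavailable.

```python
def select_device(force_cpu=False):
    """Return a device string: "cpu" when forced or when CUDA is unavailable."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = select_device(force_cpu=True)
    print(device)  # prints "cpu"
```

A model and its input tensors would then be moved to the chosen device with `.to(device)`. Expect CPU inference to be much slower than GPU inference.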

Yang-Wang-2020 commented 4 months ago

May I ask how long this job took you to finish, and what hardware you ran it on?

I tested python -m rf2aa.run_inference --config-name protein_sm, and it seems to work well.
