KosinskiLab / AlphaPulldownSnakemake


Error #17

Open salomonssonj opened 2 months ago

salomonssonj commented 2 months ago

Hi,

I am encountering some errors when running this snakemake pipeline. I have attached a log file from one of my runs.

6416819.txt

I am able to generate the predictions and get the “ranked_x.pdb” files but then I get the error when trying to create the “completed_fold.txt” file. From the log file it seems to be related to the “convert_to_modelcif.py” script.

However, the predictions have rather bad scores, even for the ones for which we previously obtained high-scoring predictions.

It also says “/dev/null” after the input line, which I have not seen before. Could this be related?

Thank you in advance!

Kindly, Johannes

maurerv commented 2 months ago

Hi Johannes,

Thank you for reporting. Indeed, it seems like an issue in AlphaPulldown and not the pipeline. I assigned @DimaMolod to your issue because he implemented convert_to_modelcif.py in AlphaPulldown and can best address this.

Regarding snakemake, the /dev/null after the input is an implementation detail to enable job clustering, but it has no effect here. You should also not see a difference in prediction scores between using the snakemake pipeline and AlphaPulldown, because the prediction procedure is the same. Have you compared the output of the pipeline to the most recent version of AlphaPulldown from the main branch or a previous version? If prediction scores from the main branch AP version deviate strongly from previous AP versions, it would be best to open an issue here: https://github.com/KosinskiLab/AlphaPulldown/issues.

Best, Valentin

salomonssonj commented 2 months ago

Hi Valentin,

Thank you for your input.

For some input combinations, I also get errors related to template_confidence_scores (see attachment). However, I have managed to successfully obtain predictions using the same features. For example, I get the error when my input is "O75112-6+Q15124:1-195" (as in the attached log file), but for "O75112-6:1-84+Q14315:2036-2406" it works.

6459736.txt

Kindly, Johannes

dingquanyu commented 2 months ago

Hi @salomonssonj

This KeyError: 'template_confidence_scores' suggests that one of the feature pickles may have been created using mmseqs2 under an older version of AlphaPulldown. Previously, this error occurred when mmseqs2 had problems finding the structural templates; it has been fixed in later AlphaPulldown versions. If that is the case, could you remove the corresponding pickles and rerun the feature creation steps using a newer version?

Yours Dingquan

salomonssonj commented 2 months ago

Hi Dingquan,

Thank you for your response.

I generated these features at the beginning of this week using the snakemake pipeline. I re-generated them yesterday, also using the snakemake pipeline, in case something went wrong the first time. I have attached the log file for one of the features I generated at the beginning of this week.

I cleared the AlphaPulldownSnakemake/.snakemake/singularity directory last Friday to make sure I had the latest singularity images, in case that is helpful.

6413228.txt

Kindly, Johannes

dingquanyu commented 2 months ago

Hi Johannes,

Thanks for the updates. It's really strange that your pickle still doesn't have template_confidence_scores, since the current version of AlphaPulldown makes sure every pickle contains this value, as shown here: https://github.com/KosinskiLab/AlphaPulldown/blob/fe456e610f337838fff820d889293a2ead99ef14/alphapulldown/objects.py#L155-L163 I really cannot think of a solution other than manually reading in the pickle that caused the problem. Within its feat_dict attribute, you would then add template_confidence_scores set to [1]*num_of_residues and save the pickle again.

Yours Dingquan
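
The manual workaround described above could look roughly like the following sketch. It is not taken from AlphaPulldown: the helper names are hypothetical, and the attribute name `feature_dict` is an assumption (the comment above calls it feat_dict), so it is left as a parameter. Loading a real feature pickle also requires the `alphapulldown` package to be importable, because pickle stores references to the classes of the saved objects.

```python
import pickle
from pathlib import Path


def add_template_confidence_scores(obj, num_residues, attr="feature_dict"):
    """Add the missing key in memory; an existing value is left untouched.

    `attr` is an assumed attribute name, adjust it to your AlphaPulldown version.
    """
    features = getattr(obj, attr)
    features.setdefault("template_confidence_scores", [1] * num_residues)
    return obj


def patch_pickle(path, num_residues, attr="feature_dict"):
    """Load a pickle, add the key if it is missing, and write the pickle back."""
    path = Path(path)
    with path.open("rb") as handle:
        obj = pickle.load(handle)
    add_template_confidence_scores(obj, num_residues, attr)
    with path.open("wb") as handle:
        pickle.dump(obj, handle)
```

Because the file is overwritten in place, it is worth keeping a copy of the original pickle before patching it.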

jkosinski commented 2 months ago

Hi Johannes, could you send Dingquan your pkl files for an example job that crashed? @dingquanyu just to double check that the template_confidence_scores are not there.

salomonssonj commented 2 months ago

Hi Dingquan and Jan,

I have attached one of my .pkl files. Q15124.pkl.zip

Thank you for your help.

Kindly, Johannes

jkosinski commented 2 months ago

Could you also send the second from the pair?

salomonssonj commented 2 months ago

Yes, sorry. Here is the second pkl file:

O75112-6.pkl.zip

DimaMolod commented 2 months ago

I think both pkl files do have 'template_confidence_scores' and 'template_release_date', so the problem must be somewhere else. I will try to reproduce the error using the provided pkl files
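
A check like the one described above can be scripted once the feature dictionary has been loaded from the pickle. The following is a minimal sketch; the function name is hypothetical, and only the two key names come from this thread.

```python
# Key names as mentioned in this thread.
REQUIRED_KEYS = ("template_confidence_scores", "template_release_date")


def missing_feature_keys(feature_dict, required=REQUIRED_KEYS):
    """Return the required keys that are absent from a feature dictionary."""
    return [key for key in required if key not in feature_dict]
```

An empty list means both keys are present in the dictionary that was passed in.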

DimaMolod commented 2 months ago

I managed to reproduce the error. The problem occurs only for the ChoppedObjects because the new keys 'template_confidence_scores' and 'template_release_date' are lost after this function is called: https://github.com/KosinskiLab/AlphaPulldown/blob/fe456e610f337838fff820d889293a2ead99ef14/alphapulldown/objects.py#L323-L354
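
The linked function is not reproduced here. As an illustration of this failure mode only, keys can get lost when a chopping routine rebuilds the feature dictionary from a fixed list of known keys. The hypothetical sketch below instead iterates over every key, slicing the per-residue entries and copying everything else unchanged. The 1-based, inclusive range is an assumption modelled on inputs such as "Q15124:1-195".

```python
def chop_features(features, start, stop, per_residue_keys):
    """Restrict per-residue entries to residues start..stop (1-based, inclusive).

    Hypothetical helper, not the AlphaPulldown implementation. Keys that are
    not listed in `per_residue_keys` are carried over unchanged, so entries
    added to the dictionary later are not silently dropped.
    """
    chopped = {}
    for key, value in features.items():
        if key in per_residue_keys:
            chopped[key] = value[start - 1:stop]
        else:
            chopped[key] = value
    return chopped
```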

DimaMolod commented 2 months ago

This issue should be fixed in the new version. @salomonssonj please update the images and let us know if it works for you now. Thanks again for reporting this; that was a big and well-hidden bug!

salomonssonj commented 2 months ago

Thank you for looking into it! I'll let you know when I have tried with the updated images.

salomonssonj commented 2 months ago

Hi,

I tried to run some predictions this morning with the updated images, but it failed once again. 6618594.txt

dingquanyu commented 2 months ago

Hi,

This error is caused by using a jax version of 0.4.24 or higher. jax 0.4.23 still has this module, but it was deprecated in later versions. However, AlphaPulldown's dockerfile specifies jax 0.4.23, so it should be fine. Could you check the jax version inside your container? @DimaMolod @maurerv if you have time, could you check the jax version inside the container as well?

Yours Dingquan
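
The version check requested above can be done with the standard library alone. This is a minimal sketch: the helper names are hypothetical, and the 0.4.23 limit is taken from the comment above rather than from the jax documentation.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '0.4.23' into a comparable tuple (0, 4, 23)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def jax_version_ok(version, newest_ok="0.4.23"):
    """True if `version` is not newer than the last version reported to work."""
    return parse_version(version) <= parse_version(newest_ok)


def installed_jax_version():
    """Return the installed jax version string, or None if jax is not installed."""
    try:
        return metadata.version("jax")
    except metadata.PackageNotFoundError:
        return None
```

To inspect an image directly, a command along the lines of `singularity exec <image>.simg python -c "import jax; print(jax.__version__)"` prints the version installed in that image; the image name is a placeholder.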

salomonssonj commented 2 months ago

Hi,

Thank you. Inside the singularity image b1a0408b77e6fc0b904c69cd981fb35c.simg I have jax 0.4.30, and in 3f20617ccba864758b2a437ef2fde35c.simg I have jax 0.4.16.

dingquanyu commented 2 months ago

I see. Could you try again using the image with jax 0.4.16?

DimaMolod commented 2 months ago

I think these two images correspond to pulldown.docker and analysis.docker. However, pulldown.docker has version 0.4.23 specified: https://github.com/KosinskiLab/AlphaPulldown/blob/main/docker/pulldown.dockerfile#L77-L78 Is it 0.4.16 for the analysis image? https://github.com/KosinskiLab/AlphaPulldown/blob/main/alphapulldown/analysis_pipeline/Dockerfile#L35 Something is wrong with the image definitions.

salomonssonj commented 2 months ago

Yes, the image with jax 0.4.16 is from kosinskilab/fold_analysis:latest

DimaMolod commented 2 months ago

Thank you, @salomonssonj. We are working on that issue and will let you know once it's fixed!

salomonssonj commented 2 months ago

Great, thank you very much!

dingquanyu commented 2 months ago

Hi @salomonssonj
I checked the current version of the docker image from the hub, and the jax version is now correct. I think @DimaMolod has already used the image to model the structures from the given pickles, and the key error was solved. Could you try again, please?

Yours Dingquan

DimaMolod commented 2 months ago

For me, features and predictions are created, but the reports crash. I think 'compute_stats' rule always fails due to this error:

rule compute_stats:
    input: /scratch/dima/fold_temp/predictions/Q8I2G6_Q8I5K4/completed_fold.txt
    output: /scratch/dima/fold_temp/predictions/Q8I2G6_Q8I5K4/statistics.csv
    jobid: 0
    reason: Forced execution
    wildcards: fold=Q8I2G6_Q8I5K4
    resources: mem_mb=8000, mem_mib=7630, disk_mb=1000, disk_mib=954, tmpdir=/scratch/jobs/6960313, walltime=1440, attempt=1

Activating singularity image /g/kosinski/dima/SnakeMake/AlphaPulldownSnakemake/.snakemake/singularity/3f20617ccba864758b2a437ef2fde35c.simg
WARNING: Could not find any nv files on this host!
I0713 09:26:31.047352 140737350492992 get_good_inter_pae.py:120] now processing Q8I2G6_Q8I5K4
E0713 09:26:31.510183 140737350492992 get_good_inter_pae.py:156] Error processing PAE and iPTM for job Q8I2G6_Q8I5K4: No module named 'alphafold'
I0713 09:26:31.510915 140737350492992 get_good_inter_pae.py:166] done for Q8I2G6_Q8I5K4 1 out of 1 finished.
I0713 09:26:31.510968 140737350492992 get_good_inter_pae.py:169] Unfortunately, none of your protein models had at least one PAE on the interface below your cutoff value : 100.0.
 Please consider using a larger cutoff.
[Sat Jul 13 09:26:34 2024]
Finished job 0.
1 of 1 steps (100%) done
Traceback (most recent call last):
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/weakref.py", line 667, in _exitfunc
    f()
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/tempfile.py", line 868, in _cleanup
    cls._rmtree(name, ignore_errors=ignore_errors)
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/tempfile.py", line 864, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/shutil.py", line 725, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/shutil.py", line 658, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/shutil.py", line 658, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/shutil.py", line 658, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 3 more times]
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/shutil.py", line 664, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/home/dmolodenskiy/.conda/envs/sm310/lib/python3.10/shutil.py", line 662, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'envs'

...which seems to be related to the create_notebook.py script from AlphaPulldown and might also be related to this issue: https://github.com/KosinskiLab/AlphaPulldown/issues/379 @dingquanyu, could you check if create_notebook.py actually creates anything?

salomonssonj commented 1 month ago

Hi Dingquan and Dima,

I ran the pipeline using features I created previously and managed to get the predictions. However, I also get the same error as above: the report crashes.

dingquanyu commented 1 month ago

KosinskiLab/AlphaPulldown#379 occurred when the user ran the script to analyse local ColabFold results.

dingquanyu commented 1 month ago

This line reports the real crash here: OSError: [Errno 39] Directory not empty: 'envs'. I don't know where it comes from. I think it's from snakemake itself, as there's no step that removes directories in the script. @DimaMolod

DimaMolod commented 1 month ago

Hi @salomonssonj, and sorry for the long delay. I think we fixed the problem, and now Snakemake works for me for the test data sets, including the generation of reports. Please update the images and re-run your modeling again. Many thanks!

salomonssonj commented 1 month ago

Hi, I also apologize for my delayed reply.

I tried it last week with the new singularity images, and it seems that I still get some errors when converting to the modelcif format. I have attached log files for generate_report and structure_inference. compute_stats-7749922.txt structure_inference-7737278.txt

I also wanted to try it out this morning but got this error message when executing the pipeline: snakemake: error: ambiguous option: --cluster=/home/salomonssonj/.config/snakemake/slurm_noSidecar/slurm-submit.py could match --cluster-generic-submit-cmd, --cluster-generic-status-cmd, --cluster-generic-cancel-cmd, --cluster-generic-cancel-nargs, --cluster-generic-sidecar-cmd, --cluster-sync-submit-cmd

I tried adding --cluster-generic-submit-cmd home/salomonssonj/.config/snakemake/slurm_noSidecar/slurm-submit.py, but I still got the same error.

DimaMolod commented 1 month ago

Hi @salomonssonj, and thanks again for your feedback and patience :-) Please try again with the fresh containers; the modelcif error should go away (in any case, this issue is not critical and shouldn't prevent the execution of the main pipeline). The error you encounter with the --cluster flag probably indicates that you are using an outdated version of Snakemake. Could you try updating Snakemake, e.g. to version 7.32.4? I also updated the instructions; for me it works after this single command: https://github.com/KosinskiLab/AlphaPulldownSnakemake/blob/main/README.md?plain=1#L14 Please try it out and let me know if the error has disappeared.