Closed naailkhan28 closed 9 months ago
Thanks for letting us know about this! We've seen this occur occasionally with other analyses as well and haven't been able to ID the cause. We'll look into it.
One hypothesis that @mertcelebi shared is that this might occur when the timestamp of the FASTA file is later than that of the PDB file. If so, it's possible that this is a more general issue with how Snakemake detects files. Could you try running this code from within the demo/1OHG/ directory to see if it resolves the issue?
touch P49861.fasta &&
sleep .5 &&
touch P49861.pdb
Separately from this issue, the experimentally-determined PDB provided appears to contain multiple chains, which unfortunately isn't supported yet by the pipeline. We're working to resolve the current issue, but I expect you'll run into a new error once the replacement issue is resolved.
I just did the experiment you suggested and that seemed to be the issue: when the FASTA is older than the PDB file, it works as expected and no make_pdb job is run.
However, if you swap those two touch calls (i.e., the PDB is older than the FASTA) or you run them at the same time (with no sleep .5 command), then the make_pdb job is run. Looks like a tweak to the Snakemake pipeline is needed!
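The behaviour reported here matches the make-style rule that an output counts as stale when its input has a newer modification time. The following is a minimal sketch of that comparison, not Snakemake's actual implementation. `output_is_stale` and `demo` are hypothetical names, and the timestamps are set explicitly with `os.utime` so the result does not depend on how fast the files are written.

```python
import os
import tempfile


def output_is_stale(input_path, output_path):
    """Return True if the input was modified after the output.

    Illustrative make-style check only; identical timestamps are
    treated as up to date here, which is an assumption.
    """
    return os.stat(input_path).st_mtime_ns > os.stat(output_path).st_mtime_ns


def demo():
    """Reproduce both orderings of the FASTA and PDB timestamps."""
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "P49861.fasta")
        pdb = os.path.join(tmp, "P49861.pdb")
        for path in (fasta, pdb):
            open(path, "w").close()

        # Case 1: FASTA older than PDB (the ordering that worked).
        os.utime(fasta, ns=(1_000_000_000, 1_000_000_000))
        os.utime(pdb, ns=(2_000_000_000, 2_000_000_000))
        fasta_older = output_is_stale(fasta, pdb)

        # Case 2: FASTA newer than PDB (the ordering that triggered make_pdb).
        os.utime(fasta, ns=(3_000_000_000, 3_000_000_000))
        fasta_newer = output_is_stale(fasta, pdb)

        return fasta_older, fasta_newer


if __name__ == "__main__":
    print(demo())
```

Under this rule a user-supplied PDB is only safe from regeneration while it stays newer than the FASTA, which is fragile when files are copied or checked out in arbitrary order.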
Separately from this issue, the experimentally-determined PDB provided appears to contain multiple chains, which unfortunately isn't supported yet by the pipeline. We're working to resolve the current issue, but I expect you'll run into a new error once the replacement issue is resolved.
Before you implement a way of handling multiple chains, would it be helpful to handle multi-chain PDB files in a nicer way? Like maybe automatically taking chain A or throwing a warning or something.
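A rough sketch of what "take chain A and warn" could look like, assuming plain fixed-width PDB text where the chain identifier is in column 22 of ATOM, HETATM, ANISOU and TER records. `keep_single_chain` is a hypothetical helper, not part of the pipeline, and it ignores multi-model files and mmCIF input.

```python
import warnings

# Record types that carry a chain identifier in column 22 (index 21).
_CHAIN_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def keep_single_chain(pdb_text, chain_id="A"):
    """Return the PDB text with only records from `chain_id` kept.

    Lines without a chain column (HEADER, REMARK, END, ...) are passed
    through unchanged. A warning is issued if other chains were dropped.
    """
    seen = []  # chain identifiers in order of first appearance
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(_CHAIN_RECORDS) and len(line) > 21:
            current = line[21]
            if current not in seen:
                seen.append(current)
            if current != chain_id:
                continue  # drop records from other chains
        kept.append(line)

    if chain_id not in seen:
        raise ValueError(f"chain {chain_id!r} not found; chains present: {seen}")
    if len(seen) > 1:
        warnings.warn(
            f"PDB contains chains {seen}; keeping only chain {chain_id!r}"
        )
    return "\n".join(kept) + "\n"
```

Raising an error when the requested chain is absent, rather than silently returning an empty structure, keeps the failure close to its cause.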
That's a helpful suggestion; we can explore whether an intermediate version like this would help.
@naailkhan28 I think we've fixed this in #83, which should ensure that make_pdb is never called for PDB files that are in the input directory. Please let us know if this resolves this issue on your end!
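The actual change in #83 is not shown in this thread. As an illustration of the general idea only, the sketch below decides which proteins need a predicted structure by checking whether a PDB file exists in the input directory, without consulting timestamps. `proteins_needing_prediction` and the one-file-per-protein naming scheme are assumptions.

```python
from pathlib import Path


def proteins_needing_prediction(input_dir):
    """List protein IDs that have a FASTA file but no PDB file.

    Existence-based check: a user-supplied PDB is never scheduled for
    regeneration, whatever its modification time.
    """
    input_dir = Path(input_dir)
    have_pdb = {path.stem for path in input_dir.glob("*.pdb")}
    return sorted(
        path.stem
        for path in input_dir.glob("*.fasta")
        if path.stem not in have_pdb
    )
```

For a directory holding P49861.fasta and P49861.pdb this returns an empty list, so no folding request would be made for that protein.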
Just tested the latest version, looks fixed! Great work guys :)
Description of the bug
When running the Search mode Snakemake pipeline, I see that the make_pdb rule is run and a request is made to the ESMFold API, even when I've provided a .pdb file containing my input structure. My experimentally solved crystal structure is being replaced with a less accurate ESMFold prediction, which is not ideal.
I've attached my full Snakemake log (it failed due to a Python error, but ignore this), and my input .yml config, .pdb and .fasta files.
Command used and terminal output
Relevant files
P49861_fasta_pdb_yml_config_snakemake_log.zip
System information
VM.Standard.E4.Flex instance in Oracle Cloud - 64 cores, 1024 GB RAM, 25 TB storage available