DessimozLab / fold_tree

snakemake pipeline for creating trees from sequence sets
MIT License
69 stars, 5 forks

FoldTree crashing in "Run FoldTree" step #11

Open Nitayah opened 11 months ago

Nitayah commented 11 months ago

Hi! I am trying to run FoldTree on a zipped folder of ~60 PDB files. Here is my folder:

all_pdb_files_18.12.23.zip

The code crashes in the "Run FoldTree" step and returns the following error. I've attached it both as an image and as text.

Could you please help me?

[screenshot of the error]

[Mon Dec 18 12:48:32 2023]
rule dl_ids_sequences:
    input: ./test_40814_3/identifiers.txt
    output: ./test_40814_3/sequence_dataset.csv
    log: ./test_40814_3/logs/dlsequences.log
    jobid: 3
    reason: Missing output files: ./test_40814_3/sequence_dataset.csv
    wildcards: folder=./test_40814_3
    resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:37 2023]
Finished job 3.
1 of 15 steps (7%) done
Select jobs to execute...

[Mon Dec 18 12:48:37 2023]
rule dl_ids_structs:
    input: ./test_40814_3/sequence_dataset.csv
    output: ./test_40814_3/sequences.fst, ./test_40814_3/finalset.csv
    log: ./test_40814_3/logs/dlstructs.log
    jobid: 2
    reason: Missing output files: ./test_40814_3/finalset.csv; Input files updated by another job: ./test_40814_3/sequence_dataset.csv
    wildcards: folder=./test_40814_3
    resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:39 2023]
Finished job 2.
2 of 15 steps (13%) done
Select jobs to execute...

[Mon Dec 18 12:48:40 2023]
rule plddt:
    input: ./test_40814_3/finalset.csv
    output: ./test_40814_3/plddt.json
    log: ./test_40814_3/logs/plddt.log
    jobid: 1
    reason: Missing output files: ./test_40814_3/plddt.json; Input files updated by another job: ./test_40814_3/finalset.csv
    wildcards: folder=./test_40814_3
    resources: tmpdir=/tmp

Activating conda environment: foldtree

[Mon Dec 18 12:48:40 2023]
rule foldseek_allvall_1:
    input: ./test_40814_3/finalset.csv
    output: ./test_40814_3/allvall_1.csv
    log: ./test_40814_3/logs/foldseekallvall.log
    jobid: 8
    reason: Missing output files: ./test_40814_3/allvall_1.csv; Input files updated by another job: ./test_40814_3/finalset.csv
    wildcards: folder=./test_40814_3
    resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:40 2023]
Finished job 8.
3 of 15 steps (20%) done
Select jobs to execute...

[Mon Dec 18 12:48:40 2023]
rule foldseek2distmat:
    input: ./test_40814_3/allvall_1.csv
    output: ./test_40814_3/foldtree_fastmemat.txt, ./test_40814_3/alntmscore_fastmemat.txt, ./test_40814_3/lddt_fastmemat.txt
    log: ./test_40814_3/logs/foldseek2distmat.log
    jobid: 7
    reason: Missing output files: ./test_40814_3/lddt_fastmemat.txt, ./test_40814_3/foldtree_fastmemat.txt, ./test_40814_3/alntmscore_fastmemat.txt; Input files updated by another job: ./test_40814_3/allvall_1.csv
    wildcards: folder=./test_40814_3
    resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:42 2023]
Finished job 1.
4 of 15 steps (27%) done
Traceback (most recent call last):
  File "/content/.snakemake/scripts/tmp0mskjfj9.foldseekres2distmat_simple.py", line 9, in <module>
    res = pd.read_table(snakemake.input[0], header = None)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1282, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1723, in _make_engine
    return mapping[engine](f, self.options)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "parsers.pyx", line 586, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
[Mon Dec 18 12:48:43 2023]
Error in rule foldseek2distmat:
    jobid: 7
    input: ./test_40814_3/allvall_1.csv
    output: ./test_40814_3/foldtree_fastmemat.txt, ./test_40814_3/alntmscore_fastmemat.txt, ./test_40814_3/lddt_fastmemat.txt
    log: ./test_40814_3/logs/foldseek2distmat.log (check log file(s) for error details)
    conda-env: foldtree

RuleException:
CalledProcessError in file /content/fold_tree/workflow/fold_tree, line 90:
Command 'source /usr/local/bin/activate 'foldtree'; set -euo pipefail; /usr/local/bin/python3.10 /content/.snakemake/scripts/tmp0mskjfj9.foldseekres2distmat_simple.py' returned non-zero exit status 1.
  File "/content/fold_tree/workflow/fold_tree", line 90, in __rule_foldseek2distmat
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-12-18T124832.269382.snakemake.log
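
The traceback shows `foldseekres2distmat_simple.py` calling `pd.read_table(snakemake.input[0], header = None)` on `allvall_1.csv`, and pandas raises `EmptyDataError` when that file has no content. Below is a minimal sketch of a guard that would turn this into a more explicit message. `read_allvall` is a hypothetical helper, not part of fold_tree, and it assumes that an empty or missing `allvall_1.csv` means the upstream `foldseek_allvall_1` job produced no output.

```python
import os

import pandas as pd


def read_allvall(path: str) -> pd.DataFrame:
    """Read a Foldseek all-vs-all table, failing loudly if the file is empty.

    Hypothetical helper: the script in the traceback calls
    pd.read_table(path, header=None) directly, which raises
    pandas.errors.EmptyDataError on a zero-byte file.
    """
    # A missing or zero-byte file most likely means the upstream job never ran properly.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        raise RuntimeError(
            f"{path} is empty or missing; check the foldseek_allvall_1 log "
            "and whether its conda environment could be activated."
        )
    # read_table uses a tab separator by default.
    return pd.read_table(path, header=None)
```

With a guard like this, the failure would name the empty input file instead of surfacing as a generic pandas parsing error.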

CalledProcessError Traceback (most recent call last)
in <cell line: 1>()
----> 1 get_ipython().run_cell_magic('bash', '-s $jobname $input_type', 'JOBNAME=$1\nINPUT_TYPE=$2\nSUFFIX=""\nif [[ $INPUT_TYPE = "custom" ]]; then\n mkdir -p "${JOBNAME}/structs"\n mv "${JOBNAME}/"*.pdb "${JOBNAME}/"*.cif "${JOBNAME}/structs"\n SUFFIX="custom_structs=True"\nfi\nsnakemake --cores $(nproc --all) --use-conda -s fold_tree/workflow/fold_tree --config folder="./${JOBNAME}" filter=False $SUFFIX #> /dev/null 2>&1\n#snakemake --cores 4 --use-conda -s fold_tree/workflow/fold_tree --config folder=./${jobname} filter=False\n')


in shebang(self, line, cell)
/usr/local/lib/python3.10/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'JOBNAME=$1\nINPUT_TYPE=$2\nSUFFIX=""\nif [[ $INPUT_TYPE = "custom" ]]; then\n mkdir -p "${JOBNAME}/structs"\n mv "${JOBNAME}/"*.pdb "${JOBNAME}/"*.cif "${JOBNAME}/structs"\n SUFFIX="custom_structs=True"\nfi\nsnakemake --cores $(nproc --all) --use-conda -s fold_tree/workflow/fold_tree --config folder="./${JOBNAME}" filter=False $SUFFIX #> /dev/null 2>&1\n#snakemake --cores 4 --use-conda -s fold_tree/workflow/fold_tree --config folder=./${jobname} filter=False\n'' returned non-zero exit status 1.
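
Every rule in the log above prints `EnvironmentNameNotFound` for `foldtree`, yet snakemake still reports those jobs as finished, so the run appears to stop only once pandas reads the empty table. One way to check whether an environment named `foldtree` is visible in the runtime is to parse the JSON printed by `conda env list --json`. This is a sketch assuming `conda` is on `PATH` in the same runtime as snakemake; `conda_env_names` and `foldtree_env_exists` are hypothetical helper names, not part of fold_tree.

```python
import json
import os
import subprocess


def conda_env_names(env_list_json: str) -> set[str]:
    """Return environment names from the output of `conda env list --json`.

    Conda reports environments as prefix paths. Named environments live under
    <root>/envs/<name>, so the last path component is the name; the root
    (`base`) prefix simply yields its directory name here.
    """
    prefixes = json.loads(env_list_json).get("envs", [])
    return {os.path.basename(os.path.normpath(p)) for p in prefixes}


def foldtree_env_exists(name: str = "foldtree") -> bool:
    # Ask conda itself which environments it can see.
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return name in conda_env_names(result.stdout)
```

If `foldtree_env_exists()` returns `False` right before the snakemake call, that would be consistent with the `EnvironmentNameNotFound` lines in the log.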
ruthalee commented 10 months ago

I am having the same problem with my custom run. There are closed issues that look similar (#3, #6), but I am not sure those solutions apply here, since those fixes should already be in place. When I opened the Files tab on the left-hand side, my structs folders were empty and the identifiers.txt files contained just "1". Thanks for your help!
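
Since the structs folders were empty, it may help to count the structure files before snakemake runs. The notebook cell in the original report only moves files matching `"${JOBNAME}/"*.pdb` and `"${JOBNAME}/"*.cif`; that shell glob is non-recursive and case-sensitive, so files inside a subfolder created when the zip was extracted, or files ending in `.PDB`, would not be moved. This is a possible cause, not a confirmed one. The sketch assumes the `<jobname>/structs` layout from that cell; `structure_report` is a hypothetical helper.

```python
from pathlib import Path

# Extensions the notebook cell moves into <jobname>/structs (matched case-insensitively here).
STRUCT_SUFFIXES = {".pdb", ".cif"}


def _structure_files(folder: Path, recursive: bool) -> list[Path]:
    """List .pdb/.cif files in folder, optionally descending into subfolders."""
    if not folder.is_dir():
        return []
    candidates = folder.rglob("*") if recursive else folder.glob("*")
    return [p for p in candidates if p.is_file() and p.suffix.lower() in STRUCT_SUFFIXES]


def structure_report(jobname: str) -> dict[str, int]:
    """Count structure files directly in <jobname>/structs versus anywhere else under <jobname>."""
    root = Path(jobname)
    in_structs = len(_structure_files(root / "structs", recursive=False))
    everywhere = len(_structure_files(root, recursive=True))
    return {"in_structs": in_structs, "elsewhere": everywhere - in_structs}
```

A report with `in_structs` at 0 and a nonzero `elsewhere` would suggest the files are sitting somewhere the `mv` in that cell did not reach.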

Input: zipped folder containing >100 PDB files.

[image: input]

Error message:

[Screenshot 2024-01-23 at 6 03 24 PM]
cactuskid commented 10 months ago

This may have been caused by an outdated version of Foldseek on the backend. I've changed this parameter in the config file, so it might work now.

msleutel commented 7 months ago

The problem persists for me (tested on 11/04/2024). I get the same error for both an "identifier" run and a "custom" run.

[image]

trinicordero commented 6 months ago

I am having the same problem (17/05/24) :( pleaseeee help!