Closed Htrivett closed 1 year ago
Hi @Htrivett, sorry to hear you are running into issues. The snakemake log is unfortunately not very informative.
Could you attach or copy/paste the log files for these rules, located in the /logs folder? These would be steim2.ConcoctAnalysis.log and steim2.MaxbinCleanup.log.
Hopefully this is more useful: steim2.ConcoctAnalysis.log. I think I have solved the MaxbinCleanup error, as a specific .py file was not downloaded correctly.
Hi @Htrivett ,
This is starting to help, but Concoct seems to keep the relevant info in a separate log. This file is probably named /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2/log.txt. Would you be able to send this too? Thanks!
It looks like the files are deleted:
Removing output files of failed job ConcoctAnalysis since they might be corrupted: /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2
The only files in 3-concoct are steim2.contigs_cut.fasta and steim2.coverage_table.tsv.
I have run concoct independently using the command from the pipeline, and the same errors show. The log file only shows this:
023-02-01 20:40:55,469:INFO:root:Results created at /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2_
2023-02-01 20:42:05,900:INFO:root:Successfully loaded composition data.
2023-02-01 20:42:05,923:INFO:root:Successfully loaded coverage data.
(concoct) hannaht@ada30:~/HiFi-MAG-Pipeline/3-concoct$ conda deactivate
Hi @Htrivett,
Can you try re-running snakemake with the --keep-incomplete flag? It should hang on to those files if the job fails.
Hi @dportik, I have rerun using --keep-incomplete, and the log file is the same as above:
2023-02-02 09:33:43,909:INFO:root:Results created at /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2
2023-02-02 09:35:13,859:INFO:root:Successfully loaded composition data.
2023-02-02 09:35:13,879:INFO:root:Successfully loaded coverage data.
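For anyone checking a similar log, here is a minimal Python sketch that parses lines in the format shown above and reports which levels appear and the last message written. The function names are made up for illustration, and the pattern assumes the `timestamp:LEVEL:logger:message` layout seen in this thread. If only INFO lines appear and the last one is a "Successfully loaded" message, the actual failure is probably being reported somewhere else (for example on stderr).

```python
import re

# Matches lines such as
# "2023-02-02 09:33:43,909:INFO:root:Results created at ..."
LOG_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):([A-Z]+):([^:]+):(.*)"
)


def parse_log(text):
    """Return a list of (timestamp, level, logger, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries


def summarize(entries):
    """Return the set of log levels seen and the last message (or None)."""
    levels = {level for _, level, _, _ in entries}
    last_message = entries[-1][3] if entries else None
    return levels, last_message


# Log lines copied from the post above.
sample = """\
2023-02-02 09:33:43,909:INFO:root:Results created at /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2
2023-02-02 09:35:13,859:INFO:root:Successfully loaded composition data.
2023-02-02 09:35:13,879:INFO:root:Successfully loaded coverage data.
"""

levels, last_message = summarize(parse_log(sample))
print(levels, last_message)
```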
I have now fixed this. It was linked to this issue with concoct: https://github.com/BinPro/CONCOCT/issues/322
I too had this issue, and can confirm that the fix from the issue that @Htrivett linked works:
Hi, I was able to work around the issue by editing the validation.py file. In my case (concoct installed with conda) the file was in miniconda3/envs/concoct/lib/python3.10/site-packages/sklearn/utils/validation.py.
On line 1885 I changed the following:
feature_names = np.asarray(X.columns, dtype=object)
to
feature_names = np.asarray(X.columns.astype(str), dtype=object)
The value error that you pasted said to do that in such a case. It worked; I then got another error, but it is not connected to this one. Hope it helps!
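If editing the file by hand is awkward (for example across several conda environments), the same one-line change could be scripted. This is only a sketch: `patch_validation` is a hypothetical helper, not part of concoct or scikit-learn, and it assumes the installed validation.py contains the exact line quoted above. It keeps a `.bak` copy and does nothing if the file is already patched.

```python
from pathlib import Path
import shutil

# The two lines quoted in the workaround above.
OLD = "feature_names = np.asarray(X.columns, dtype=object)"
NEW = "feature_names = np.asarray(X.columns.astype(str), dtype=object)"


def patch_validation(path):
    """Apply the one-line change; return True if the file was modified."""
    path = Path(path)
    source = path.read_text()
    if NEW in source:
        return False  # already patched, nothing to do
    if OLD not in source:
        # Different sklearn version or layout: refuse to guess.
        raise ValueError(f"expected line not found in {path}")
    # Keep a backup next to the original before rewriting it.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(source.replace(OLD, NEW, 1))
    return True
```

You would call it with the path to your own environment's validation.py. The line number (1885) is not used, since it may differ between sklearn versions.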
Perhaps it could be fixed by changing the Python or sklearn version in the concoct.yml recipe?
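To see whether a given environment is likely to be affected, a quick check of the installed scikit-learn version might help. The sketch below assumes that the stricter check on non-string column names became an error in scikit-learn 1.2 (earlier 1.x releases only warned, as far as I know); treat that threshold as an assumption and adjust it if your experience differs. The helper names are hypothetical.

```python
import re
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def rejects_mixed_column_names(version, threshold=(1, 2)):
    """True if this version is at or above the assumed threshold."""
    return major_minor(version) >= threshold


try:
    installed = metadata.version("scikit-learn")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("scikit-learn is not installed in this environment")
else:
    print(installed, rejects_mixed_column_names(installed))
```

Run it inside the concoct conda environment, since that is the environment whose sklearn matters here.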
Glad you were able to fix the issues locally. I am planning on phasing out concoct in the next version of HiFi-MAG-Pipeline, so these can be temporary solutions until then.
Hi @Htrivett and @EisenRa, the newest version of HiFi-MAG-Pipeline is now available. Binning is performed with MetaBAT2 and SemiBin2. Concoct and MaxBin2 have been retired, so this should end the issue.
I would encourage you to use this new version (v2.0.0), as it implements a "completeness-aware" strategy that outperforms all other versions of this pipeline. You can find more details here.
I have been trying to run the HiFi-MAG-Pipeline and have run into a couple of issues.
I get the errors below:
Activating conda environment: .snakemake/conda/8a61a2499c720acf3f0ba73f317582ee
[Wed Feb 1 16:30:08 2023]
Error in rule MaxbinCleanup:
    jobid: 16
    input: /pub65/hannaht/HiFi-MAG-Pipeline/3-maxbin/steim2.complete.txt, /pub65/hannaht/HiFi-MAG-Pipeline/steim2.summary
    output: /pub65/hannaht/HiFi-MAG-Pipeline/3-maxbin/steim2
    log: /pub65/hannaht/HiFi-MAG-Pipeline/logs/steim2.MaxbinCleanup.log (check log file(s) for error details)
    conda-env: /pub65/hannaht/HiFi-MAG-Pipeline/.snakemake/conda/8a61a2499c720acf3f0ba73f317582ee
    shell:
        python scripts/Maxbin2-organize-outputs.py -s steim2 -o /pub65/hannaht/HiFi-MAG-Pipeline/3-maxbin/steim2 &> /pub65/hannaht/HiFi-MAG-Pipeline/logs/steim2.MaxbinCleanup.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Wed Feb 1 16:31:06 2023]
Error in rule ConcoctAnalysis:
    jobid: 20
    input: /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2.contigs_cut.fasta, /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2.coverage_table.tsv
    output: /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2, /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2/clusteringgt50000.csv
    log: /pub65/hannaht/HiFi-MAG-Pipeline/logs/steim2.ConcoctAnalysis.log (check log file(s) for error details)
    conda-env: /pub65/hannaht/HiFi-MAG-Pipeline/.snakemake/conda/060a7b6545e1fd9251e67bcc3fd3d555
    shell:
        concoct --composition_file /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2.contigs_cut.fasta --coverage_file /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2.coverage_table.tsv -l 50000 -b /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2 -t 24 2> /pub65/hannaht/HiFi-MAG-Pipeline/logs/steim2.ConcoctAnalysis.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job ConcoctAnalysis since they might be corrupted: /pub65/hannaht/HiFi-MAG-Pipeline/3-concoct/steim2
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-02-01T161349.860874.snakemake.log
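Since the snakemake output mostly points to per-rule log files, a small helper can pull out which rules failed and where their logs live. This is a rough Python sketch with a hypothetical function name, assuming the "Error in rule ...:" and "log: ..." wording shown in the output above. The sample text is abbreviated from that output.

```python
import re


def failed_rules(log_text):
    """Return (rule_name, log_path) for each 'Error in rule' block."""
    results = []
    for match in re.finditer(r"Error in rule (\w+):", log_text):
        rest = log_text[match.end():]
        # Limit the search to this block, up to the next failed rule.
        next_error = rest.find("Error in rule ")
        block = rest if next_error == -1 else rest[:next_error]
        # Skip the final "Complete log: ..." line, which is not per-rule.
        log = re.search(r"(?<!Complete )\blog: (\S+)", block)
        results.append((match.group(1), log.group(1) if log else None))
    return results


# Abbreviated from the snakemake output above.
sample = (
    "Error in rule MaxbinCleanup: jobid: 16 "
    "log: /pub65/hannaht/HiFi-MAG-Pipeline/logs/steim2.MaxbinCleanup.log "
    "(check log file(s) for error details) "
    "Error in rule ConcoctAnalysis: jobid: 20 "
    "log: /pub65/hannaht/HiFi-MAG-Pipeline/logs/steim2.ConcoctAnalysis.log "
    "(check log file(s) for error details) "
    "Complete log: .snakemake/log/2023-02-01T161349.860874.snakemake.log"
)

for rule, log_path in failed_rules(sample):
    print(rule, log_path)
```

The per-rule log paths it prints are the files worth attaching to a report like this one.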
I also get this in the log for each job: "reason: Missing output files:". The workflow runs on to the next job, but the same phrase comes up again. The files are produced in the end, but some of them are empty, such as the .index.completed.txt. I have attached the full log for context.
Thanks in advance for your help.
snakmake_error.log