chrisjackson-pellicle / hybpiper-nf

Nextflow and Singularity/Conda pipeline for running HybPiper (https://github.com/mossmatters/HybPiper)
GNU General Public License v3.0

PARALOG_RETRIEVER FileNotFoundError: [Errno 2] No such file or directory: #10

Closed (hillap closed this issue 11 months ago)

hillap commented 1 year ago

Hi, it seems that for a handful of assemblies the putative_chimeric_stitched_contigs.csv file is missing, while for most it exists after the assembly step:

$ ls work/93/6a9e66eda6d50c1c55f9adf0ce9484/ | grep clean | wc -l
202
$ ls work/93/6a9e66eda6d50c1c55f9adf0ce9484/ | grep putative | wc -l
191

This causes hybpiper-nf to stop with the following error:

Error executing process > 'assemble:assemble_main:PARALOG_RETRIEVER'

Caused by: Process assemble:assemble_main:PARALOG_RETRIEVER terminated with an error exit status (1)

Command executed:

hybpiper paralog_retriever namelist.txt -t_dna targetseqs_fixed.fasta

Command exit status: 1

Command output: (empty)

Command error:

[INFO]: HybPiper was called with these arguments: paralog_retriever namelist.txt -t_dna targetseqs_fixed.fasta

[INFO]: Recovering paralog sequences...
[INFO]: Creating directory: paralogs_all
[INFO]: Creating directory: paralogs_no_chimeras
[INFO]: Searching for paralogs for 202 samples, 20 genes...
Elapsed Time: 0:00:00| |ETA: --:--:--
No chimeric stitched contig summary file found for sample sample11_clean!
Traceback (most recent call last):
  File "/home/hillap/miniconda3/envs/hybpiper/bin/hybpiper", line 10, in
    sys.exit(main())
             ^^^^^^
  File "/home/hillap/miniconda3/envs/hybpiper/lib/python3.11/site-packages/hybpiper/assemble.py", line 1806, in main
    args.func(args)
  File "/home/hillap/miniconda3/envs/hybpiper/lib/python3.11/site-packages/hybpiper/assemble.py", line 1593, in paralog_retriever_main
    paralog_retriever.main(args)
  File "/home/hillap/miniconda3/envs/hybpiper/lib/python3.11/site-packages/hybpiper/paralog_retriever.py", line 371, in main
    num_seqs, has_paralogs, seqs_to_write_all, seqs_to_write_no_chimeras = retrieve_gene_paralogs_from_sample(
                                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hillap/miniconda3/envs/hybpiper/lib/python3.11/site-packages/hybpiper/paralog_retriever.py", line 72, in retrieve_gene_paralogs_from_sample
    chimeric_genes_to_skip = get_chimeric_genes_for_sample(sample_directory_name)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hillap/miniconda3/envs/hybpiper/lib/python3.11/site-packages/hybpiper/retrieve_sequences.py", line 48, in get_chimeric_genes_for_sample
    with open(f'{sample_directory_name}/'
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'sample11_clean/sample11_clean_genes_derived_from_putative_chimeric_stitched_contig.csv'
Elapsed Time: 0:00:00|#######################################################################|Time: 0:00:00

Work dir: /home/hillap/bioinformatics/work/93/6a9e66eda6d50c1c55f9adf0ce9484

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

chrisjackson-pellicle commented 1 year ago

Hi @hillap,

The absence of a putative_chimeric_stitched_contig.csv file in some gene directories is not a problem: HybPiper only writes these files when a putative chimeric stitched contig is detected for that gene.

The issue causing your error is the absence of a summary file called sample11_clean_genes_derived_from_putative_chimeric_stitched_contig.csv in your sample11_clean directory. This file should have been written by the hybpiper assemble step; even if none of your genes had a putative_chimeric_stitched_contig.csv file, the summary file should just be empty. To me this suggests that the hybpiper assemble step failed to complete for some samples.
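A quick way to see which samples are affected is to check every sample directory for that summary file. The Python sketch below is hypothetical: the function name and the _clean suffix filter are assumptions based on the sample names in this issue, and the file name pattern is taken from the path in the FileNotFoundError above. It treats an empty summary file as fine, since only a missing one indicates an incomplete run.

```python
from pathlib import Path

# Suffix copied from the missing-file path in the FileNotFoundError above.
SUMMARY_SUFFIX = "_genes_derived_from_putative_chimeric_stitched_contig.csv"


def samples_missing_summary(parent_dir, name_suffix="_clean"):
    """Return sorted names of sample directories that lack the chimera summary file.

    Hypothetical helper: treats every subdirectory of parent_dir whose name
    ends with name_suffix (an assumption based on this issue's sample names)
    as a HybPiper sample directory.
    """
    missing = []
    for sample_dir in Path(parent_dir).iterdir():
        if not (sample_dir.is_dir() and sample_dir.name.endswith(name_suffix)):
            continue
        summary = sample_dir / f"{sample_dir.name}{SUMMARY_SUFFIX}"
        # An empty summary file is expected when no chimeras were flagged;
        # only a missing file points to an incomplete assemble run.
        if not summary.is_file():
            missing.append(sample_dir.name)
    return sorted(missing)
```

Pointed at the PARALOG_RETRIEVER work directory (work/93/6a9e66eda6d50c1c55f9adf0ce9484 here), it should list sample11_clean if that sample's summary file is absent.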

Are you able to locate the Nextflow work directory that contains the output from the hybpiper assemble run for sample11_clean, and upload the sample11_clean_hybpiper_assemble_<date_time>.log file from the sample11_clean directory? In the next update, I'll make sure all of these sample logs are saved as part of the Nextflow pipeline output. If you could also upload the .command.err file from that work directory (note that it's a hidden file), that would be helpful.
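Nextflow task directories follow the layout work/<xx>/<hash>/, so the log and .command.err files can also be located with a short script. This is a sketch under stated assumptions: the function name is hypothetical, the log name pattern is the one given above, and the filter on .command.sh (the script Nextflow writes into each task directory) assumes the assemble task's script contains the text "hybpiper assemble". The filter exists because downstream tasks such as PARALOG_RETRIEVER can hold a staged (symlinked) copy of the same sample folder.

```python
from pathlib import Path


def find_assemble_task_files(work_root, sample):
    """Return (task_dir, log_file, command_err_or_None) tuples for one sample.

    Hypothetical helper. Assumes the Nextflow layout work/<xx>/<hash>/, the
    <sample>/<sample>_hybpiper_assemble_<date_time>.log naming, and that the
    assemble task's .command.sh contains the text 'hybpiper assemble'.
    """
    hits = []
    pattern = f"*/*/{sample}/{sample}_hybpiper_assemble_*.log"
    for log_file in sorted(Path(work_root).glob(pattern)):
        task_dir = log_file.parent.parent
        script = task_dir / ".command.sh"
        # Skip downstream tasks that merely staged the sample folder; if the
        # script is missing we cannot tell, so the hit is kept.
        if script.is_file() and "hybpiper assemble" not in script.read_text():
            continue
        command_err = task_dir / ".command.err"  # hidden file written by Nextflow
        hits.append((task_dir, log_file, command_err if command_err.is_file() else None))
    return hits
```

For example, find_assemble_task_files("work", "sample11_clean") run from the pipeline's launch directory would return the candidate task directories together with the two files requested above.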

Also, judging from your error output above, it looks like you're running hybpiper-nf using a standalone conda install of HybPiper at /home/hillap/miniconda3/envs/hybpiper, perhaps using -profile standard. Is that correct? If so, can you check that you're using HybPiper version 2.1.2? If your issue turns out to be just some samples not running to completion at the hybpiper assemble stage, it might simply be a matter of increasing the resources in the standard profile of the config file - see https://github.com/chrisjackson-pellicle/hybpiper-nf/wiki/Additional-pipeline-features-and-details#managing-computing-resources.
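One way to confirm the installed version is to query importlib.metadata from that environment's own interpreter (e.g. /home/hillap/miniconda3/envs/hybpiper/bin/python). This is a minimal sketch: it assumes HybPiper is registered under the distribution name hybpiper, which matches the site-packages/hybpiper path in the traceback but is not confirmed by it, and the helper names are hypothetical.

```python
from importlib import metadata

REQUIRED_VERSION = "2.1.2"  # version asked about in the reply above


def installed_hybpiper_version(dist_name="hybpiper"):
    """Return the installed version string, or None if the distribution is absent.

    Assumption: HybPiper is registered under the distribution name 'hybpiper'.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_required(version, required=REQUIRED_VERSION):
    """True only if a version was found and it equals the required one."""
    return version is not None and version == required


if __name__ == "__main__":
    found = installed_hybpiper_version()
    print(f"HybPiper version: {found}; matches {REQUIRED_VERSION}: {matches_required(found)}")
```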

Cheers,

Chris

chrisjackson-pellicle commented 1 year ago

Hi @hillap,

I don't know if this is still relevant to you, but I recently had someone else run into this error, and in that case it was caused by empty (or very small) read file pairs for some of the samples used as input to the pipeline. This is an issue with the hybpiper-nf pipeline that I'll fix ASAP.
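Until then, a pre-flight check along these lines could catch such samples before the pipeline starts. This hypothetical sketch counts reads in each FASTQ pair (plain or gzip-compressed) and flags pairs where either file falls below a minimum. The threshold of 1000 reads is an arbitrary placeholder rather than a HybPiper or hybpiper-nf setting, and counting lines // 4 assumes standard four-line FASTQ records.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in a FASTQ file (plain or .gz).

    Assumes unwrapped four-line records, so reads = lines // 4.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def flag_small_pairs(pairs, min_reads=1000):
    """Return (r1, r2, n1, n2) for pairs where either file has fewer than min_reads reads.

    min_reads is a hypothetical, arbitrary threshold for illustration only.
    """
    flagged = []
    for r1, r2 in pairs:
        n1, n2 = count_fastq_reads(Path(r1)), count_fastq_reads(Path(r2))
        if min(n1, n2) < min_reads:
            flagged.append((r1, r2, n1, n2))
    return flagged
```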

Cheers,

Chris

chrisjackson-pellicle commented 11 months ago

Closing due to inactivity - feel free to re-open if it's still an issue!