Vini2 / phables

🫧🧬 From fragmented assemblies to high-quality bacteriophage genomes
https://phables.readthedocs.io/
MIT License

Possible orphaned paired read for long reads? #47

Closed ZarulHanifah closed 3 weeks ago

ZarulHanifah commented 3 months ago

Describe the bug

I am trying to use Phables on my metagenome assembly, which was assembled with metaFlye.

To Reproduce

Steps to reproduce the behaviour:

  1. Command executed
    
    phables run --threads 12 \
    --output /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables \
    --input /fs03/jm41/Zarul/C002_E1_results/flye/assembly_graph.gfa \
    --reads /fs03/jm41/Zarul/C002_E1_results/viral_predict/seqkit --longreads

The --reads argument points to a directory containing multiple ONT FASTQ files.

  2. Error message

[2024:04:03 04:44:18] Copying system default config to /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables/config.yaml
[2024:04:03 04:44:18] Updating config file with new values
[2024:04:03 04:44:18] Writing config file to /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables/config.yaml
[2024:04:03 04:44:18] ------------------
[2024:04:03 04:44:18] | Runtime config |
[2024:04:03 04:44:18] ------------------

alpha: 1.2
compcount: 200
conda_prefix: /fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/phables/workflow/conda
configfile: /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables/config.yaml
covtol: 100
databases: null
evalue: 1.0e-10
input: /fs03/jm41/Zarul/C002_E1_results/flye/assembly_graph.gfa
log: /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables/phables.log
longreads: true
maxpaths: 10
mgfrac: 0.2
mincov: 10
minlength: 2000
output: /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables
prefix: null
profile: null
reads: /fs03/jm41/Zarul/C002_E1_results/viral_predict/seqkit
resources:
  jobCPU: 8
  jobMem: 16000
seqidentity: 0.3
snake_args: []
snake_default:

[2024:04:03 04:44:18] ---------------------
[2024:04:03 04:44:18] | Snakemake command |
[2024:04:03 04:44:18] ---------------------

snakemake -s /fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/phables/workflow/phables.smk --configfile /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables/config.yaml --cores 12 --use-conda --cond
Config file /fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/phables/workflow/../config/config.yaml is extended by additional config specified via the command line.
Config file /fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/phables/workflow/../config/databases.yaml is extended by additional config specified via the command line.
/fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/metasnek/fastq_finder.py:66: Warning: Possible orphaned paired read detected for PAU29885_skip_fe307bda_418464a3_1.fastq.gz with tag _1.
  warnings.warn(
/fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/metasnek/fastq_finder.py:66: Warning: Possible orphaned paired read detected for PAU29885_skip_3d6e88b0_49592898_1.fastq.gz with tag _1.
  warnings.warn(
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
MissingInputException in rule scan_smg in file /fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/phables/workflow/rules/genes.smk, line 6:
Missing input files for rule scan_smg:
    output: /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables/preprocess/edges.fasta.hmmout
    affected files:
        /fs03/ie79/Zarul/status_nanopore/C002E1/.snakemake/conda/9f3dc56d4ec4427ad4b36b73a672b0f6/lib/python3.11/site-packages/phables/workflow/../../databases/marker.hmm
Output files will be saved to directory, /fs03/jm41/Zarul/C002_E1_results/viral_predict/phables

[2024:04:03 04:45:52] ERROR: Snakemake failed



Thank you.

Vini2 commented 3 months ago

Hi @ZarulHanifah,

Thanks for your interest in Phables.

I see that your reads file is named PAU29885_skip_fe307bda_418464a3_1.fastq.gz. Phables uses Koverage to map reads, and the pattern *_1.* implies that there is a paired *_2.* reads file as well. More details here.

If you change *_1.* to *_S.* in the reads file name (or remove the _1), it should technically work. Please let me know if it doesn't.

Thanks, Vijini
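
The rename suggested above can be scripted when the reads directory holds many files. This is a minimal sketch, assuming the files end in _1.fastq.gz as in the warning; rename_orphan_reads is a hypothetical helper name, not part of Phables or Koverage. It only addresses the orphaned-read warning and does not touch the MissingInputException for marker.hmm shown in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Phables): rename single-end files that end
# in _1.fastq.gz and have no _2 mate to _S.fastq.gz, so they are no longer
# mistaken for one half of a read pair.
rename_orphan_reads() {
    local dir="$1" f mate
    for f in "$dir"/*_1.fastq.gz; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        mate="${f%_1.fastq.gz}_2.fastq.gz"
        if [ -e "$mate" ]; then
            echo "skipping $f: mate $mate exists" >&2
            continue                            # genuine pair, leave it alone
        fi
        mv -- "$f" "${f%_1.fastq.gz}_S.fastq.gz"
    done
}
```

Call it with the directory given to --reads, for example rename_orphan_reads /path/to/reads_dir. Files that do have a _2 mate are left untouched.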

Vini2 commented 3 months ago

Hi @ZarulHanifah,

I hope you were able to run Phables on your long-read data.

I'm closing this issue. Please reopen it if needed.

Thanks!

SwapnilDoijad commented 2 months ago

Hi Vijini,

I have the same error. I tried with _S. (i.e. barcode_S.fastq.gz) but it did not work.


command

phables run  \
--input results/0048_genome_assembly_by_flye/raw_files/barcode01.10/assembly_graph.gfa \
--reads test \
--threads 10 \
--longreads

(phables_v1.3.2_home) [xa73pav@login2 p_natia_phage]$ bash run.sh
[2024:05:02 17:12:59] Config file phables.out/config.yaml already exists. Using existing config file.
[2024:05:02 17:12:59] Updating config file with new values
[2024:05:02 17:12:59] Writing config file to phables.out/config.yaml
[2024:05:02 17:12:59] ------------------
[2024:05:02 17:12:59] | Runtime config |
[2024:05:02 17:12:59] ------------------

alpha: 1.2
compcount: 200
conda_prefix: /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/conda
configfile: phables.out/config.yaml
covtol: 100
databases: null
evalue: 1.0e-10
input: results/0048_genome_assembly_by_flye/raw_files/barcode01.10/assembly_graph.gfa
log: phables.out/phables.log
longreads: true
maxpaths: 10
mgfrac: 0.2
mincov: 10
minlength: 2000
output: phables.out
prefix: null
profile: null
reads: test/
resources:
  jobCPU: 8
  jobMem: 16000
seqidentity: 0.3
snake_args: []
snake_default:

[2024:05:02 17:12:59] ---------------------
[2024:05:02 17:12:59] | Snakemake command |
[2024:05:02 17:12:59] ---------------------

snakemake -s /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/phables.smk --configfile phables.out/config.yaml --cores 10 --use-conda --conda-prefix /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/conda --rerun-incomplete --printshellcmds --nolock --show-failed-logs
Set parameter Username
Academic license - for non-commercial use only - expires 2025-05-02
Config file /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/../config/config.yaml is extended by additional config specified via the command line.
Config file /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/../config/databases.yaml is extended by additional config specified via the command line.
Output files will be saved to directory, phables.out

Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
MissingInputException in rule scan_smg in file /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/rules/genes.smk, line 6:
Missing input files for rule scan_smg:
    output: phables.out/preprocess/edges.fasta.hmmout
    affected files:
        /home/groups/VEO/tools/anaconda3/envs/phables_v1.3.2_home/lib/python3.11/site-packages/phables/workflow/../../databases/marker.hmm


Vini2 commented 2 months ago

Hi @SwapnilDoijad,

If you are running Phables with short reads, the input must be paired-end reads (not single-end reads), because paired-end read mappings are required for genome resolution.

Phables has a --longreads flag that can handle single-end long reads. If you want to run *_S reads it might be worth a try using the --longreads flag (I haven't tested it though).

Please let me know if it works.

SwapnilDoijad commented 2 months ago

I did use the --longreads flag as mentioned on the GitHub page. Isn't this correct?

phables run \
--input results/0048_genome_assembly_by_flye/raw_files/barcode01.10/assembly_graph.gfa \
--reads test \
--threads 10 \
--longreads

Vini2 commented 2 months ago

Hi @SwapnilDoijad

The command is correct. I'll have a look at the data you emailed me and get back to you soon.

Vini2 commented 3 weeks ago

Hi @SwapnilDoijad

Sorry about the delay in getting back to you.

Can you please try the latest version of Phables (v1.3.3)? Please make sure to install the databases before running Phables. There was an issue with the database download paths that affected the marker.hmm file. It is now resolved. I tried the new version with your data and it worked fine.

Please let me know if there are still issues.
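
Since both failing runs in this thread stopped on a missing marker.hmm, it can help to confirm the file is in place before launching a run. This is a minimal sketch, assuming you pass in the path that Snakemake printed under "affected files" (the location may differ between Phables versions, so it is not hard-coded here); check_marker_db is a hypothetical helper, not part of Phables.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Phables): succeed only if the
# marker HMM file exists and is non-empty.
check_marker_db() {
    local hmm="$1"
    if [ -s "$hmm" ]; then
        echo "found: $hmm"
        return 0
    fi
    echo "missing or empty: $hmm" >&2
    echo "install the Phables databases before running" >&2
    return 1
}
```

For example, check_marker_db /path/copied/from/affected/files/marker.hmm && bash run.sh would start the run only when the database file is present. The database install step is assumed here to be phables install; check the documentation for your version.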

Vini2 commented 3 weeks ago

Closing this issue for now. Please re-open if needed.