Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

not working with "real" samples #6

Closed pedres closed 1 year ago

pedres commented 1 year ago

Hi,

I am testing AMR++ with larger data, but it fails with an error in Trimmomatic. The output of trimmomatic.stats.log is below. It seems that it did not find any clipping sequence and then stops. I am running a single, unpaired read file. Another problem is how to specify which pipeline to run. For example, "nextflow run main_AMR++.nf -profile conda" runs the demo perfectly, but "nextflow run main_AMR++.nf -profile conda --pipeline standard_AMR" gives "Unknown method invocation div on ConfigObject type".

Thanks for your help.

TrimmomaticPE: Started with arguments:
 -threads 4 SRR4454621_1.fastq null SRR4454621_1.1P.fastq.gz SRR4454621_1.1U.fastq.gz SRR4454621_1.2P.fastq.gz SRR4454621_1.2U.fastq.gz ILLUMINACLIP:/home/fulgencio/AMRplusplus/data/adapters/nextera.fa:2:30:10:3:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCTGAGCGGGCTGGCAAGGC'
Using Long Clipping Sequence: 'CTGATGGCGCGAGGGAGGCGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGAACTCTAGGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGACTTAATAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCTGATGGCGCGAGGGAGGC'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CTGAGCGGGCTGGCAAGGCAGACCGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'GACGCTGCCGACGACGGAGAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGAATTAGACGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGACTAGTCGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGAAGCTAGAAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'GATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCAGAGAGGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACGCTCATGAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGAGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACGCGTAGTAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACGGAGCTACATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTGAAAA'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACACTCGCTAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACATCTCAGGATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGATCTACTCTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGAGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACACTGAGCGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTAGCGCTCATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACATGCGCAGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTACGCTGCATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCGGAGCCTATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTCGACGTCATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACTGCAGCTAATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCGATCAGTATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CCGAGCCCACGAGACCCTAAGACATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'GACGCTGCCGACGAATAGCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGATCGCATAAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'GACGCTGCCGACGATAAGGCTCGTGTAGATCTCGGTGGTCGCCGTATCATT'
ILLUMINACLIP: Using 0 prefix pairs, 54 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "main" java.io.FileNotFoundException: null (No existe el archivo o el directorio)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
    at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:268)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
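
The arguments in the log above show TrimmomaticPE receiving `null` where the reverse-read file would go, which fits the single, unpaired input described in the post. A minimal sketch of a pre-flight check follows. It assumes mates are named `<sample>_1.fastq` and `<sample>_2.fastq` (a naming assumption, not something AMR++ documents in this thread), and `mate_of` and `check_pair` are hypothetical helper names.

```shell
# Hypothetical helpers, not part of AMR++ or Trimmomatic.
# Assumption: mates are named <sample>_1.fastq and <sample>_2.fastq.

# Print the expected reverse-read path for a forward-read path.
mate_of() {
    fwd="$1"
    printf '%s\n' "${fwd%_1.fastq}_2.fastq"
}

# Print "paired" if the expected mate file exists, otherwise "single".
check_pair() {
    rev="$(mate_of "$1")"
    if [ -f "$rev" ]; then
        echo "paired"
    else
        echo "single"
    fi
}
```

Running `check_pair SRR4454621_1.fastq` before launching would print `single` when no `SRR4454621_2.fastq` sits next to it, which hints that a paired-end trimming step may not find a reverse file.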

EorgeKit commented 1 year ago

@pedres I think you need to edit the params.config file to customize it to your needs

pedres commented 1 year ago

Maybe. I think it is an issue with this sample, because the pipeline runs with the test samples. Also, if I run the "resistome" pipeline, it works (nextflow run main_AMR++.nf -profile conda --pipeline resistome --reads "/media/fulgencio/DATOS/args/SRR4454621_1.fastq"). One question: does Kraken classify all sequences or only those mapped to ARGs?

Thank you very much for your help

gaworj commented 1 year ago

Hello,

I have encountered a similar issue. Unfortunately, in my case even the resistome pipeline does not work:

nextflow run main_AMR++.nf -profile conda --threads 64 --output TOP10_AMR_conda --pipeline resistome --reads "data/raw/TOP10_R{1,2}.fastq.gz"
N E X T F L O W ~ version 22.04.4
Launching main_AMR++.nf [cranky_leavitt] DSL2 - revision: 16dbf3f086
A M R + + N F P I P E L I N E

reads : data/raw/TOP10_R{1,2}.fastq.gz
output : TOP10_AMR_conda

executor > local (4)
[77/b399c5] process > FASTQ_RESISTOME_WF:index [100%] 1 of 1 ✔
[71/9d689a] process > FASTQ_RESISTOME_WF:bwa_align (TOP10_R) [100%] 1 of 1 ✔
[23/2235ee] process > FASTQ_RESISTOME_WF:runresistome (TOP10_R) [ 0%] 0 of 1
[- ] process > FASTQ_RESISTOME_WF:resistomeresults -
[74/1a9fa8] process > FASTQ_RESISTOME_WF:runrarefaction (TOP10_R) [ 0%] 0 of 1
[- ] process > FASTQ_RESISTOME_WF:plotrarefaction -
Error executing process > 'FASTQ_RESISTOME_WF:runrarefaction (TOP10_R)'
executor > local (4)
[77/b399c5] process > FASTQ_RESISTOME_WF:index [100%] 1 of 1 ✔
[71/9d689a] process > FASTQ_RESISTOME_WF:bwa_align (TOP10_R) [100%] 1 of 1 ✔
[- ] process > FASTQ_RESISTOME_WF:runresistome (TOP10_R) -
[- ] process > FASTQ_RESISTOME_WF:resistomeresults -
[74/1a9fa8] process > FASTQ_RESISTOME_WF:runrarefaction (TOP10_R) [100%] 1 of 1, failed: 1 ✘
[- ] process > FASTQ_RESISTOME_WF:plotrarefaction -
Error executing process > 'FASTQ_RESISTOME_WF:runrarefaction (TOP10_R)'

Caused by: Process FASTQ_RESISTOME_WF:runrarefaction (TOP10_R) terminated with an error exit status (127)

Command executed:

samtools view -h TOP10_R.alignment.sorted.bam > converted.sam

rarefaction -ref_fp megares_database_v3.00.fasta -sam_fp converted.sam -annot_fp megares_annotations_v3.00.csv -gene_fp TOP10_R.gene.tsv -group_fp TOP10_R.group.tsv -mech_fp TOP10_R.mech.tsv -class_fp TOP10_R.class.tsv -type_fp TOP10_R.type.tsv -min 5 -max 100 -skip 5 -samples 1 -t 80

rm converted.sam

Command exit status: 127

Command output: (empty)

Command error: .command.sh: line 2: samtools: command not found

Work dir: /home/data_storage/soft/AMRplusplus/work/74/1a9fa8ebfe578ea43af8389949a978

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Any hints?
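
Exit status 127 together with `samtools: command not found` means the shell running the task could not find `samtools` on its `PATH`. A minimal sketch of a check that can be run in the same environment before launching the pipeline is below. `missing_tools` is a hypothetical helper, and the tool names in the usage note are an assumption taken from the commands shown in this thread.

```shell
# Hypothetical helper: print each named tool that is not on PATH, one per line.
# Prints nothing when every tool is found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

For example, `missing_tools samtools bwa rarefaction` lists whichever of those three cannot be resolved. An empty result suggests the environment itself is fine and the problem lies in how the task's environment was built or activated.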

EnriqueDoster commented 1 year ago

Thank you for reporting this issue.

@pedres, AMR++ will classify all of your sample reads, not just those mapped to ARGs. We've been having issues with the conda installation because AMR++ was trying to create multiple conda environments at once and was getting hung up and creating faulty environments. Now, AMR++ will create a large environment that contains all the software dependencies and we've had fewer issues with that. Could you please try again with a fresh install of AMR++?

Also, to avoid relying on nextflow to manage the conda environment, this method seems to work well:

# After downloading the AMR++ repository, navigate into it and use the following command to create a conda environment.
# If you have mamba installed, change "conda" to "mamba" for quicker installations. 
conda env create -f envs/AMR++_env.yaml
# Now activate the conda environment
conda activate AMR++_env

# Now, the tools will be available "locally" so we must run AMR++ using the "local" profile.
nextflow run main_AMR++.nf -profile local
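
Because the "local" profile relies on whatever tools are on `PATH`, it may help to confirm that the environment is active before starting the run. This is a sketch only: it assumes conda's `CONDA_DEFAULT_ENV` variable, which `conda activate` sets, and the environment name `AMR++_env` from the commands above. `env_is_active` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which conda sets on activation.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active "AMR++_env"; then
    echo "AMR++_env is active"
else
    echo "AMR++_env is not active; run: conda activate AMR++_env"
fi
```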

pedres commented 1 year ago

Thanks a lot for your help. I have managed to get it working with the previous version, but I will try to do it with the new version this week.