Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

Cannot find reads #16

Closed felipemachado85 closed 1 year ago

felipemachado85 commented 1 year ago

Hi,

I've managed to install the AMR++ env via conda, following the instructions. It works when I run the demo pipeline, but when I try to run my samples (in different pipelines - fast_AMR, resistome, standard_AMR), I get this message:

(AMR++_env) [fsantann@vacc-user1 AMRplusplus]$ nextflow run main_AMR++.nf -profile local --pipeline fast_AMR --reads "gpfs1/home/f/s/fsantann/kneaddataoutput/C4MP{1,2}.fastq"

    N E X T F L O W ~ version 22.04.3
    Launching main_AMR++.nf [soggy_cantor] DSL2 - revision: 77a1d0d91c

    A M R + + N F P I P E L I N E

    reads : gpfs1/home/f/s/fsantann/kneaddataoutput/C4MP{1,2}.fastq
    output : test_results

    [- ] process > FAST_AMRplusplus:FASTQ_QC_WF:fastqc -
    [- ] process > FAST_AMRplusplus:FASTQ_QC_WF:multiqc -
    [- ] process > FAST_AMRplusplus:FASTQ_TRIM_WF:runqc -
    [- ] process > FAST_AMRplusplus:FASTQ_TRIM_WF:QCstats -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:index [ 0%] 0 of 1
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:bwa_align -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runresistome -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:resistomeresults -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runrarefaction -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:plotrarefaction -
    Cannot find any reads matching: gpfs1/home/f/s/fsantann/kneaddataoutput/C4MP{1,2}.fastq

-- Check script 'main_AMR++.nf' at line: 21 or see '.nextflow.log' file for more details

Does anybody know how to solve this problem?

Thank you!

Felipe

felipemachado85 commented 1 year ago

I've also tried with other versions of Nextflow (22.10.4) and I'm still getting the same error.

EnriqueDoster commented 1 year ago

Hi @felipemachado85, can you show us what your samples look like? My first guess is that the glob pattern you used doesn't quite match your sample names. What do you see if you run this? `ls gpfs1/home/f/s/fsantann/kneaddata_output/`

Another trick: you should be able to run `ls` with whatever pattern you're using for the `--reads` flag, and it should work. Drop the double quotes for this test so the shell expands the braces, like this: `ls gpfs1/home/f/s/fsantann/kneaddata_output/C4MP{1,2}.fastq`

If there's an error, then your pattern needs to be modified.
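A minimal shell sketch of that check, assuming the paired files are named `<prefix>1.fastq` and `<prefix>2.fastq` as in the pattern above. The function name `check_pair` and the demo file names are hypothetical. It tests each mate explicitly because bash does not perform brace expansion inside double quotes, so a quoted `{1,2}` pattern passed to `ls` is taken literally.

```shell
# Hypothetical helper: report whether both mates of a read pair exist.
# $1 is everything before the mate number, e.g. "reads/C4MP" for
# reads/C4MP1.fastq and reads/C4MP2.fastq.
check_pair() {
    prefix=$1
    status=0
    for mate in 1 2; do
        file="${prefix}${mate}.fastq"
        if [ -f "$file" ]; then
            echo "found:   $file"
        else
            echo "missing: $file"
            status=1
        fi
    done
    return "$status"
}

# Demo in a throwaway directory (made-up file names, not real data).
demo_dir=$(mktemp -d)
touch "$demo_dir/C4MP1.fastq" "$demo_dir/C4MP2.fastq"
check_pair "$demo_dir/C4MP"
check_pair "$demo_dir/other" || echo "pattern needs to be modified"
rm -rf "$demo_dir"
```

If either mate is reported as missing, the prefix (including the directory part and any separator such as `_` before the mate number) is the first thing to compare against the real file names.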

Thanks!

felipemachado85 commented 1 year ago

Hello Enrique!

Thank you for your reply. I've managed to get it working. However, I'm getting another error, which is probably related to the kneaddata output. kneaddata is a pipeline for QC and host removal (I have samples from other mammalian species), but somehow the host-removed samples end up with different contents.

[fsantann@vacc-user1 AMRplusplus]$ nextflow run main_AMR++.nf -profile local --pipeline fast_AMR --reads "AMRcheese/CCM1/CCM1{1,2}.fastq"

    N E X T F L O W ~ version 22.10.4
    Launching main_AMR++.nf [curious_joliot] DSL2 - revision: 77a1d0d91c

    A M R + + N F P I P E L I N E

    reads : AMRcheese/CCM1/CCM1{1,2}.fastq
    output : test_results

    executor > local (6)
    [aa/dd47d6] process > FAST_AMRplusplus:FASTQ_QC_WF:fastqc (FASTQC on CCM1) [100%] 1 of 1 ✔
    [c4/a3e575] process > FAST_AMRplusplus:FASTQ_QC_WF:multiqc [100%] 1 of 1 ✔
    [54/be7631] process > FAST_AMRplusplus:FASTQ_TRIM_WF:runqc (CCM1) [100%] 1 of 1 ✔
    [b7/f889a5] process > FAST_AMRplusplus:FASTQ_TRIM_WF:QCstats (null) [100%] 1 of 1 ✔
    [71/61aa46] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:index [100%] 1 of 1 ✔
    [93/57c213] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:bwa_align (CCM1) [100%] 1 of 1, failed: 1 ✘
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runresistome -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:resistomeresults -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runrarefaction -
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:plotrarefaction -
    Error executing process > 'FAST_AMRplusplus:FASTQ_RESISTOME_WF:bwa_align (CCM1)'

Caused by: Process FAST_AMRplusplus:FASTQ_RESISTOME_WF:bwa_align (CCM1) terminated with an error exit status (1)

Command executed:

    bwa mem megares_database_v3.00.fasta CCM1.1P.fastq.gz CCM1.2P.fastq.gz -t 4 -R '@RG\tID:CCM1\tSM:CCM1' > CCM1_alignment.sam
    samtools view -@ 4 -S -b CCM1_alignment.sam > CCM1_alignment.bam
    rm CCM1_alignment.sam
    samtools sort -@ 4 -n CCM1_alignment.bam -o CCM1_alignment_sorted.bam
    rm CCM1_alignment.bam

Command exit status: 1

Command output: (empty)

Command error:

    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 202916 sequences (40000216 bp)...
    [M::process] read 203960 sequences (39500503 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 3, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] skip orientation FR as there are not enough pairs
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [mem_sam_pe] paired reads have different names: "M03644:73:000000000-KPRWL:1:1101:10000:24403.1:N:0:57#0", "M03644:73:000000000-KPRWL:1:1101:10000:24403.2:N:0:57#0"
    [mem_sam_pe] paired reads have different names: "M03644:73:000000000-KPRWL:1:1101:10009:16971.1:N:0:57#0", "M03644:73:000000000-KPRWL:1:1101:10008:11622.2:N:0:57#0"
    [mem_sam_pe]

Work dir: /gpfs1/home/f/s/fsantann/AMRplusplus/work/93/57c2135be83a1fe5dd78f8c6c96745

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

Is there a way to include other host genomes in the AMRplusplus pipeline? Maybe that will make it work.

Thanks!

Felipe

EnriqueDoster commented 1 year ago

Hi @felipemachado85, since you didn't use the part of the workflow that removes host sequences from your samples, I would check what the reads look like coming out of kneaddata (e.g. with seqtools). Depending on what your host is, you can just add the `--host "/path/to/host.fa"` flag to your command or edit the params.config file. Try that out and let me know how it goes.

I had to make a quick update so go ahead and pull the latest code. Thanks!
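One way to check what the reads look like is to compare read names between the two mate files, since bwa stops when they disagree. The sketch below is only an illustration: the helper names `read_names` and `count_name_mismatches` are hypothetical, it assumes uncompressed four-line FASTQ records, and its comparison (first token of the header, with a trailing `/1` or `/2` removed) is a simplification that may not match bwa's own rules exactly.

```shell
# Hypothetical helper: print one read name per record of a FASTQ file.
# The header is line 1 of every 4-line record; keep its first token,
# drop the leading "@" and any trailing "/1" or "/2".
read_names() {
    awk 'NR % 4 == 1 {
        name = $1
        sub(/^@/, "", name)
        sub(/\/[12]$/, "", name)
        print name
    }' "$1"
}

# Hypothetical helper: count record positions where the two files
# disagree on the read name (extra records in one file also count).
count_name_mismatches() {
    work=$(mktemp -d)
    read_names "$1" > "$work/r1.names"
    read_names "$2" > "$work/r2.names"
    paste "$work/r1.names" "$work/r2.names" |
        awk -F '\t' '($1 "") != ($2 "") { n++ } END { print n + 0 }'
    rm -rf "$work"
}

# Demo with two tiny made-up files: the second record is out of sync.
demo=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n' > "$demo/R1.fastq"
printf '@r1/2\nACGT\n+\nIIII\n@r3/2\nACGT\n+\nIIII\n' > "$demo/R2.fastq"
count_name_mismatches "$demo/R1.fastq" "$demo/R2.fastq"   # prints 1
rm -rf "$demo"
```

A count of zero suggests the mates are in step; a large count points either to a naming difference between the files or to reads that were dropped from only one of them.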

felipemachado85 commented 1 year ago

Hi Enrique,

It's working like a charm now. I've included my host FNA in the path and it seems to be running great. The only thing I'm still not having any luck with is the `--snp Y` flag:

(AMR++_env) [fsantann@vacc-user1 AMRplusplus]$ nextflow run main_AMR++.nf -profile local --host "fsantann/cow.fna" "fsantann/goat.fna" --snp Y --reads "AMR/CRM2_{1,2}.fastq" --output output/test

    N E X T F L O W ~ version 22.10.4
    Launching main_AMR++.nf [clever_newton] DSL2 - revision: 77a1d0d91c

    A M R + + N F P I P E L I N E

    reads : AMR/CRM2_{1,2}.fastq
    output : output/test

    Running a demonstration of AMR++
    ===================================
    To include SNP analysis, add `--snp Y` to your command.
    ===================================
    To include deduplicated count analysis, add `--deduped Y` to your command.
    Please be aware that adding deduplicated counts will significantly increase run time and temp file storage requirements.
    ===================================

    executor > local (11)
    [02/2920eb] process > FAST_AMRplusplus:FASTQ_QC_WF:fastqc (FASTQC on CRM2) [100%] 1 of 1 ✔
    [45/d758a1] process > FAST_AMRplusplus:FASTQ_QC_WF:multiqc [100%] 1 of 1 ✔
    [a9/654b99] process > FAST_AMRplusplus:FASTQ_TRIM_WF:runqc (CRM2) [100%] 1 of 1 ✔
    [33/c6689b] process > FAST_AMRplusplus:FASTQ_TRIM_WF:QCstats (null) [100%] 1 of 1 ✔
    [e5/d802fe] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:index [100%] 1 of 1 ✔
    [bb/49fbfc] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:bwa_align (CRM2) [100%] 1 of 1 ✔
    [05/0e1cfd] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runresistome (CRM2) [100%] 1 of 1 ✔
    [64/36e4bd] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:resistomeresults (null) [100%] 1 of 1 ✔
    [bf/ba55b6] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runrarefaction (CRM2) [100%] 1 of 1 ✔
    [1e/ad16b0] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:plotrarefaction (null) [100%] 1 of 1 ✔
    [fc/5eb89d] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (CRM2) [100%] 1 of 1, failed: 1 ✔
    [- ] process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:snpresults -
    Pipeline completed!
    Started at 2023-04-11T10:04:32.446160-04:00
    Finished at 2023-04-11T10:05:39.481733-04:00
    Time elapsed: 1m 7s
    Execution status: OK
    [fc/5eb89d] NOTE: Process FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (CRM2) terminated with an error exit status (1) -- Error is ignored
    Completed at: 11-Apr-2023 10:05:39
    Duration : 1m 7s
    CPU hours : (a few seconds)
    Succeeded : 10
    Ignored : 1
    Failed : 1

Any thoughts about this?

Thank you so much!

Felipe

EnriqueDoster commented 1 year ago

We're getting closer!

It can get a bit messy but look in the .nextflow.log for the runsnp process and find the working directory for that process. Take a look around and let me know what the AMR_analytic_matrix.csv file looks like, and the other file with a similar name. Also, what do the files in the "Results" folder look like?

Lastly, does the --snp Y flag work with the demo reads?
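Finding the working directory by hand in .nextflow.log can be tedious, so here is a small shell sketch that pulls it out. It assumes the log contains single-line `Task completed > TaskHandler[...]` entries with `name:`, `exit:` and `workDir:` fields, which is how Nextflow 22.x appears to write them. The function name `failed_task_dirs` and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: print "<task name><TAB><work dir>" for every task
# that the log reports as completed with an exit status other than 0.
failed_task_dirs() {
    awk '/Task completed > TaskHandler\[/ {
        name = $0; sub(/.*name: /, "", name); sub(/; status: .*/, "", name)
        code = $0; sub(/.*; exit: /, "", code); sub(/;.*/, "", code)
        dir  = $0; sub(/.*workDir: /, "", dir); sub(/\].*/, "", dir)
        if (code != "0") print name "\t" dir
    }' "$1"
}

# Demo on a made-up two-line log: only the runsnp task failed.
log=$(mktemp)
cat > "$log" <<'EOF'
Apr-11 13:00:00.000 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: WF:runsnp (S1); status: COMPLETED; exit: 1; error: -; workDir: /tmp/work/aa/111]
Apr-11 13:00:01.000 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: WF:bwa_align (S1); status: COMPLETED; exit: 0; error: -; workDir: /tmp/work/bb/222]
EOF
failed_task_dirs "$log"
rm -f "$log"
```

On a real run this would be called as `failed_task_dirs .nextflow.log` from the directory where the pipeline was launched. Tasks whose error was ignored still show up, because the filter looks only at the exit status.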

felipemachado85 commented 1 year ago

Hi Enrique,

Thanks for the quick reply! The demo doesn't seem to work either; it gives the same message as the real samples. In the .nextflow.log file, there's this part related to the SNP step:

    Apr-11 13:32:50.456 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
    Apr-11 13:32:50.457 [Task submitter] INFO nextflow.Session - [8a/9edc7f] Submitted process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S3_test)
    Apr-11 13:32:50.461 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
    Apr-11 13:32:50.461 [Task submitter] INFO nextflow.Session - [a1/b9dfa0] Submitted process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S2_test)
    Apr-11 13:32:50.472 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
    Apr-11 13:32:50.472 [Task submitter] INFO nextflow.Session - [28/7c9543] Submitted process > FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S1_test)
    Apr-11 13:32:51.338 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 20; name: FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S2_test); status: COMPLETED; exit: 1; error: -; workDir: /gpfs1/home/f/s/fsantann/AMRplusplus/work/a1/b9dfa0a48da9793f07e7e00566d68c]
    Apr-11 13:32:51.345 [Task monitor] INFO nextflow.processor.TaskProcessor - [a1/b9dfa0] NOTE: Process FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S2_test) terminated with an error exit status (1) -- Error is ignored
    Apr-11 13:32:51.349 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 22; name: FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S1_test); status: COMPLETED; exit: 1; error: -; workDir: /gpfs1/home/f/s/fsantann/AMRplusplus/work/28/7c95439f0584dbcfa2f94fb4e6c4b1]
    Apr-11 13:32:51.350 [Task monitor] INFO nextflow.processor.TaskProcessor - [28/7c9543] NOTE: Process FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S1_test) terminated with an error exit status (1) -- Error is ignored
    Apr-11 13:32:51.413 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 21; name: FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S3_test); status: COMPLETED; exit: 1; error: -; workDir: /gpfs1/home/f/s/fsantann/AMRplusplus/work/8a/9edc7f80bae5e62665e3af7349242e]
    Apr-11 13:32:51.414 [Task monitor] INFO nextflow.processor.TaskProcessor - [8a/9edc7f] NOTE: Process FAST_AMRplusplus:FASTQ_RESISTOME_WF:runsnp (S3_test) terminated with an error exit status (1) -- Error is ignored

The output contains the following folders: QC_analysis, QC_trimming, Alignment, ResistomeAnalysis, and Results. The Results folder holds only one file, AMR_analytic_matrix.csv, and every gene in it still shows RequiresSNPconfirmation.

Thanks!

Felipe

EnriqueDoster commented 1 year ago

So, what's in this directory? "/gpfs1/home/f/s/fsantann/AMRplusplus/work/28/7c95439f0584dbcfa2f94fb4e6c4b1"

It's where the S1_test sample was being run with the runsnp process. We should get a better idea of what's going on by looking at those files.

I just tried the demo run again and the snp confirmation seems to work for me. Can you confirm that you got the latest code from AMR++? In case there's an issue with updating certain files, it might be easiest to just erase the whole directory and download a fresh version.

Try that again and let me know what happens.

EnriqueDoster commented 1 year ago

Another thought: if you made the conda environment a while back, you might run into dependency issues with the SNP software. As of this update, it needs pysam and biopython, which weren't in the previous conda environment file.

If you check the .command.log file in the temporary working directory, you might see that it says you are missing those modules. If so, install them into your conda environment and try again.

less /gpfs1/home/f/s/fsantann/AMRplusplus/work/28/7c95439f0584dbcfa2f94fb4e6c4b1/.command.log
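As a rough shell sketch of that check, the helper below lists the Python modules a log file reports as missing. It relies on Python's standard `No module named '...'` error text. The function name `list_missing_modules` and the sample log are hypothetical, and whether the SNP step writes this message to .command.log on your system is an assumption.

```shell
# Hypothetical helper: print each module named in a
# "No module named '<module>'" message, one per line, without repeats.
list_missing_modules() {
    sed -n "s/.*No module named '\([^']*\)'.*/\1/p" "$1" | sort -u
}

# Demo on a made-up log. Note that biopython is imported as "Bio",
# so that is the name Python would report.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
Traceback (most recent call last):
  File "example.py", line 1, in <module>
    import pysam
ModuleNotFoundError: No module named 'pysam'
EOF
list_missing_modules "$demo_log"   # prints pysam
rm -f "$demo_log"
```

An empty result only means this particular message was not found; the log may still contain a different error worth reading in full.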

felipemachado85 commented 1 year ago

Hello Enrique,

Everything seems to be working now!

I've cloned a new fresh version from the repository and reinstalled the environment.

Thank you so much for your support!

Felipe

EnriqueDoster commented 1 year ago

Awesome, glad to hear it!

Thanks for your help in improving AMR++!