Error when running with own reads

mdhfz89 commented 1 year ago

Hi,

I'm running into an issue when I try to run AMR++ with my own reads. I'm building an analysis pipeline that incorporates AMR++, and for my tests I'm using reads downloaded from SRA.

I installed AMR++ following the instructions on GitHub. It works well with the demo data, but with my own reads it exits with an error like this:

My command:

nextflow run /home/neaehi/tools/AMRplusplus/main_AMR++.nf -profile conda --threads 10 --pipeline standard_AMR_wKraken --kraken_db "/home/neaehi/tools/00_databases/kraken/" --reads "/home/neaehi/WBE/metagTest2/04_amr_mb/*_R{1,2}.fastq.gz"

The output:

N E X T F L O W  ~  version 22.10.6
Launching `/home/neaehi/tools/AMRplusplus/main_AMR++.nf` [elated_avogadro] DSL2 - revision: 77a1d0d91c
 A M R + +    N F   P I P E L I N E
 ===================================
 reads        : /home/neaehi/WBE/metagTest2/04_amr_mb/*_R{1,2}.fastq.gz
 output       : test_results

WARN: Access to undefined parameter `reference` -- Initialise it to a default value eg. `params.reference = some_value`
A process input channel evaluates to null -- Invalid declaration `path fasta`

 -- Check script '/home/neaehi/tools/AMRplusplus/./subworkflows/fastq_host_removal.nf' at line: 15 or see '.nextflow.log' file for more details
Cannot find any reads matching: /home/neaehi/WBE/metagTest2/04_amr_mb/*_R{1,2}.fastq.gz

 -- Check script '/home/neaehi/tools/AMRplusplus/main_AMR++.nf' at line: 21 or see '.nextflow.log' file for more details

I saw you recommend checking the --reads pattern in another issue, so I did that too, and the pattern works.

My command:

ls /home/neaehi/WBE/metagTest2/04_amr_mb/*_R{1,2}.fastq.gz

Output:

/home/neaehi/WBE/metagTest2/04_amr_mb/SRR16214424_R1.fastq.gz  /home/neaehi/WBE/metagTest2/04_amr_mb/SRR16214430_R1.fastq.gz  /home/neaehi/WBE/metagTest2/04_amr_mb/SRR16214435_R1.fastq.gz
/home/neaehi/WBE/metagTest2/04_amr_mb/SRR16214424_R2.fastq.gz  /home/neaehi/WBE/metagTest2/04_amr_mb/SRR16214430_R2.fastq.gz  /home/neaehi/WBE/metagTest2/04_amr_mb/SRR16214435_R2.fastq.gz
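One step beyond ls is to confirm that every R1 file has its R2 mate, since a paired pattern like *_R{1,2}.fastq.gz groups files by their shared sample prefix. The bash function below is a minimal sketch, not part of AMR++; it assumes the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming shown above, and check_read_pairs is a hypothetical helper name.

```bash
# Check that every <sample>_R1.fastq.gz in a directory has a matching
# <sample>_R2.fastq.gz. Assumes the _R1/_R2 naming used in this thread.
check_read_pairs() {
  local dir="$1" r1 sample missing=0 restore
  restore="$(shopt -p nullglob || true)"  # remember the caller's nullglob setting
  shopt -s nullglob                        # no matches -> loop body never runs
  for r1 in "$dir"/*_R1.fastq.gz; do
    sample="$(basename "$r1" _R1.fastq.gz)"
    if [[ -f "$dir/${sample}_R2.fastq.gz" ]]; then
      echo "paired: $sample"
    else
      echo "missing R2: $sample"
      missing=$((missing + 1))
    fi
  done
  eval "$restore"
  [[ $missing -eq 0 ]]   # non-zero exit status if any sample lacks its R2
}
```

For example, check_read_pairs /home/neaehi/WBE/metagTest2/04_amr_mb should report all three accessions in the listing above as paired. It only checks from R1 to R2, so an R2 file without an R1 would go unreported.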

I've also attached the Nextflow log in case it helps: nextflowLog.txt

I'm wondering whether I'm invoking it incorrectly. I made main_AMR++.nf executable so that I can run AMR++ from within my analysis folders rather than from inside the cloned AMR++ repository. Is that okay, or must it be run from within the cloned repository?

mdhfz89 commented 1 year ago

I should also mention that, like the other user with this issue, I am using KneadData-processed reads because I have to remove host reads from my samples. The rough outline of my pipeline is:
1) Use KneadData to decontaminate the reads (remove human reads) and obtain the microbiome reads.
2) From the microbiome reads, extract the reads that match organism A for analysis.
3) Put the microbiome reads from (1) through AMR++ for resistome and microbiome analysis.

EnriqueDoster commented 1 year ago

Interesting, and thanks for providing these updates.

So, I just pushed an update that fixes the first error you reported. I had missed updating a variable name in the "main_AMR++.nf" script, so it was an easy fix. Just pull the latest AMR++ code from GitHub and try again.

Running AMR++ from a different location should not be an issue, but please let me know if you run into problems, since any fix should be easy. I did a bit of testing just now and it seemed to work OK.

A question: why do you need to use KneadData? It's possible that KneadData's output reads are not sorted properly, which could be causing the issue the other user is facing. If you just want to remove host DNA from your data, I recommend finding a good reference genome (or set of genomes) and including it in your command with --host "/path/to/host.fa".
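For reference, the command from the first post with host removal added might look like the sketch below. It assumes standard_AMR_wKraken performs host removal when --host is given, as suggested above; /path/to/host.fa is a placeholder for your host reference FASTA, and run_amr_with_host is just an illustrative wrapper name.

```bash
# Sketch: the command from the first post with --host added.
# HOST_FA is a placeholder for your host reference genome(s) in FASTA format.
AMRPP="/home/neaehi/tools/AMRplusplus/main_AMR++.nf"
KRAKEN_DB="/home/neaehi/tools/00_databases/kraken/"
READS="/home/neaehi/WBE/metagTest2/04_amr_mb/*_R{1,2}.fastq.gz"  # quoted: no shell expansion here
HOST_FA="/path/to/host.fa"

run_amr_with_host() {
  # "$READS" stays quoted so Nextflow, not bash, expands the pair pattern.
  nextflow run "$AMRPP" -profile conda --threads 10 \
    --pipeline standard_AMR_wKraken \
    --kraken_db "$KRAKEN_DB" \
    --host "$HOST_FA" \
    --reads "$READS"
}

# run_amr_with_host
```

Keeping the --reads pattern in quotes matters: unquoted, bash would expand it into several separate file arguments before Nextflow ever sees it.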

You might have come across this already, but I recommend using KrakenTools and its extract_kraken_reads.py script to pull out the reads that match organism A.
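A sketch of that KrakenTools step, assuming extract_kraken_reads.py is on your PATH (for example via the bioconda krakentools package) and that you have Kraken2's per-read output as well as its report. The file names <sample>.kraken and <sample>.kreport and the helper name extract_taxon_reads are hypothetical; pass the NCBI taxonomy ID of organism A as the second argument.

```bash
# Sketch: extract the read pairs assigned to one taxon (and its children)
# from a Kraken2 run. File names below are hypothetical; adjust to your outputs.
extract_taxon_reads() {
  local sample="$1" taxid="$2"
  extract_kraken_reads.py \
    -k "${sample}.kraken" \
    -r "${sample}.kreport" \
    -s1 "${sample}_R1.fastq.gz" -s2 "${sample}_R2.fastq.gz" \
    -o "${sample}_taxid${taxid}_R1.fastq" -o2 "${sample}_taxid${taxid}_R2.fastq" \
    -t "$taxid" --include-children --fastq-output
}
```

--include-children also keeps reads classified to taxa below the given ID, which is why the report (-r) is passed; drop both if you only want reads assigned to that exact taxon.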

Let me know how your test goes. Thank you!

mdhfz89 commented 1 year ago

Hi, I finally got around to testing this and it works! I used the KneadData-processed reads, didn't run into any issues, and got the output from AMR++.

To answer your question, I didn't use the --host "/path/to/host.fa" option because I didn't know about it. I'll give it a try and see how it goes. Also, thanks for the heads-up about KrakenTools; I'll try that too.

A question about using AMR++: if I don't want to run the full pipeline with --pipeline standard_AMR_wKraken, can I use the subworkflows instead, like this: --pipeline resistome --pipeline kraken?

mdhfz89 commented 1 year ago

I was looking through the Kraken annotation results and realised that although I ran 3 different read pairs in one batch, only 1 of the 3 was actually put through Kraken annotation. This happens whether I use --pipeline standard_AMR_wKraken or the standalone --pipeline kraken. Any idea why?

I don't think my --reads pattern is the problem, since the resistome analysis output has 3 result columns, one for each of the 3 samples I ran through AMR++.

EnriqueDoster commented 1 year ago

Hello again @mdhfz89!

I'm glad you got it working! And yes, instead of running the entire pipeline, you can use the "subworkflows" described in the main README.md file.
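One way to run only part of the analysis is to launch each subworkflow as its own Nextflow run, as in the sketch below. It reuses the paths from the first post; resistome and kraken are the --pipeline values mentioned in this thread, and the full list (and what each one needs) is in the AMR++ README. The sketch does not rely on --pipeline being accepted twice in a single command, and run_subworkflows is a hypothetical helper name.

```bash
# Sketch: run selected AMR++ subworkflows one after another, each as a
# separate Nextflow invocation, stopping at the first failure.
AMRPP="/home/neaehi/tools/AMRplusplus/main_AMR++.nf"
KRAKEN_DB="/home/neaehi/tools/00_databases/kraken/"
READS="/home/neaehi/WBE/metagTest2/04_amr_mb/*_R{1,2}.fastq.gz"

run_subworkflows() {
  local wf
  for wf in "$@"; do
    echo "== running --pipeline $wf" >&2
    nextflow run "$AMRPP" -profile conda --threads 10 \
      --pipeline "$wf" --kraken_db "$KRAKEN_DB" --reads "$READS" || return 1
  done
}

# run_subworkflows resistome kraken
```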

It's taken me a while to respond, so hopefully you were able to figure something out. If you're still troubleshooting the Kraken run on these three samples, the most useful thing would be to include the contents of the ".nextflow.log" file, which is written to the directory you launched AMR++ from. With it, we should be able to track down the processes that ran the other samples without producing results, and then investigate what happened by navigating to their temporary work directories.
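Alongside the log, one way to find the tasks that handled each sample is to search the task scripts under Nextflow's work/ directory. Each task directory holds the command it ran (.command.sh), its .command.err and .command.log, and, once the task completes, its exit status (.exitcode). The sketch below assumes the default work/ layout; find_sample_tasks is a hypothetical helper, and the accessions are the three from this thread.

```bash
# Sketch: for each sample ID, list the Nextflow task directories whose
# .command.sh mentions it, with the task's exit code from .exitcode.
find_sample_tasks() {
  local workdir="$1"; shift
  local sample script dir code
  for sample in "$@"; do
    echo "## $sample"
    while IFS= read -r script; do
      dir="$(dirname "$script")"
      if [[ -f "$dir/.exitcode" ]]; then
        code="$(<"$dir/.exitcode")"
      else
        code="none"   # no .exitcode: task did not complete (still running or killed)
      fi
      echo "$dir exit=$code"
    done < <(grep -lF -- "$sample" "$workdir"/*/*/.command.sh 2>/dev/null)
  done
}

# find_sample_tasks work SRR16214424 SRR16214430 SRR16214435
```

Run it from the launch directory, then read .command.sh in each listed directory to see which step it was, and .command.err for any task with a non-zero or missing exit code. Nextflow's own nextflow log <run name> -f name,status,exit,workdir gives a similar per-task view for one run.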

Thanks!

EnriqueDoster commented 9 months ago

Closing this due to inactivity. Please let us know if you run into any other issues with AMR++. Thanks!

Microbial-Ecology-Group / AMRplusplus, issue #18