arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

[BUG] [W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped #256

Closed JennKnapp closed 4 months ago

JennKnapp commented 8 months ago

I am running into the exact same issue as one previously posted and closed without comment. Is there a fix for this, or any troubleshooting suggestions? Command: rgi bwt --local --read_one R1.fastq.gz --read_two R2.fastq.gz --output_file ~/card_output/sample_name

Many many lines of: [W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped

Then it eventually shows: "merging from 0 files and 16 in-memory blocks" and then many lines of: "WARNING :model with id : 128, has few mapped reads to make consensus sequence skipping : ARO:ID"

The process is then terminated.

raphenya commented 8 months ago

@JennKnapp Let me look for the issue that was closed. At a glance, this looks like there are not enough reads for KMA to create a consensus sequence before calling SNPs.

raphenya commented 8 months ago

Connecting #219, @JennKnapp are you able to share the reads with me so that I can debug? Cheers.

JennKnapp commented 8 months ago

Sure, here is a link to a set of the fastq.gz files I was using when I ran into this issue: https://drive.google.com/drive/folders/1LS8cCJMr08nzNGWevZdteio5zcgC1B--?usp=sharing

raphenya commented 8 months ago

@JennKnapp running fastqc on the two samples shows adapter content. Maybe trim before running rgi bwt? Some of the sequences also have some "N"s. Feel free to share the trimmed reads and I will test again. Cheers.
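A quick way to see how many reads contain "N"s, before and after trimming, is sketched below. This is not part of RGI or FastQC. It assumes standard four-line FASTQ records, and the function name count_reads_with_n is made up for illustration.

```python
import gzip


def count_reads_with_n(fastq_path):
    """Return (total_reads, reads_containing_N) for a FASTQ file.

    Assumes four-line FASTQ records (no wrapped sequence lines).
    Handles both plain and gzip-compressed files.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = 0
    with_n = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if i % 4 == 1:
                total += 1
                if "N" in line.upper():
                    with_n += 1
    return total, with_n
```

Calling it on R1.fastq.gz and on the trimmed file gives two pairs of counts that can be compared directly.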

JennKnapp commented 8 months ago

@raphenya I've uploaded the trimmed reads to the same drive folder, I removed adapters and low quality bases. The same error messages appear when running rgi bwt on these cleaned up reads though, so hopefully you'll have better luck. Thanks!

raphenya commented 7 months ago

@JennKnapp ok, cool. Thanks. I will take a look. Cheers.

raphenya commented 7 months ago

@JennKnapp ok, I did the following:

fastqc for both untrimmed and trimmed

There are still lots of k-mers, but that's to be expected in this type of data.

ran RGIbwt for both untrimmed and trimmed, using KMA as aligner

ran RGIbwt for both untrimmed and trimmed, using Bowtie2 as aligner

I checked the headers for the sam files for both KMA and Bowtie2

These are the only differences:

diff kma_headers.txt bowtie2_headers.txt 
1,2c1
< @HD   VN:1.6  GO:reference
< @PG   ID:KMA  PN:kma  VN:1.4.9    CL:kma -mem_mode -ex_mode -1t1 -vcf -ipe "CB-Shotgun_S119_R1_001.fastq.gz" "CB-Shotgun_S119_R2_001.fastq.gz" -t 20 -t_db /workspace/lab/mcarthurlab/raphenar/issue256/localDB/bwt/card_reference/kma -o "/workspace/lab/mcarthurlab/raphenar/issue256/output_kma.temp.sam.temp" -sam 
---
> @HD   VN:1.5  SO:unsorted GO:query
4806a4806
> @PG   ID:bowtie2  PN:bowtie2  VN:2.5.1    CL:"/var/miniconda3/envs/rgi603/bin/bowtie2-align-s --wrapper basic-0 --quiet --very-sensitive-local --threads 20 -x /workspace/lab/mcarthurlab/raphenar/issue256/localDB/bwt/card_reference/bowtie2 -S /workspace/lab/mcarthurlab/raphenar/issue256/output.temp.sam -1 CB-Shotgun_S119_R1_001.fastq.gz -2 CB-Shotgun_S119_R2_001.fastq.gz"

Other warnings

e.g. WARNING 2023-12-08 20:05:03,653 : model with id : 33, has few mapped reads to make consensus sequence skipping: 'ARO:3003109|ID:33|Name:msrE|NCBI:EU294228.1'

These happened only when using KMA aligner as we try to use the reads to create a consensus sequence.
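To see which models were skipped and how many sam_parse1 warnings a run produced, the log lines can be tallied. The sketch below is not part of RGI. It assumes the run's stderr was captured to a text file, and it matches only the two message formats quoted in this thread. The name summarize_log is hypothetical.

```python
import re

# Matches the "few mapped reads" warning in both forms quoted in this thread.
SKIP_RE = re.compile(r"model with id\s*:\s*(\d+),\s*has few mapped reads")
SAM_WARN = "[W::sam_parse1] mapped query cannot have zero coordinate"


def summarize_log(lines):
    """Return (sorted unique skipped model ids, number of sam_parse1 warnings)."""
    skipped_ids = set()
    sam_warnings = 0
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            skipped_ids.add(int(match.group(1)))
        elif SAM_WARN in line:
            sam_warnings += 1
    return sorted(skipped_ids), sam_warnings
```

Passing an open file handle for the captured log works, since the function only iterates over lines.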

Amos' conclusion

It's not obvious what's causing the [W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped warning, as both bowtie2 and KMA have the same number of sequences added to their headers. I will do more testing with reads that are expected to map and reads that are not, and see if I get the same warnings.
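The header comparison can also be done by counting header lines per record type (@HD, @SQ, @PG). A minimal sketch, assuming the headers have been written to text files such as the kma_headers.txt and bowtie2_headers.txt used in the diff; header_record_counts is an illustrative name, not an RGI or samtools function.

```python
from collections import Counter


def header_record_counts(header_lines):
    """Count SAM header lines by record type (@HD, @SQ, @PG, ...)."""
    counts = Counter()
    for line in header_lines:
        if line.startswith("@"):
            # The record type is the first whitespace-delimited token.
            counts[line.split(None, 1)[0]] += 1
    return counts
```

Equal "@SQ" counts for the two aligners would match the observation that both headers list the same number of reference sequences.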

JennKnapp commented 7 months ago

Thanks a bunch for looking into this! I was able to run RGIbwt on a few other similar datasets, and although I got the same errors, the runs completed and produced results. I would rather stick with the KMA aligner over bowtie2 as it's better for metagenomic data.

I am also trying different pre-processing steps (trimming, filtering, removing duplicate reads, etc.), so if any of these result in the error messages going away I will update this thread too.

raphenya commented 7 months ago

@JennKnapp Ok, I think I might have found why we are getting the [W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped warning. When aligning with KMA, the RNAME is not set to the reference name, just to *. My next test will pull only 4 reads (2 with RNAME and 2 without) and I will also post an issue with both the samtools and KMA developers. Cheers.
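One way to pull out candidate records for such a test is to scan the SAM file for alignment lines whose mandatory fields are inconsistent. The sketch below is not part of RGI, KMA, or samtools. It assumes the standard tab-separated SAM columns (QNAME, FLAG, RNAME, POS, ...). Since this thread does not settle which condition triggers the warning, it flags two: a named reference with POS 0, and a read not flagged unmapped (bit 0x4 clear) whose RNAME is *. The name find_suspect_records is hypothetical.

```python
def find_suspect_records(sam_lines):
    """Return (qname, flag, rname, pos) for records with inconsistent mapping fields."""
    suspects = []
    for line in sam_lines:
        # Header lines start with "@"; SAM read names cannot contain "@".
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        qname, rname = fields[0], fields[2]
        flag, pos = int(fields[1]), int(fields[3])
        flagged_unmapped = bool(flag & 0x4)
        named_ref_zero_pos = rname != "*" and pos == 0
        mapped_without_ref = not flagged_unmapped and rname == "*"
        if named_ref_zero_pos or mapped_without_ref:
            suspects.append((qname, flag, rname, pos))
    return suspects
```

The read names it returns can then be used to subset the FASTQ files down to a handful of reads for a minimal reproduction.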

raphenya commented 6 months ago

@JennKnapp see https://bitbucket.org/genomicepidemiology/kma/issues/86/w-sam_parse1-mapped-query-cannot-have-zero

github-actions[bot] commented 4 months ago

Issue is stale and will be closed in 7 days unless there is new activity

DrYoungOG commented 1 month ago

@JennKnapp see https://bitbucket.org/genomicepidemiology/kma/issues/86/w-sam_parse1-mapped-query-cannot-have-zero

This link is unavailable: This issue is submitted and being reviewed.

jihen-lau commented 3 weeks ago

Hi @raphenya, are there any updates or a fix for this? I'm having the same issue as well. Thanks!

agmcarthur commented 3 weeks ago

I'm afraid we have had no updates @jihen-lau.

jihen-lau commented 3 weeks ago

@agmcarthur Thank you for your response! I'm wondering if these warnings have any significance. Could our output file still be reliable despite these warnings?

[W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped and "WARNING :model with id : 128, has few mapped reads to make consensus sequence skipping : ARO:ID"

agmcarthur commented 3 weeks ago

@jihen-lau that warning is benign. It just means that particular reference sequence had so few reads mapped to it that a consensus allele could not be generated, i.e. that allele is very likely not present in your data.

jihen-lau commented 3 weeks ago

@agmcarthur Thanks for the clarification! In that case, can I take the multiple lines of "[W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped" as a benign warning as well?

agmcarthur commented 3 weeks ago

Yes indeed!

jihen-lau commented 3 weeks ago

@agmcarthur once again thanks for the clarification and the tools!