bmvdgeijn / WASP

WASP: allele-specific pipeline for unbiased read mapping and molecular QTL discovery
Apache License 2.0
103 stars 51 forks source link

Handling of partially mapped reads not implemented #78

Open feliciecos opened 5 years ago

feliciecos commented 5 years ago

Hi, I'm trying to run the CHT pipeline. When I run the bam2h5.py script, I have the following message: "WARNING skipping read: handling of partially mapped reads not implemented". After looking at the script, it seems to be a problem of read length but I don't understand what kind of problem. Can you give me more details on this warning? Thank you

bmvdgeijn commented 5 years ago

Hi,

Is this warning occurring for a large fraction of your reads or just a handful? I believe that some mappers trim the reads if the ends are not mapping, which will cause some of the reads to be shorter and throw this warning. Is it possible you are using one of these?

Bryce

On Mon, Dec 3, 2018 at 10:29 AM feliciecos notifications@github.com wrote:

Hi, I'm trying to run the CHT pipeline. When I run the bam2h5.py script, I have the following message: "WARNING skipping read: handling of partially mapped reads not implemented". After looking at the script, it seems to be a problem of read length but I don't understand what kind of problem. Can you give me more details on this warning? Thank you

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bmvdgeijn/WASP/issues/78, or mute the thread https://github.com/notifications/unsubscribe-auth/ADSDgmZPeLx-0YgjC1yfawNEFx0O-d4-ks5u1UNmgaJpZM4Y-0Yt .

feliciecos commented 5 years ago

Hi and thank you for your quick reply,

This warning occurs for approximatively 10% of the reads which may correspond to the proportion of trimmed reads. Indeed, I have used Trimgalore to remove the Illumina adapters. Is there something to do to keep these reads in the analysis?

Thank you, Félicie

heimancheung commented 5 years ago

Hi, I have the same problem. I used BWA as mapping tools, and before mapping, i trimmed the 150bp reads to 50~140bp. Is that a trigger of this warning?

SaideepGona commented 5 years ago

Any followup on this issue?

feliciecos commented 5 years ago

The problem has been identified (trimmed reads are filtered out during the analysis) but to my knowledge, no solution has been proposed on this issue.

Le 2019-10-28 19:14, Saideep Gona a écrit :

Any followup on this issue?

-- You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub [1], or unsubscribe [2].

Links:

[1] https://github.com/bmvdgeijn/WASP/issues/78?email_source=notifications&email_token=AKLNNEFKFLTS5GNCHQBNPVTQQ4TWRA5CNFSM4GH3IYW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECN4BLQ#issuecomment-547078318 [2] https://github.com/notifications/unsubscribe-auth/AKLNNEEKL3U6YDGSGPYVJXLQQ4TWRANCNFSM4GH3IYWQ

gmcvicker commented 5 years ago

In general it is not a good idea to trim reads if you want to look at allele-specific expression. The issue is that trimming tends to remove bases that do not match the reference sequence. This means that reads carrying a non-reference allele are more likely to have the base with the non-reference allele trimmed. This results in a bias, and is the reason that WASP does not use trimmed reads.

mikelove commented 3 years ago

I am receiving this warning for ~1% of reads after filtering. I'm going a bit off-piste, so I'm not sure if it's ignorable or not. I'm mapping reads with HISAT2 using an index built with known genotypes and known phasing, then providing the BAMs to CHT. I've filtered all reads with low mapping quality (which were ~5% of reads and balanced with respect to allele), and the remaining reads look fine to me via samtools view. Any other things to check?

bmvdgeijn commented 3 years ago

Hi Mike,

I am not sure that the mapping procedure that you are describing is safe from allelic biases. In our original paper, we showed that using SNP-aware mappers can lead to biases in both the reference and non-reference direction at individual loci. These would wash out and look like balance when looking at the entire genome, but still cause issues with genome-wide ASE detection. I would suggest either using the WASP mapping filtering or another method to flag biased regions ( e.g. from Tuuli Lappalainen's lab).

For the trimmed reads warning, we elected not to include these reads over concerns that trimming would complicate ASE signals. Of course, by throwing them out we may also be inducing bias. I generally disable trimming when mapping.

Hope this helps, Bryce

On Tue, Nov 16, 2021 at 10:00 AM Mike Love @.***> wrote:

I am receiving this warning for ~1% of reads after filtering. I'm going a bit off-piste, so I'm not sure if it's ignorable or not. I'm mapping reads with HISAT2 using an index built with known genotypes and known phasing, then providing the BAMs to CHT. I've filtered all reads with low mapping quality (which were ~5% of reads and balanced with respect to allele), and the remaining reads look fine to me via samtools view. Any other things to check?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bmvdgeijn/WASP/issues/78#issuecomment-970525569, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA2IHARDXBRVDEJIJ7TTYB3UMKL5PANCNFSM4GH3IYWQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

mikelove commented 3 years ago

Thanks Bryce. Because we have a simulation, I can assess the bias at each SNP from truth using the read counts, so I'll keep that in mind and assess if there is an issue, and of course for real data I would definitely use and recommend the WASP mapping-based filtering.

The surprising thing with the warning is that I didn't do any trimming, and by eye the CIGAR are equal to the read length. I'm not sure still the origin of the warnings.

mikelove commented 3 years ago

We have no bias on the final allelic ratios. I think it's because we are simulating 2 x 150bp reads and using known phased genotypes. I get a ratio of 0.5004 +/- 0.0002 in cht_results.txt for features with moderate to high counts.

bmvdgeijn commented 3 years ago

If I recall correctly there is a particular CIGAR code for trimmed reads which is probably what is being filtered out.

Is this combining across multiple loci? I could envision some loci with .4 and some with .6 averaging out genomewide, but still having bias at individual loci.

On Nov 18, 2021, at 6:54 PM, Mike Love @.***> wrote:

 We have no bias on the final allelic ratios. I think it's because we are simulating 2 x 150bp reads and using known phased genotypes. I get a ratio of 0.5004 +/- 0.0002 in cht_results.txt for features with moderate to high counts.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.

mikelove commented 2 years ago

hi Bryce, you were right. Even with haplotype-aware mapping, there was still bias at individual loci due to multi-mapping reads from the reference haplotype being filtered, where alternative allele reads overlapping the SNP were in some cases not multi-mapping.

I now use WASP mapping procedure (with haplotype-aware HISAT2 as the aligner) upstream of CHT, and this resolved the bias.

jeizenga commented 2 years ago

Hi @mikelove, I also ran into this warning, and in my case, any reads that have a softclipped alignment seem to trigger it. Perhaps that's what was causing your issue as well.

azurillandfriend commented 2 years ago

I noticed that @mikelove above did manage to use HISAT2 as the aligner prior to WASP mapping procedure. @mikelove did you have any issues with the read name format thrown out by HISAT2 because I got stuck at "filter_remapped_reads.py" step when trying to use HISAT2 mapped reads and couldn't progress further (error described further in this thread: filter_remapped_reads.py with bams created by HISAT2/ other mappers than Bowtie2 #114 ). The error was "ValueError: expected read names to be formatted like ... but got HISEQ:29:C3VGMACXX:5:2315:9066:5340". I'm now repeating with bowtie2 but would like to revert to HISAT2 if possible.

mikelove commented 2 years ago

I didn't get this error. Just the warnings above.

My alignment looks like:

        hisat2 -p {params.threads} -f -x {params.xdir} -1 {input.r1} -2 {input.r2} > {params.temp_sam}
        samtools view -b -q 60 {params.temp_sam} | samtools sort -@ {params.threads} -m {params.mem} -o {output}
        samtools index {output}

And there's nothing special about my find_intersecting_snps.py, remapping, or filter_remapped_reads.py calls.

azurillandfriend commented 2 years ago

Thanks @mikelove I'll try repeating my hisat2 alignment using the same settings as you and see if that helps.