blahah / transrate

Understand your transcriptome assembly
http://hibberdlab.com/transrate

Weird read mapping metrics on version 1.0.3 #208

Open RuiMMFaria opened 7 years ago

RuiMMFaria commented 7 years ago

Hi,

First, thanks for developing Transrate; it has been very useful for my work.

I recently re-ran Transrate version 1.0.3 on some transcriptomes I had previously assembled with Trinity. However, some of the read mapping metrics are now different from before and look a bit weird (see below). In the current version, p contigs uncovered is always 1.0, across different transcriptomes from different species. This is odd because I do not think that all contigs have a mean per-base read coverage of < 1. Is this some kind of problem, or am I misinterpreting the meaning of this metric? Thanks in advance.

Rui

[ INFO] 2017-02-14 21:54:04 : Read mapping metrics:
[ INFO] 2017-02-14 21:54:04 : -----------------------------------
[ INFO] 2017-02-14 21:54:04 : fragments 14855081
[ INFO] 2017-02-14 21:54:04 : fragments mapped 7419834
[ INFO] 2017-02-14 21:54:04 : p fragments mapped 0.5
[ INFO] 2017-02-14 21:54:04 : good mappings 5732827
[ INFO] 2017-02-14 21:54:04 : p good mapping 0.39
[ INFO] 2017-02-14 21:54:04 : bad mappings 1687007
[ INFO] 2017-02-14 21:54:04 : potential bridges 0
[ INFO] 2017-02-14 21:54:04 : bases uncovered 20114500
[ INFO] 2017-02-14 21:54:04 : p bases uncovered 0.41
[ INFO] 2017-02-14 21:54:04 : contigs uncovbase 37736
[ INFO] 2017-02-14 21:54:04 : p contigs uncovbase 0.55
[ INFO] 2017-02-14 21:54:04 : contigs uncovered 68330
[ INFO] 2017-02-14 21:54:04 : p contigs uncovered 1.0
[ INFO] 2017-02-14 21:54:04 : contigs lowcovered 68330
[ INFO] 2017-02-14 21:54:04 : p contigs lowcovered 1.0
[ INFO] 2017-02-14 21:54:04 : contigs segmented 6071
[ INFO] 2017-02-14 21:54:04 : p contigs segmented 0.09

blahah commented 7 years ago

hi @RuiMMFaria, indeed this looks pretty weird.

What command did you use to run transrate? And what kind of reads do you have? It's very unusual to have 0 potential bridges, which suggests to me that maybe your read pairing got messed up.
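For anyone who wants to check their own runs for the same symptoms, here is a minimal Python sketch that parses log entries in the format posted above and flags the two warning signs raised so far: zero potential bridges despite mapped fragments, and p contigs uncovered equal to 1.0. The function names and the heuristic itself are hypothetical and not part of transrate; the metric names are copied from the log in this thread.

```python
import re

# Matches one entry such as "[ INFO] 2017-02-14 21:54:04 : p fragments mapped 0.5".
# Header and separator entries carry no trailing number, so they are skipped.
ENTRY = re.compile(
    r"INFO\]\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+:\s+"
    r"([A-Za-z][A-Za-z ]*?)\s+(\d+(?:\.\d+)?)(?=\s|$)"
)


def parse_metrics(log_text):
    """Return {metric name: value} for every 'name number' entry in the log."""
    return {name: float(value) for name, value in ENTRY.findall(log_text)}


def suspicious(metrics):
    """Hypothetical heuristic: list the warning signs discussed in this thread."""
    signs = []
    if metrics.get("fragments mapped", 0) > 0 and metrics.get("potential bridges") == 0:
        signs.append("no potential bridges despite mapped fragments")
    if metrics.get("p contigs uncovered") == 1.0:
        signs.append("every contig reported as uncovered")
    return signs


# Three entries copied from the log posted above.
sample = (
    "[ INFO] 2017-02-14 21:54:04 : fragments mapped 7419834\n"
    "[ INFO] 2017-02-14 21:54:04 : potential bridges 0\n"
    "[ INFO] 2017-02-14 21:54:04 : p contigs uncovered 1.0\n"
)
metrics = parse_metrics(sample)
for sign in suspicious(metrics):
    print("warning:", sign)
```

The pattern also works when all entries sit on a single line, since it keys on the `INFO]` prefix rather than on line breaks. An empty result does not prove the run is fine; it only means these two specific symptoms are absent.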

RuiMMFaria commented 7 years ago

Hi, those are paired-end data (Illumina 100bp). Yes, I did not notice the bridges. I will try to take a look at that. The command was:

image

But in one of the three assemblies I also tried Transrate version 1.0.1, and there the bridges are not zero and the percentages of contigs not covered are low (and not 1). (I will confirm this by repeating the run in the next few days, as I did this almost one month ago.)

Also, in the contigs.csv output of Transrate 1.0.3 I see that some of the columns (bridges and coverage) are 0, but the proportion of bases covered seems normal.

image
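The mismatch described here (coverage of 0 alongside a normal-looking proportion of bases covered) can be counted across a whole contigs.csv with a short Python sketch. The column names `contig_name`, `coverage`, and `p_bases_covered` are assumptions about the file header and should be checked against your own output; the function name and the demo rows are made up for illustration.

```python
import csv
import io


def inconsistent_contigs(csv_file, name_col="contig_name",
                         coverage_col="coverage", covered_col="p_bases_covered"):
    """Return names of contigs with zero coverage but a non-zero covered proportion.

    The default column names are assumptions; pass your own if the header differs.
    """
    flagged = []
    for row in csv.DictReader(csv_file):
        if float(row[coverage_col]) == 0 and float(row[covered_col]) > 0:
            flagged.append(row[name_col])
    return flagged


# Made-up rows for illustration only, not real transrate output.
demo = io.StringIO(
    "contig_name,p_bases_covered,coverage\n"
    "contig_a,0.95,0\n"
    "contig_b,0.0,0\n"
    "contig_c,0.80,12.5\n"
)
flagged = inconsistent_contigs(demo)
print(flagged)
```

To run it on a real file, open it with `open("contigs.csv", newline="")` and pass the file handle in place of `demo`. A contig with zero coverage and zero bases covered is internally consistent, so it is not flagged.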

blahah commented 7 years ago

@RuiMMFaria OK, thanks for reporting - this seems like a bug. I'll try to take a look this coming week. We are long overdue a new version release anyway.

RuiMMFaria commented 7 years ago

Hi blahah (sorry, but I really do not know your real name), thanks for the fast reply and for looking into it. OK, I will use the new version when you release it to compare metrics.

RuiMMFaria commented 7 years ago

Ah, I just remembered one thing that may or may not be relevant (sorry, I'm not a computer scientist). When I installed transrate version 1.0.3 for the first time, I was getting an error that some other people have reported (related to Salmon):

transrate-1.0.3-linux-x86_64/lib/app/ruby/bin.real/ruby: relocation error: /home/victaphanta/transrate-1.0.3-linux-x86_64/bin/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference.

Someone reported a solution here: https://gitter.im/blahah/transrate?at=57a2088300663f5b1b47a9b8 and here: https://github.com/blahah/transrate/issues/203. The fix was to delete librt.so.1. After that, as other people reported, it worked.

Could this be somehow related? Just a guess, because that was the only thing I remember being odd. Best, Rui

mjfi2sb3 commented 7 years ago

Hi,

I am also getting odd stats with transrate 1.0.3. I attach the assembly.csv for a run with v1.0.3 and another with v1.0.1. The read mapping rate for 1.0.3 is significantly lower than for 1.0.1. Also, the number of potential bridges is zero in 1.0.3. Other metrics had me concerned too. Perhaps the most alarming is the number of lost transcripts.

assemblies-1.0.1.xlsx assemblies-1.0.3.xlsx

I have reverted to 1.0.1

Many thanks, /SB

blahah commented 7 years ago

thanks @mjfi2sb3 - we definitely have a bug in 1.0.3

RuiMMFaria commented 6 years ago

Hi Blahah,

Can you let us know whether the bug only affected the reported values, or whether it really affected the estimates for the "good transcripts"? I am just wondering if I should redo my analysis, or if only the reported values are incorrect but the transcripts are fine.

RuiMMFaria commented 6 years ago

Hi,

Maybe my question was not clear enough. Is the bug in 1.0.3 only about reporting wrong stats (a reporting problem), or does it also affect the selection of the "good transcripts"? If it is the latter, has this bug already been fixed?

D-anders313 commented 4 years ago

Hi,

I am having a similar issue (see below) with my assembly of paired end data. I see that this is an older thread. Was this bug resolved in version 1.0.3? I previously had very good read mapping metrics using bowtie (94%) and SNAP reported only 2% unaligned reads. I am no bioinformatician, so I am not sure how comparable this is to the SNAP log but I have included it here.
Any help or advice would be greatly appreciated! Command used:

transrate --assembly Assembled_Transcripts.fasta --left Samples1.fastq --right Samples2.fastq

snap - Copy.log

[ INFO] 2020-07-06 17:54:18 : fragments 51016563
[ INFO] 2020-07-06 17:54:18 : fragments mapped 6355682
[ INFO] 2020-07-06 17:54:18 : p fragments mapped 0.12
[ INFO] 2020-07-06 17:54:18 : good mappings 6065556
[ INFO] 2020-07-06 17:54:18 : p good mapping 0.12
[ INFO] 2020-07-06 17:54:18 : bad mappings 290126
[ INFO] 2020-07-06 17:54:18 : potential bridges 0
[ INFO] 2020-07-06 17:54:18 : bases uncovered 63378950
[ INFO] 2020-07-06 17:54:18 : p bases uncovered 0.69
[ INFO] 2020-07-06 17:54:18 : contigs uncovbase 76454
[ INFO] 2020-07-06 17:54:18 : p contigs uncovbase 0.81
[ INFO] 2020-07-06 17:54:18 : contigs uncovered 94745
[ INFO] 2020-07-06 17:54:18 : p contigs uncovered 1.0
[ INFO] 2020-07-06 17:54:18 : contigs lowcovered 94745
[ INFO] 2020-07-06 17:54:18 : p contigs lowcovered 1.0
[ INFO] 2020-07-06 17:54:18 : contigs segmented 6321
[ INFO] 2020-07-06 17:54:18 : p contigs segmented 0.07