bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0
19 stars 9 forks source link

Abundance computing: Add warning for scaffolds with abundance=0 #29

Open ppericard opened 7 years ago

ppericard commented 7 years ago

If no reads are re-mapping on a scaffold after keeping only the best equivalent reads alignments, we have to output a warning. This behavior has to be followed up. This should not happen, or we have to understand why this happens.

loic-couderc commented 6 years ago

I was able to reproduce this behaviour with the simulated dataset (mavromatis) on my last two runs. To illustrate on an example:

2017-10-04 17:36:58,979 - CRITICAL - Can't find the abundance for:1712
Can't find abundance

The blast file before best equivalent filtering have the 1712 reference (https://github.com/bonsai-team/matam/blob/develop/scripts/compute_abundance.py#L121)

NC_011830.1-1173946/2   29  100 101 0   0   1   101 933 1033    1.71e-40    176
NC_011830.1-1173946/2   36  100 101 0   0   1   101 923 1023    1.71e-40    176
NC_011830.1-1173946/2   167 100 101 0   0   1   101 816 916 1.71e-40    176
NC_011830.1-1173946/2   166 100 101 0   0   1   101 816 916 1.71e-40    176
NC_011830.1-1173946/2   1620    89.2    100 7   4   1   100 269 368 2.27e-25    126
NC_011830.1-1173946/2   1712    88.3    101 8   4   1   101 515 615 1.37e-24    123
NC_011830.1-1173946/2   379 88.2    100 8   4   1   100 807 906 4.56e-24    121
NC_011830.1-1173946/2   294 88.2    100 8   4   1   100 807 906 4.56e-24    121
NC_011830.1-1173946/2   353 88.2    100 8   4   1   100 806 905 4.56e-24    121
NC_011830.1-1173946/2   356 88.2    100 8   4   1   100 806 905 4.56e-24    121

But not after: (https://github.com/bonsai-team/matam/blob/develop/scripts/compute_abundance.py#L122)

NC_011830.1-1173946/2   166 100 101 0   0   1   101 816 916 1.71e-40    176
NC_011830.1-1173946/2   167 100 101 0   0   1   101 816 916 1.71e-40    176
NC_011830.1-1173946/2   29  100 101 0   0   1   101 933 1033    1.71e-40    176
NC_011830.1-1173946/2   36  100 101 0   0   1   101 923 1023    1.71e-40    176
ppericard commented 5 years ago

@loic-couderc Should this still be open ?

ppericard commented 5 years ago

76 is related