BIMSBbioinfo / pigx_sars-cov-2

PiGx SARS-CoV-2 wastewater sequencing pipeline
GNU General Public License v3.0

Filter out 0 values for frequencies of variants before plotting #32

Closed rekado closed 2 years ago

rekado commented 3 years ago

Frequencies of "null" or 5*10^-18 are not very useful. Cut off after an arbitrary number of digits after the decimal point.

rekado commented 3 years ago

We are now rounding values, but 0 is still actually plotted, which is a bit silly.

vicfabienne commented 3 years ago

The deconvolution reports 0 frequencies for variants that are not there, so they should indeed be filtered out. There is another problem with the mutation frequencies: there are actual NA values in the data and they are plotted too. I guess the plot should be able to skip NA values? Either way, they should be filtered out too.

al2na commented 3 years ago

Didn't we decide that everything below 0.001 will be displayed as 0? This also relates to the regression function that predicts trending mutations. Can you mark the files or lines that this relates to? Is it @rcuadrat who was responsible for this?
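For illustration, the cutoff described here could be sketched as below. This is only a minimal Python sketch, not the pipeline's code: the function name `display_frequency` and the constant `DISPLAY_THRESHOLD` are hypothetical, and it assumes frequencies are stored as fractions and that missing values arrive as `None`.

```python
# Hypothetical sketch of the "below 0.001 is displayed as 0" rule.
# Assumption: frequencies are fractions (0.10 == 10%), missing values are None.
DISPLAY_THRESHOLD = 0.001


def display_frequency(freq, threshold=DISPLAY_THRESHOLD):
    """Return 0.0 for frequencies below the threshold, otherwise the value itself.

    Missing values (None) are passed through unchanged so that a later
    step can decide whether to drop them.
    """
    if freq is None:
        return None
    return 0.0 if freq < threshold else freq


raw = [5e-18, 0.0004, 0.001, 0.10, None]
shown = [display_frequency(f) for f in raw]
```

Values exactly at the threshold are kept in this sketch; whether the cutoff should be inclusive is not settled in this thread.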

vicfabienne commented 3 years ago

Yes, you're right. I think we have to differentiate between the stacked plots and the map plots. For the stacked plots we need the 0 values and should plot them.

But for the map plots, the idea is "when the variant is there, there is a colored disk". An example: with the deconvolution we could get a bar "varA, varB, varC" at 0% and another bar "varD" at 10%. In the maps only a disk for "varD" would appear, because only this one is uniquely identified. We don't show those "others" or "mixtures" there.

But we could also have this case: one bar "varA" at 0.0013% and one bar "varD" at 10%. In this case we get two disks, intuitively interpretable as "we found two variants". Only when you hover over those disks do you realize that we still only found varD with 10%, because we get "varA with 0%", which is obviously the same as "no varA". I think that's what Ricardo mentioned as a bit silly, and it's also misleading. Also, in the legend the smallest circle is for the variants that are not present. It will increase as soon as they are plotted (like WT in this case). But there should be no difference between something that is not plotted and something that is 0.

[image]

vicfabienne commented 3 years ago

But yes, we discussed that @rcuadrat will just filter 0 and NA out before doing the map plots
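A rough sketch of that filtering step, in Python for illustration only: the record layout and the names `is_plottable` and `filter_for_map` are hypothetical, not taken from the pipeline, and it assumes NA shows up either as `None` or as a float NaN.

```python
import math

# Hypothetical sketch: drop 0 and NA frequencies before drawing the map,
# while the stacked plot keeps using the unfiltered records.


def is_plottable(freq):
    """True only for frequencies that are present and strictly positive."""
    if freq is None:
        return False
    if isinstance(freq, float) and math.isnan(freq):
        return False
    return freq > 0


def filter_for_map(records):
    """Return only the records that should get a disk on the map."""
    return [r for r in records if is_plottable(r["frequency"])]


records = [
    {"variant": "varA", "frequency": 0.0},           # reported but absent
    {"variant": "varD", "frequency": 0.10},          # actually found
    {"variant": "varX", "frequency": float("nan")},  # NA in the data
    {"variant": "varY", "frequency": None},          # NA in the data
]

map_records = filter_for_map(records)  # only varD remains
stacked_records = records              # stacked plot still needs the zeros
```

With this split, "varA with 0%" never gets a disk, so a variant that is 0 and one that is not plotted look the same on the map.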

vicfabienne commented 2 years ago

Fixed on the development branch.