cschlaffner / PROTzilla2

Prevent dashbio bug in Volcano Plots #520

Open hendraet opened 1 week ago

hendraet commented 1 week ago

Issue description

When a dataframe contains multiple entries for a protein group and is passed to the dashbio.VolcanoPlot function (e.g. here), the resulting plot can be wrong. At the very least, the coloring of highlighted proteins may be faulty, so that significant proteins are colored as non-significant.

The reason for this lies in the _volcano.py file, line 475, where protein groups are removed based on index. The code implicitly assumes that each protein group shows up only once, and the coloring breaks when that assumption is not met.

Example that would not work (because of PG2):

Index  Protein ID  Highlight
0      PG1         no
1      PG2         yes
2      PG2         yes
3      PG3         no

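One possible guard on our side, as a minimal sketch rather than a fix for dashbio itself: check for repeated protein groups before the data reaches dashbio.VolcanoPlot. The function names below are hypothetical and not part of PROTzilla or dashbio. The helpers take any iterable of protein group IDs, so with a dataframe shaped like the example above one would presumably pass the "Protein ID" column.

```python
from collections import Counter


def duplicated_protein_groups(protein_ids):
    """Return the protein group IDs that occur more than once.

    Hypothetical helper: accepts any iterable of IDs, e.g. a list or a
    dataframe column. IDs are returned in order of first appearance.
    """
    counts = Counter(protein_ids)
    return [pg for pg, n in counts.items() if n > 1]


def assert_unique_protein_groups(protein_ids):
    """Raise a ValueError if any protein group ID is repeated.

    Hypothetical guard meant to run before plotting, so the problem
    surfaces as an error instead of as wrong colors in the plot.
    """
    duplicates = duplicated_protein_groups(protein_ids)
    if duplicates:
        raise ValueError(
            f"Protein groups appear more than once: {duplicates}"
        )


# IDs from the example table: PG2 is listed twice.
example_ids = ["PG1", "PG2", "PG2", "PG3"]
print(duplicated_protein_groups(example_ids))  # ['PG2']
```

Whether to raise, warn, or merge the repeated rows is a design decision this sketch leaves open. It only makes the violated assumption visible.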
hendraet commented 1 week ago

Maybe something for @gritlm

hendraet commented 6 days ago

Reproduced the issue, and it looks like this: broken_volcano

Also, here's the pickled data I used in the create_volcano_plot function: data.zip

I used a slightly different create_volcano_plot function, but it should be the same in the relevant parts.