dputhier / pygtftk

A python package and a set of shell commands to handle GTF files
GNU General Public License v3.0

Implement strategy when key has more than 50 values #79

Closed guillaumecharbonnier closed 5 years ago

guillaumecharbonnier commented 5 years ago

Currently, we get this error:

 |-- 22:11-ERROR-ologram : The selected key in --more-keys should be associated with less than 50 different values.

Obviously, the current plot layout cannot be rendered in this situation, but we could at least produce the table. Then we could think of a strategy to still display something. Maybe display the best 20 values according to their adjusted p-value?
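A minimal sketch of that "best 20" idea, in plain Python. It is not taken from the ologram code: the row layout, the `adj_pval` field name and both function names are hypothetical, chosen only to illustrate keeping the full table while restricting the plot to the best-ranked values.

```python
# Hypothetical row layout: one dict per value of the selected key,
# each carrying an adjusted p-value under the (assumed) name "adj_pval".
def select_top_features(rows, max_features=20, pval_key="adj_pval"):
    """Return the max_features rows with the smallest adjusted p-value."""
    ranked = sorted(rows, key=lambda row: row[pval_key])
    return ranked[:max_features]


def split_table_and_plot(rows, max_features=20):
    """Keep every row for the table, but only the best-ranked rows for the plot.

    The third element tells the caller whether the plot shows a subset,
    so that this can be mentioned in the plot title.
    """
    plot_rows = select_top_features(rows, max_features)
    truncated = len(plot_rows) < len(rows)
    return rows, plot_rows, truncated
```

With 60 values for the key, the table would keep all 60 rows, the plot would receive 20, and `truncated` would be `True`.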

dputhier commented 5 years ago

This is an interesting point, and a reasonable workaround. It would also mean adding something to the plot title to tell the user that only a selection of annotations is shown. Maybe we could also switch to a radar plot, which might be better suited as the number of features increases (?).

(In reply to guillaumecharbonnier's comment of Wed, 3 Apr 2019, 22:42.)

dputhier commented 5 years ago

There is no radar plot in plotnine at the moment. I will open an issue to find out whether this is something in progress...

qferre commented 5 years ago

Question: is the value of 50 for the threshold completely arbitrary? We could always produce a very wide barplot and let the user trim the resulting image.

dputhier commented 5 years ago

This is arbitrary. I think there is a limitation on PDF width.

(In reply to Quentin Ferré's comment of Thu, 4 Apr 2019, 13:16.)

qferre commented 5 years ago

The limitation on PDF width is arbitrary as well, if I recall correctly?

guillaumecharbonnier commented 5 years ago

Unless I misunderstand how you plan to use the radar plot, I think a volcano plot of p-value against fold change (FC), with text_repel labels for the interesting outliers, should be suitable when testing more than ~50 motifs.
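A rough sketch of the data preparation such a volcano plot would need, independent of any plotting library. The field names (`fold_change`, `adj_pval`, `feature`), the thresholds and the function name are assumptions made for illustration, not ologram's actual columns or defaults.

```python
import math


def volcano_points(rows, fc_threshold=1.0, pval_threshold=0.05, min_pval=1e-300):
    """Compute volcano coordinates and decide which points get a text label.

    x is log2(fold change), y is -log10(adjusted p-value).
    Fold changes are assumed to be strictly positive; p-values are clamped
    to min_pval so that a p-value of 0 does not raise a math domain error.
    """
    points = []
    for row in rows:
        log2_fc = math.log2(row["fold_change"])
        neg_log10_p = -math.log10(max(row["adj_pval"], min_pval))
        # Only strong and significant features are labelled (the "outliers").
        is_outlier = (
            abs(log2_fc) >= fc_threshold and row["adj_pval"] <= pval_threshold
        )
        points.append(
            {
                "feature": row["feature"],
                "x": log2_fc,
                "y": neg_log10_p,
                "label": row["feature"] if is_outlier else "",
            }
        )
    return points
```

The resulting `x`, `y` and `label` fields could then be handed to whatever point and text layers the plotting library provides; leaving `label` empty for non-outliers is one simple way to keep a plot with many features readable.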

dputhier commented 5 years ago

Yep.

(In reply to guillaumecharbonnier's comment of Thu, 4 Apr 2019, 14:08.)


guillaumecharbonnier commented 5 years ago

Just reporting that the current plot code can display way more than 50 keys before hitting the PDF width limit.

00_ologram_diagrams.pdf

guillaumecharbonnier commented 5 years ago

Actually, the only reason the plot is messed up is that feature_type for "--more-keys" is currently the combination of the key and the value separated by a line return. @dputhier @qferre Is there a reason for that, or can we switch to another separator, e.g. ": "?
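To make the separator question concrete, here is a small sketch of how such a label could be built with a configurable separator. The function name is hypothetical and this is not the actual ologram code; `gene_biotype` is used only as an example key.

```python
def make_feature_label(key, value, separator=": "):
    """Build the feature_type label from a key and one of its values."""
    return f"{key}{separator}{value}"


# Proposed separator: everything stays on a single line.
single_line = make_feature_label("gene_biotype", "lncRNA")

# Line-return separator, as described above: shorter lines, but two lines per label.
two_lines = make_feature_label("gene_biotype", "lncRNA", separator="\n")
```

Making the separator a parameter would let the plot code choose per layout, since a single-line label gets long while a two-line label takes more vertical room.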

guillaumecharbonnier commented 5 years ago

Also, can I add a third plot to the current PDF output showing the FC metric? The current metrics put a visual emphasis on big features, and users may be more interested in comparing which features have the highest enrichment bias.

dputhier commented 5 years ago

Yes, for sure you can implement additional plots. The volcano plot, for instance, may be a good choice. If you look at the code you will see that the plotting part needs some refactoring. In fact, the dataframe should be melted properly once, so that all plots can be built from the same dataframe...
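As an illustration of "melting once", the sketch below reshapes wide rows into long format in plain Python. The column names (`feature`, `observed`, `expected`) and the function name are hypothetical; with a pandas dataframe, `DataFrame.melt` with `id_vars` and `value_vars` would presumably play the same role.

```python
def melt_rows(rows, id_key, value_keys, var_name="variable", value_name="value"):
    """Turn wide rows (one column per metric) into long rows (one row per metric).

    Every plot can then filter the same long table on var_name instead of
    each plot reshaping the data on its own.
    """
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return long_rows


# Hypothetical wide table: one row per feature, one column per metric.
wide = [
    {"feature": "exon", "observed": 10, "expected": 4},
    {"feature": "intron", "observed": 3, "expected": 5},
]
long_table = melt_rows(wide, "feature", ["observed", "expected"])
```

Each feature then contributes one row per metric, so a barplot, a volcano plot or any later addition can all start from `long_table`.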

dputhier commented 5 years ago

Yes, the feature_type is currently the combination of the key and the value separated by a line return. We chose this solution to avoid very long names in the plot, which were also messing up the diagram...

qferre commented 5 years ago

The error was removed in dad7be338cac2f156cec15494bae1b44550f73d2. Fixed.