MultiQC / MegaQC

Web application to collect and visualise data across multiple MultiQC runs.
http://megaqc.info/
GNU General Public License v3.0
94 stars 28 forks source link

Color by categorical in Comparison Plot and fix filters in Report Plot #469

Open gouinK opened 1 year ago

gouinK commented 1 year ago

PR Checklist

Description of changes

Previously, Report Plots would show all data even when filters were applied (Issue), had to change how get_report_plot was extracting the filters in order to get the expected behavior where the plots only show samples that pass the applied filters.

In the Comparison Plots, only continuous variables could be used for the color. Added the ability to detect if the color is set to a variable that is categorical (i.e. contains strings) and automatically generate a color scheme based on the unique values.

In the Comparison Plots, if, for example, a user specified an x and y variable and some samples were missing the y variable but had the x variable, the plot would error out. Fixed this by changing None to np.nan.

Encountered error in Comparison Plots where the point size was not being set as an integer, so casted this variable as an integer.

Technical details

Used the pencils package to aid in the automatic generation of color schemes for categoricals. This package contains a large list of unique colors, so it should be sufficient for situations where even hundreds of unique values are present in the categorical variable.

Additional context

multimeric commented 1 year ago

Looks good. These two plots are the "old" format so they haven't received much attention, but if you can make minor improvements to them without much effort them I'm happy to merge.

wrt the pencils package, are you sure we can't do this with plotly, and avoid adding a new package? https://plotly.com/python-api-reference/generated/plotly.colors.html#module-plotly.colors

gouinK commented 1 year ago

In order to support as many different colors as pencils does, we would need to concatenate all of the plotly discrete colorscales and then randomly pick from there, but even then the number of possible colors would be less than in pencils. Alternatively, one could pick colors completely at random by iterating through RGB codes; but using pre-defined color palettes is better in opinion because there is less chance for ambiguity. I could see arguments going either way, but this is my simple approach to the problem.

Just curious, what do you mean by "old" format of plots? Where can I find the "new" format of plots?

multimeric commented 1 year ago

In either case I would prefer a fixed sequence of colours, I don't think randomness makes for a good UX.

The old pages use the /api endpoints and are written in raw JS, whereas the new pages use the /rest_api endpoints and are written in JS that is compiled using webpack.