VEuPathDB / web-monorepo

A monorepo that contains all frontend code for VEuPathDB websites
Apache License 2.0

Need better handling of x-axis labels of Line plots & timeline plots #762

Closed: danicahelb closed this issue 2 months ago

danicahelb commented 6 months ago

This is an issue on SAM line plots and timeline plots, as well as on EDA line plots.

If I bin "year" variables by 1 (such that each bin represents 1 year) and make a line or timeline plot, there are 2 issues:

  1. The data point on the left-hand side of the plot combines multiple years of data, while all other data points contain just 1 year of data.
  2. The data point on the right-hand side of the plot does not align with the correct axis tick mark (though if I hover over the data point, the x-axis value is correct).

For example, in PRISM Resistance (on qa.restricted), data was collected between 2016 and 2022.

The left-most data point is labeled [2016, 2017], so it combines data from 2 years:

[screenshot]

The right-most data point is labeled (2021, 2022], so it contains only data from 2022. However, this data point ends up at the x-axis tick labeled 2021:

[screenshot]

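
To make both symptoms concrete, here is a minimal TypeScript sketch that models the binning these labels suggest: right-closed bins (a, b], with the first bin also closed on its left edge, and each marker drawn at its bin's start. This is an assumption about the behavior, not the actual back-end code, and the helper names (`makeBreaks`, `binIncludeLowest`) are hypothetical.

```typescript
// Hypothetical model of the reported binning (not the real back-end code):
// bins are right-closed (a, b], except the first, which is [a, b].
interface Bin {
  start: number;
  end: number;
  label: string;
  values: number[];
}

// Breaks lo, lo + width, ..., up to and including hi.
function makeBreaks(lo: number, hi: number, width: number): number[] {
  const breaks: number[] = [];
  // Multiply rather than accumulate to avoid floating-point drift.
  for (let i = 0; lo + i * width <= hi; i++) breaks.push(lo + i * width);
  return breaks;
}

function binIncludeLowest(values: number[], breaks: number[]): Bin[] {
  const bins: Bin[] = [];
  for (let i = 0; i < breaks.length - 1; i++) {
    const start = breaks[i];
    const end = breaks[i + 1];
    const isFirst = i === 0;
    bins.push({
      start,
      end,
      label: `${isFirst ? "[" : "("}${start}, ${end}]`,
      // Only the first bin keeps its lower edge.
      values: values.filter((v) => (isFirst ? v >= start : v > start) && v <= end),
    });
  }
  return bins;
}

const years = [2016, 2017, 2018, 2019, 2020, 2021, 2022];
const bins = binIncludeLowest(years, makeBreaks(2016, 2022, 1));

// Markers at bin.start: the first bin combines 2016 and 2017,
// and the 2022 data is drawn at x=2021.
for (const bin of bins) {
  console.log(`x=${bin.start} ${bin.label} -> [${bin.values.join(", ")}]`);
}
// x=2016 [2016, 2017] -> [2016, 2017]
// x=2017 (2017, 2018] -> [2018]
// ...
// x=2021 (2021, 2022] -> [2022]
```
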
dmfalke commented 4 months ago

This requires some investigation to determine whether the issue is in the front end or the back end.

asizemore commented 4 months ago

Which years does the point at x=2017 contain?

asizemore commented 4 months ago

Have you seen this on other line plots? Is the issue usually with dates, or does this happen with numbers as well?

danicahelb commented 4 months ago

This is the data for x=2017. It looks like it includes 2018 but NOT 2017:

[screenshot]

Also, I'm plotting "year" on the x-axis, which is NOT a date; it is a continuous variable with a range from 2016-2018.

danicahelb commented 4 months ago

I'm seeing the same issue when I plot "age" on the x-axis (after subsetting to show only ages 0-10 years):

[3 screenshots]

danicahelb commented 4 months ago

I also tried setting the bin width to 1, which I thought might prevent ages 0 and 1 from being combined, but that didn't work:

[screenshot]

bobular commented 4 months ago

This looks like a mixture of front-end and back-end issues.

The back end should probably add an extra bin at the start rather than using an all-inclusive first bin that combines the first two values. @d-callan
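
The back-end suggestion can be sketched like this, assuming include-lowest binning (first bin [a, b], the rest (a, b]): prepend one break a bin-width below the minimum, so the closed first bin no longer captures two observed values. The helper names are hypothetical and this is not the actual veupathutils change.

```typescript
// Hypothetical sketch of "add an extra bin at the start": start the breaks
// one bin-width below the minimum. Not taken from the real code base.
function makeBreaksWithLeadingBin(lo: number, hi: number, width: number): number[] {
  const breaks: number[] = [];
  for (let i = -1; lo + i * width <= hi; i++) breaks.push(lo + i * width);
  return breaks;
}

// Count values per bin; only the first bin includes its lower edge.
function countIncludeLowest(values: number[], breaks: number[]): number[] {
  const counts: number[] = [];
  for (let i = 0; i < breaks.length - 1; i++) {
    const lo = breaks[i];
    const hi = breaks[i + 1];
    counts.push(values.filter((v) => (i === 0 ? v >= lo : v > lo) && v <= hi).length);
  }
  return counts;
}

const years = [2016, 2017, 2018, 2019, 2020, 2021, 2022];
const breaks = makeBreaksWithLeadingBin(2016, 2022, 1);
console.log(breaks.join(", "));                             // 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022
console.log(countIncludeLowest(years, breaks).join(", "));  // 1, 1, 1, 1, 1, 1, 1
```

Every bin now holds a single year, but if markers stay at each bin's start, the x values run from 2015 to 2021, one year below the data each point represents.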

In this situation the front end should be using the bin-end as the x-axis label. I can take care of that.
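
A minimal sketch of the front-end side, assuming right-closed bins of width 1 over integer values: for a bin (a, a + 1], the only integer it can contain is its end, so drawing the marker at the bin end lines the point up with the matching tick. The `markerX` helper is hypothetical.

```typescript
// Hypothetical helper (not from the real code base): choose the x position
// of a line-plot marker from a bin's edges.
type BinEdges = { start: number; end: number };

function markerX(bin: BinEdges, anchor: "start" | "end"): number {
  return anchor === "start" ? bin.start : bin.end;
}

// Right-closed width-1 bins as in the report: (2017, 2018], ..., (2021, 2022].
// For integers, the bin end is the only value such a bin can hold.
const rightClosed: BinEdges[] = [2017, 2018, 2019, 2020, 2021].map((s) => ({ start: s, end: s + 1 }));

console.log(rightClosed.map((b) => markerX(b, "start")).join(", ")); // 2017, 2018, 2019, 2020, 2021
console.log(rightClosed.map((b) => markerX(b, "end")).join(", "));   // 2018, 2019, 2020, 2021, 2022
```

This only fixes alignment for the (a, b] bins; an all-inclusive first bin such as [2016, 2017] would still hold two years, which is why a back-end change is needed as well.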

I'm assuming this is only an issue for integer variables? The all-inclusive first bin really isn't an issue for real-valued variables. I think the current x-axis labelling is also OK for real-valued variables.

danicahelb commented 4 months ago

Here is an example from WASH-B Bangladesh, where I am plotting stature (a truly continuous variable) on the x-axis.

When unbinned, the first data point covers a range that is 0.1 larger than that of every other data point. I imagine this is fine.

[2 screenshots]

bobular commented 4 months ago

If that's a truly real-valued continuous variable, then that first bin is not 0.1 bigger than the other bins; it's more like 1/infinity bigger, which is an infinitely tiny amount. If values are in fact measured to the nearest 0.1 cm, then you could perhaps say it is 0.1 bigger, but in general, I think it's widely accepted to have different inclusivity/exclusivity rules for the first or last bin.
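
The difference between integer and finely measured variables can be quantified. For values recorded on a grid of resolution r, a bin of width w whose edges lie on that grid can hold w/r + 1 distinct values when closed at both ends, but only w/r when half-open. A small sketch under those assumptions (the `possibleValues` name and the 5 cm stature bin width are hypothetical):

```typescript
// Count how many distinct values recorded at a given resolution (whole years -> 1,
// stature to the nearest 0.1 cm -> 0.1) can fall in a bin of the given width,
// assuming both bin edges lie on that grid.
function possibleValues(width: number, resolution: number, closedBothEnds: boolean): number {
  // Round to absorb floating-point error, e.g. 0.3 / 0.1 === 2.9999999999999996.
  const steps = Math.round(width / resolution);
  return closedBothEnds ? steps + 1 : steps;
}

// Whole years, width 1: the closed first bin can hold twice as many values.
console.log(possibleValues(1, 1, true), possibleValues(1, 1, false)); // 2 1
// Hypothetical stature bins 5 cm wide, measured to 0.1 cm: only 2% more.
console.log(possibleValues(5, 0.1, true), possibleValues(5, 0.1, false)); // 51 50
```

The extra closed edge doubles a width-1 integer bin but barely changes a bin over finely measured values, which matches the distinction drawn between integer and real-valued variables.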

d-callan commented 3 months ago

OK, I'm reading through and trying to catch up. Am I correct in understanding that this ticket is specifically about integer (and date) variables where we set the bin width to 1 (year, day, etc.), and that we're saying that, in that specific case, the first inclusive bin should be split?

d-callan commented 3 months ago

Also, is there front-end work for this ticket? Should I move this to the appropriate back-end repo, or make a new back-end ticket and let this one cover the specific front-end work?

d-callan commented 3 months ago

I'm going to close this as completed via veupathdb/veupathutils#37, which is available in rserve v7.2.9. If I've misunderstood something, feel free to reopen.

danicahelb commented 2 months ago

I see that the data for the first x-axis tick is no longer being combined, but the axis tick labels don't match the data. This data was collected from 2016-2022, but the plot shows it at ticks labeled 2015-2021:

[screenshot]

If I hover over the data, I can see that the first point is inclusive of 2015 and 2016, but since there is no 2015 data, the data in this point comes only from 2016. So the data on the plot is correct, but the tick labels are misleading:

[screenshot]

Maybe this needs some frontend work now?

moontrip commented 2 months ago

My two cents:

I am not that familiar with plots based on bins (ranges like [2015, 2016]), but from my understanding, Plotly relies on the first value of the bin to mark the point in such a line plot. Thus, if a bin runs from 2015 to 2016 in a line plot, the marker is shown at 2015 even if the bin may not contain a value at that point. In a histogram, by contrast, a bar is certainly drawn over the range from 2015 to 2016, but that does not mean that a value exists at 2015. Anyway, I think this plotting rule (i.e., taking the first value in the bin) is commonly used by other plotting tools when drawing a single marker/point for a bin. I'm not quite sure whether Plotly has an option to plot the marker/point midway between 2015 and 2016 in such a case.
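
In Plotly.js, a scatter trace draws each marker at whatever x value it is given, so where a marker sits within its bin is decided by the code that builds the trace. A minimal sketch, assuming markers at bin midpoints with the interval label shown on hover; the `midpointTrace` helper is hypothetical, the y values are placeholders, and the trace type is declared locally so the snippet runs without the plotly.js package.

```typescript
// Subset of a Plotly.js scatter trace, declared locally. The attribute names
// (type, mode, x, y, text, hovertemplate) are real Plotly.js trace attributes.
interface ScatterTrace {
  type: "scatter";
  mode: "lines+markers";
  x: number[];
  y: number[];
  text: string[];
  hovertemplate: string;
}

interface BinnedPoint {
  start: number;
  end: number;
  label: string; // e.g. "(2016, 2017]"
  value: number; // aggregated y value for the bin
}

// Hypothetical helper: put each marker at the bin midpoint and show the
// bin's interval label on hover.
function midpointTrace(points: BinnedPoint[]): ScatterTrace {
  return {
    type: "scatter",
    mode: "lines+markers",
    x: points.map((p) => (p.start + p.end) / 2),
    y: points.map((p) => p.value),
    text: points.map((p) => p.label),
    hovertemplate: "%{text}: %{y}<extra></extra>",
  };
}

// Placeholder y values, for illustration only.
const trace = midpointTrace([
  { start: 2015, end: 2016, label: "[2015, 2016]", value: 0.4 },
  { start: 2016, end: 2017, label: "(2016, 2017]", value: 0.5 },
]);
console.log(trace.x.join(", ")); // 2015.5, 2016.5
```

Note that for whole-year bins the midpoint falls halfway between two year ticks, so this choice trades one kind of misalignment for another.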

dmfalke commented 2 months ago

I wonder if it would make more sense to use bin labels for tick labels. So, the first tick label would be [2015-2016], the second would be (2016-2017], and so on. I don't know what this would entail. It could end up being challenging.
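
One way to show bin labels as tick labels in Plotly.js is an array tick mode on the x-axis, with `tickvals` at the marker positions and `ticktext` holding the interval labels. A minimal sketch, assuming markers at bin starts; the `binTickAxis` helper is hypothetical, and the axis type is a locally declared subset so the snippet runs standalone.

```typescript
// Subset of Plotly.js layout.xaxis, declared locally.
// tickmode, tickvals and ticktext are real Plotly.js axis attributes.
interface XAxisTicks {
  tickmode: "array";
  tickvals: number[];
  ticktext: string[];
}

interface BinEdges {
  start: number;
  end: number;
  label: string;
}

// Hypothetical helper: one tick per bin, at the marker position (bin start),
// labeled with the bin's interval notation instead of a bare number.
function binTickAxis(bins: BinEdges[]): XAxisTicks {
  return {
    tickmode: "array",
    tickvals: bins.map((b) => b.start),
    ticktext: bins.map((b) => b.label),
  };
}

const layout = {
  xaxis: binTickAxis([
    { start: 2015, end: 2016, label: "[2015, 2016]" },
    { start: 2016, end: 2017, label: "(2016, 2017]" },
    { start: 2017, end: 2018, label: "(2017, 2018]" },
  ]),
};
console.log(layout.xaxis.ticktext.join(" | ")); // [2015, 2016] | (2016, 2017] | (2017, 2018]
```

The resulting `layout` object would be passed to `Plotly.newPlot` alongside the traces.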

dmfalke commented 2 months ago

It looks like this is doable:

[screenshot]

There are two properties that need to be set:

danicahelb commented 2 months ago

slack conversation: https://epvb.slack.com/archives/C012ZK4Q5CZ/p1710438061386039

@dmfalke's suggested changes to the x-axis labels are not preferred, as they would add clutter and confusion.

@d-callan thinks the issue is that Plotly expects the other end of each bin to be inclusive. That is, the first bin should be [2016, 2017) instead of [2015, 2016], and the last bin should be [2022, 2023) or [2022, 2023] instead of (2021, 2022].
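
A minimal sketch of left-closed binning, assuming integer values and a positive bin width; the `binLeftClosed` helper is hypothetical and is not the actual veupathutils implementation. Because the last break is placed strictly above the maximum, every bin can be half-open [a, b) without dropping the largest value.

```typescript
// Hypothetical sketch (not the actual veupathutils change): left-closed bins [a, b).
interface Bin {
  start: number;
  end: number;
  label: string;
  values: number[];
}

// Assumes width > 0; otherwise the break loop would never terminate.
function binLeftClosed(values: number[], lo: number, hi: number, width: number): Bin[] {
  // Breaks lo, lo + width, ... until one lies strictly above hi, so the
  // maximum lands in a half-open last bin such as [2022, 2023).
  const breaks: number[] = [lo];
  while (breaks[breaks.length - 1] <= hi) breaks.push(lo + breaks.length * width);

  const bins: Bin[] = [];
  for (let i = 0; i < breaks.length - 1; i++) {
    const start = breaks[i];
    const end = breaks[i + 1];
    bins.push({
      start,
      end,
      label: `[${start}, ${end})`,
      values: values.filter((v) => v >= start && v < end),
    });
  }
  return bins;
}

const years = [2016, 2017, 2018, 2019, 2020, 2021, 2022];
for (const bin of binLeftClosed(years, 2016, 2022, 1)) {
  console.log(`x=${bin.start} ${bin.label} -> [${bin.values.join(", ")}]`);
}
// x=2016 [2016, 2017) -> [2016]
// ...
// x=2022 [2022, 2023) -> [2022]
```

With left-closed width-1 bins, each bin's start is exactly the single integer it holds, so markers drawn at bin starts line up with the correct year ticks.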

danicahelb commented 2 months ago

Beautiful, thanks @d-callan!

[screenshot]