Closed: danicahelb closed this issue 2 months ago.
This requires some investigation to determine if the issue is front end or back end.
Which years does the point at x=2017 contain?
Have you seen this on other line plots? Is the issue usually with dates, or does this happen with numbers as well?
This is the data for x=2017. It looks like it includes 2018 but NOT 2017.
Also, I'm plotting "year" on the x-axis, which is NOT a date. It is a continuous variable ranging from 2016 to 2018.
I'm seeing the same issue when I plot "age" on the x-axis (after subsetting to show only ages 0-10 years).
I also tried setting the bin width to 1, which I thought might prevent ages 0 and 1 from being combined, but that didn't work.
This looks like a mixture of front and back end.
The back end should probably add an extra bin at the start rather than use an all-inclusive first bin that combines the first two values. @d-callan
In this situation, the front end should use the bin end as the x-axis label. I can take care of that.
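A minimal TypeScript sketch of the binning behaviour described above, under stated assumptions: breaks are assumed to start at the minimum observed value, the first bin is assumed to be closed on both ends, and every later bin is right-closed (a, b]. `rightClosedBins` and `Bin` are hypothetical names for illustration, not the actual back-end code. With years 2016-2018 and a width of 1, it reproduces the reported symptom: the bin placed at x=2017 holds 2018 but not 2017, while 2016 and 2017 share the first bin.

```typescript
// Hypothetical model of a right-closed binning scheme (assumption, not the
// real back end): breaks start at the minimum value, the first bin is
// [min, min + width], and every later bin is (start, end].
interface Bin {
  start: number;
  end: number;
  label: string;
  values: number[];
}

function rightClosedBins(values: number[], width: number): Bin[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Enough bins for the last break to reach max (at least one bin).
  const n = Math.max(1, Math.ceil((max - min) / width));
  const bins: Bin[] = [];
  for (let i = 0; i < n; i++) {
    const start = min + i * width;
    const end = start + width;
    // Only the first bin is closed on both ends.
    const label = i === 0 ? `[${start}, ${end}]` : `(${start}, ${end}]`;
    bins.push({ start, end, label, values: [] });
  }
  for (const v of values) {
    // Right-closed: v goes to the first bin whose end is >= v;
    // the all-inclusive first bin also takes v === min.
    const idx = v === min ? 0 : Math.ceil((v - min) / width) - 1;
    bins[idx].values.push(v);
  }
  return bins;
}

const yearBins = rightClosedBins([2016, 2016, 2017, 2018, 2018], 1);
// yearBins[0] is [2016, 2017] and holds 2016 and 2017 together;
// yearBins[1] is (2017, 2018] and holds only 2018.
const xAtStart = yearBins.map((b) => b.start); // point for 2018 drawn at 2017
const xAtEnd = yearBins.map((b) => b.end); // point for 2018 drawn at 2018
```

Placing points at the bin end fixes the position of the later bins, but the first bin still mixes two years, which is why both a back-end and a front-end change were discussed.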
I'm assuming this is only an issue for integer variables? The all-inclusive first bin really isn't an issue for real-valued variables. I think the current x-axis labelling is also OK for real-valued variables.
Here is an example from wash-b bangladesh, where I am plotting stature (a true continuous variable) on the x-axis.
When unbinned, the first data point contains a range that is 0.1 larger than all other data points. I imagine this is fine.
If that's a truly real-valued continuous variable, then that first bin is not 0.1 bigger than the other bins; it's more like 1/infinity bigger, i.e., infinitesimally bigger. If values are in fact measured to the nearest 0.1 cm, then you could perhaps say it is 0.1 bigger, but in general, I think it's widely accepted to have different inclusivity/exclusivity rules for the first or last bin.
OK, so reading through and trying to catch up: am I correct in understanding that this ticket is specifically about integer (and date) variables where we set the bin width to 1 (year, day, etc.), and that we're saying that in that specific case the first inclusive bin should be split?
Also, is there frontend work for this ticket? Should I move this to the appropriate backend repo, or make a new backend ticket and let this one become about some specific frontend dev?
I'm going to close this as completed via veupathdb/veupathutils#37, which is available in rserve v7.2.9. If I've misunderstood something, obviously feel free to reopen.
I see the data for the first x-axis tick is no longer being combined, but the axis tick labels are not appropriate for the data. This data was collected from 2016 to 2022, but on the plot it is shown with labels 2015-2021.
If I hover over the data I can see that the first point is inclusive of 2015 & 2016... but since there is no 2015 data, the data in this point is just from 2016. so the data on the plot is correct, but the tick labels are misleading.
Maybe this needs some frontend work now?
My two cents:
I am not that familiar with bin-based plots (where each point represents a range like [2015, 2016]), but from my understanding, Plotly relies on the first value of the bin to place the point in such a line plot. Thus, if a data point covers (2015, 2016) in a line plot, the marker is shown at 2015 even if there is no value at that point. By comparison, in a histogram a bar is certainly shown spanning 2015 to 2016, but that does not mean that a value exists at 2015. Anyway, I think this plotting rule (i.e., taking the first value in the bin) is commonly used by other plotting tools when drawing a single marker/point for a bin. I'm not quite sure whether Plotly has an option to place the marker/point midway between 2015 and 2016 in such a case.
I wonder if it would make more sense to use bin labels for tick labels. So, the first tick label would be [2015-2016], the second would be (2016-2017], and so on. I don't know what this would entail. It could end up being challenging.
It looks like this is doable. There are two properties that need to be set:
- layout.xaxis.tickvals: this should be the same as data[0].x
- layout.xaxis.ticktext: this should be the same as data[0].binLabel
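A TypeScript sketch of the two settings above. `tickmode`, `tickvals`, and `ticktext` are standard Plotly `layout.xaxis` attributes; `binLabel` is assumed to be the application's own per-point field rather than a Plotly attribute, and `BinnedTrace`/`binLabelTicks` are hypothetical names used only for illustration.

```typescript
// Assumed shape of the binned trace data (hypothetical).
interface BinnedTrace {
  x: number[]; // one x position per binned point
  binLabel: string[]; // human-readable bin range for each point
}

// Subset of Plotly's layout.xaxis attributes used here.
interface TickAxisSettings {
  tickmode: "array";
  tickvals: number[];
  ticktext: string[];
}

// Put one tick at each point's x position and label it with that
// point's bin range instead of the raw number.
function binLabelTicks(trace: BinnedTrace): TickAxisSettings {
  if (trace.x.length !== trace.binLabel.length) {
    throw new Error("x and binLabel must have the same length");
  }
  return {
    tickmode: "array",
    tickvals: [...trace.x], // same as data[0].x
    ticktext: [...trace.binLabel], // same as data[0].binLabel
  };
}

const trace: BinnedTrace = {
  x: [2015, 2016, 2017],
  binLabel: ["[2015, 2016]", "(2016, 2017]", "(2017, 2018]"],
};
const layout = { xaxis: binLabelTicks(trace) };
```

The resulting `layout` could then be passed to `Plotly.newPlot` alongside the trace data. Setting `tickmode` to `"array"` explicitly documents the intent, although Plotly already defaults to array mode when `tickvals` is provided.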
slack conversation: https://epvb.slack.com/archives/C012ZK4Q5CZ/p1710438061386039
@dmfalke's suggested change to the x-axis labels is not preferred, as it would add clutter and confusion.
@d-callan thinks the issue is that our bins are inclusive on the opposite end from what Plotly expects, i.e., the first bin should be [2016, 2017) instead of [2015, 2016], and the last bin should be [2022, 2023) or [2022, 2023] instead of (2021, 2022].
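A minimal TypeScript sketch of the left-closed scheme proposed above, assuming breaks start at the minimum observed value (2016 here) and each bin is [start, start + width). `leftClosedBins` and `LeftClosedBin` are hypothetical names, not the actual back-end implementation. With a width of 1 on integer years, each year lands in its own bin and the bin start equals that year, so plotting points at the bin start gives the expected axis positions.

```typescript
// Hypothetical left-closed bin: [start, end).
interface LeftClosedBin {
  start: number;
  end: number;
  label: string;
  values: number[];
}

function leftClosedBins(values: number[], width: number): LeftClosedBin[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // floor(...) + 1 bins so that max falls inside the last [start, end).
  const n = Math.floor((max - min) / width) + 1;
  const bins: LeftClosedBin[] = [];
  for (let i = 0; i < n; i++) {
    const start = min + i * width;
    const end = start + width;
    bins.push({ start, end, label: `[${start}, ${end})`, values: [] });
  }
  for (const v of values) {
    // Left-closed: v belongs to the bin whose start is the last break <= v.
    bins[Math.floor((v - min) / width)].values.push(v);
  }
  return bins;
}

const leftBins = leftClosedBins([2016, 2017, 2017, 2019, 2022], 1);
// First bin is [2016, 2017), last is [2022, 2023); each holds one year only.
const xPositions = leftBins.map((b) => b.start); // 2016 through 2022
```

Empty bins (2018, 2020, and 2021 in this sample) are kept so the x-axis stays evenly spaced; whether to drop them is a separate design choice.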
beautiful, thanks @d-callan!
This is an issue on SAM line plots/timeline plots as well as EDA line plots.
If I bin "year" variables by 1 (so that each bin represents 1 year) and make a line or timeline plot, there are 2 issues. For example, in PRISM Resistance (on qa.restricted), data was collected between 2016 and 2022:
- The left-most data point is labeled [2016, 2017], so it combines data from 2 years.
- The right-most data point is labeled (2021, 2022], so it contains only data from 2022. However, this data point ends up with an axis label of 2021.