VEuPathDB / web-monorepo

A monorepo that contains all frontend code for VEuPathDB websites
Apache License 2.0
2 stars 0 forks source link

independent axis truncation for Histogram Viz #376

Closed moontrip closed 9 months ago

moontrip commented 10 months ago

This will probably address https://github.com/VEuPathDB/web-monorepo/issues/369.

I tested all Viz with axis control and found that only independent axis of Histogram Viz has the issue, even at ClinEpiDB. If I remember correctly, this issue has not been addressed for a long time, e.g., https://github.com/VEuPathDB/web-eda/issues/1295 even if there have been some efforts to resolve it at other PRs, e.g., https://github.com/VEuPathDB/web-eda/pull/1464 & https://github.com/VEuPathDB/web-eda/pull/1623.

For Histogram Viz, independent axis control like custom range triggers data request via viewport change. Thus, default independent axis range for Histogram Viz should remain the same regardless of setting custom range to judge if it is truncated or not. To resolve this, an additional prop is passed to useDefaultAxisRange hook so that the default range can be kept.

moontrip commented 10 months ago

@asizemore Somehow I cannot run microbiome's Visualization (got error) so I could not test your case, however, this would work. I am not quite sure if a configuration (.env.local) needs to be changed though.

Anyway, can you please check if it works with your test case in the ticket?

moontrip commented 10 months ago

Here is a demo: GEMS1 with the main variable of Age. As you can see, truncation bar is shown while the Bin Width varies depending on custom range.

https://github.com/VEuPathDB/web-monorepo/assets/12802305/b7edcb71-851c-46e3-b0b1-d6405c79aef0

asizemore commented 10 months ago

Thanks @moontrip !

So far i've just really tested the branch a bit. I noticed that the truncation bars appear now when the x axis range is shortened, even if no data has been cut off. For example, below i've changed the x axis to start at 1 instead of 0. I can verify there's no data between 0 and 1 but the truncation bar shows up. I think we need to check if the left side of the very first bin is less than the user-inputted x axis range min, and if the right side of the final bin is greater than the user-inputted x axis range max.

Screen Shot 2023-07-25 at 7 06 56 AM

Another question i have:

default independent axis range for Histogram Viz should remain the same regardless of setting custom range

Shouldn't this be true for all plots, not just histogram? Stop me if im expanding scope but id imagine that all plots should have the same default axis ranges regardless of the custom range. Or am i misunderstanding?

moontrip commented 10 months ago

Thanks @asizemore for your tests! 👍 I left my response next to your question in the following.

So far i've just really tested the branch a bit. I noticed that the truncation bars appear now when the x axis range is shortened, even if no data has been cut off. For example, below i've changed the x axis to start at 1 instead of 0. I can verify there's no data between 0 and 1 but the truncation bar shows up. I think we need to check if the left side of the very first bin is less than the user-inputted x axis range min, and if the right side of the final bin is greater than the user-inputted x axis range max.

DK: So, in your test case of the lower side, I know what the problem is: it is an edge case. The problem is that by request, we have set the following condition: if metadata.displayMin does not exist, then 0 is used to determine the default min range. Or, though this is rare case, the metadata.rangeMin information is incorrect and contains 0. The test variable most likely falls into the former, thus default min range becomes 0 in your test case, not 1, which causes the issue you observed: we have overlooked this edge case for truncation. If we should resolve this, then a simple solution is probably to compute ranges with and without such a constrained condition and use proper range for judging truncation, but I will need to check carefully whether this approach causes side effects.

Another question i have:

default independent axis range for Histogram Viz should remain the same regardless of setting custom range

Shouldn't this be true for all plots, not just histogram? Stop me if im expanding scope but id imagine that all plots should have the same default axis ranges regardless of the custom range. Or am i misunderstanding?

DK: I am sorry if it makes you confusion. Yes, default independent axis range for all Viz should remain the same as you said. What I wanted to say that under current format (before this PR), default independent axis range for Histogram Viz was changed whenever changing independent axis range because the Viz triggers data request to retrieve best bin width from the backend.

bobular commented 10 months ago

Hi @moontrip - I found the same problem as Ann (also GEMS1) image

It's not an edge case, it's a problem that the for every visualisation axis except the histogram x-axis, we are able to show the yellow warning ONLY when actual data has been cut off (because none of the other visualizations use the viewport to achieve "semantic zooming" - the min/max range of the data can be determined from the back end response). The histogram x-axis is a known bug as mentioned in my comment below. https://github.com/VEuPathDB/web-eda/issues/1295#issuecomment-1320050297

The only solutions involve making extra client requests (maybe just a request to distribution would be sufficient), or performing the check in the back end and returning truncation flags from the back end.

bobular commented 10 months ago

Although I see that we still use the language "may have been truncated" - so maybe your solution (with some simplifications I can suggest) is the best for users right now.

moontrip commented 10 months ago

Hi @moontrip - I found the same problem as Ann (also GEMS1) image

It's not an edge case, it's a problem that the for every visualisation axis except the histogram x-axis, we are able to show the yellow warning ONLY when actual data has been cut off (because none of the other visualizations use the viewport to achieve "semantic zooming" - the min/max range of the data can be determined from the back end response). The histogram x-axis is a known bug as mentioned in my comment below. VEuPathDB/web-eda#1295 (comment)

The only solutions involve making extra client requests (maybe just a request to distribution would be sufficient), or performing the check in the back end and returning truncation flags from the back end.

@bobular I understand what you have in mind but just wonder (or want to make sure) if we need to wait for backend work or solve it at front end level. For Ann's case, it happened due to the difference between actual data (or even metadata, starting from 1 at min) and display range (starting from 0 due to our constrained condition). Defining actual default range (e.g., starting from 1, not 0) to judge the display of the truncation may solve Ann's issue.

bobular commented 10 months ago

Taking a deep dive into this. All tests done with the current main branch.

Let's use GEMS1 -> participant repeated measure -> height EUPATH_0010075 as our example variable   Here are the range annotations:

distributionDefaults value
displayRangeMin 0
displayRangeMax 121
rangeMin 17.4
rangeMax 117.4
binWidth 0.88
units "cm"

The default visualization (and subsetting looks the same/similar) looks like this: image

Not shown is the default "custom range" which goes from 17.4 to 119.4.

Given that the minimum value is 17.4, we should be able to do a custom range starting at 10 and not get truncation warnings:

And indeed we can:

image

If we started the range at 20, we would hope to see a truncation warning, but no we don't. Well, we half do. We get one temporarily (calculated using the data from the previous back end call) and the blue warning text shows up and persists:

image

The truncation bar goes away because the new data response (requested for a "viewport" starting at 20) now produces a new min/max - and the min is greater than 20.

Note also that if we filter on a different variable (using a different variable so we don't get confused with "filter-aware" behaviours), e.g. "Height, 2nd" then the default "custom range" for the "Height" histogram is based on the filtered data from the back end, not the annotated ranges:

image

If we change the custom range start to 20 in the above, we get no warnings.

So from this we learn that the truncation flags are determined from the data returned from the back end, not the range annotations.

We can also see that in the code (HistogramVisualization.tsx). If we follow backwards from the truncationConfig call, we see that the "reference" ranges (independentAxisRange in particular) are passed in as defaultUIState - which contains independentAxisRange: defaultIndependentRange and defaultIndependentRange comes from useDefaultAxisRange which, in custom and auto modes only, gets its min and max from independentAxisMinMax which comes from histogramDefaultIndependentAxisMinMax which uses findIndependentAxisMinMax to get the min binStart and max binEnd from the response data.

an aside: it looks like the comment // using annotated range, NOT the actual data in HistorgramVisualization.tsx is wrong!

OK, so where do we go from here?

Here are some options:

  1. use a new, separate back end request to the distributions endpoint to get the values for independentAxisMinMax - no other code needs to change (relative to main at the time of writing)
  2. do not even attempt to determine the min and max from the data; remove independentAxisMinMax entirely, and call useDefaultAxisRange(xAxisVariable, undefined, undefined, undefined, undefined, 'Full') forcing 'Full' mode; this would then use the annotated ranges in all cases - hmm but then auto-zoom is broken.
  3. use the useDefaultAxisRange final argument hack I suggested here: https://github.com/VEuPathDB/web-monorepo/pull/376#discussion_r1273592361 - testing this on top of main I see that
    • auto-zoom works
    • custom zoom doesn't start out with the auto-zoom limits like every other viz does
    • truncation warnings are shown if the range is anywhere inside the annotated range - but "Data may have been truncated" is not a lie!

Option 2 is a no go. Option 3 fixes the missing truncation warnings but we end up with more warnings than strictly necessary and the initial custom zoom behaviour is non standard (it uses the annotated range, not the data range) - though that might be fixable.

My preference is for Option 1. It shouldn't be too tricky. See subsettingClient.getDistribution usage in HistogramFilter (do not use the getDistribution wrapper from util.ts - it just calls the thing with and without filters, but we just need to call it once with filters). The response even contains statistics.subsetMin and statistics.subsetMax - and I bet you can send an empty or missing binSpec and the back end will do something sensible (yes you can, binSpec is optional in the request). Just remember to pass the current filters.

bobular commented 10 months ago

I'd also recommend calling the distribution service to obtain distributionPromise in a separate usePromise before the main data = usePromise.

Add distributionPromise as a dependency to the main data = usePromise and return undefined if distributionPromise.value is nullish (and maybe also if distributionPromise.pending - see what we already do with filteredCounts).

Then we have to return the distributionPromise.value.statistics.subsetMin and Max as part of the main data = usePromise payload, so need to add two more props (subsetMin and subsetMax) to HistogramDataWithCoverageStatistics.

Then pull independentAxisMinMax out of data.value.subsetMin and data.value.subsetMax in the useMemo instead of calling histogramDefaultIndependentAxisMinMax.

This "bundling" of the data from two sources into one usePromise payload will create a smooth experience for the user - no multiple re-renders when the various back end calls come back. I did something similar here.

moontrip commented 10 months ago

Thank you for your detailed comments and suggestions @bobular 👍 Although I didn't read all very carefully, from my understanding I tend to agree with your comments. Will do my homework next week!

moontrip commented 10 months ago

@bobular I have reflected your changes, though they may not be exactly the same. I found it later that some implementations were not necessary, e.g., binSpec.displayRangeMin/Max values etc., but I just left it as was.

Also, noticed that using stats were not sufficient when multiple filters are used: e.g., GEMS1 main variable: birth date at Histogram Viz, while filtering/subsetting birth date and age. From my observation, it should consider min/max of both distribution data and stats to accurately compute min/max: I made a note concerning this.

One thing I want to say that I used subsettingClient.getDistribution as I was not quite sure how to request distribution data without using subsettingClient.getDistribution. It seems to me that independent axis truncation is now working correctly, but it would be great if you can test/review them.

bobular commented 9 months ago

When the x-axis variable has been filtered on, auto-zoom doesn't work properly.

Well that's very odd. It's working fine now*, and there are no new commits. Sorry about the fake news!

* I'm on a different machine. Maybe something was stale on Friday?

bobular commented 9 months ago

Hi @moontrip - I've just looked through all the code and it's also looking good. I have two change requests still, however:

  1. remove the binSpec from the request (trivial)
  2. avoid the two distribution requests (no need for the "background" one with no filters)

let me know if you have any issues with that

moontrip commented 9 months ago

Hi @moontrip - I've just looked through all the code and it's also looking good. I have two change requests still, however:

  1. remove the binSpec from the request (trivial)
  2. avoid the two distribution requests (no need for the "background" one with no filters)

let me know if you have any issues with that

Hi @bobular 👋 Thank you for your detailed observations and suggestions. 👍 Yes X-axis range control has been working fine from my test and I am glad that it was not an exception you might have found.

Will try to address your suggestions and get back to you.

moontrip commented 9 months ago

@bobular I wonder if you could get a success by removing the binSpec from the request? By doing so, it threw errors and could not make it done from my tests.

bobular commented 9 months ago

Sorry - the auto-zoom is still broken for e.g. GEMS1 "Height, 2nd" as x-axis on histogram when "Height, 2nd" is also filtered on.

I was, ahem, on the wrong branch earlier. 🤦

moontrip commented 9 months ago

@bobular No problem and thank you for sharing the example. I quickly tested it myself but it looks just fine for me. Below are screeshots using Height, 2nd filter from the range of 60 (min) to 90 (max). Can you please share your numbers used for subsetting/filter, just in case?

a) full range

histogramVIz-truncation01

b) auto-zoom

histogramVIz-truncation02

bobular commented 9 months ago

The problem with the broken auto-zoom when a variable is directly filtered is simply due to the use of getDistribution from filter/utils.ts

This function strips out the current variable from the filters. We don't want it to do that. If we call subsettingClient.getDistribution directly with the filters, it should be fine.

PS. dropping the binSpec works fine for me. I'll slack you the diff

moontrip commented 9 months ago

@bobular I have address most of your feedbacks listed earlier. One thing I could not resolve is that removing binSpec did not work for me: your suggestion in Slack (with diff) was what I tried before.

For example, you can try with GEMS1 Age or Birth Date regardless of the presence of filters, which causes data request errors. my guess based on the error message is that the backend somehow requires displayRangeMin/Max props.