VEuPathDB / EdaDataService


SAM's Vizs returning `__UNSELECTED__` overlay stratum when not requested or appropriate #357

Open moontrip opened 8 months ago

moontrip commented 8 months ago

While working on https://github.com/VEuPathDB/web-monorepo/issues/84 using the Vectorbase Mega study, I found a bug in the Histogram Viz when using the following variables and overlay (a screenshot is attached at the end). I also noticed that it happens in other Vizs where overlay by Age can be used:

study: Vectorbase Mega study; marker variable: Age; Histogram Viz (or other Vizs); Main variable in the plot: blood meal host prevalence; Overlay: Age.

With Overlay: None, the plot works fine. However, with Overlay: Age, the plot becomes empty. This happens with both donut and bar plot markers; the bubble marker does not support overlay by variable (always None). I checked the backend response and it does return data, so at first I thought it was a data handling issue on the Viz/plot side.

After digging into it carefully, I finally found the reason. Each item of the response.data array contains an overlayVariableDetails object, which consists of entityId, variableId, and value. If an overlay is used and response.data has multiple items, the value of each item serves as an identifier (the overlay vocabulary term).

The Age variable has the following five vocabulary terms:

[ "2880", 
  "F1 adults from field collected mosquitoes", 
  "F0 adults emerged from field collected larvae",  
  "F0 adults",
  "third instar larva;fourth instar larva            "
]

Note that the last item, "third instar larva;fourth instar larva", strangely carries a long run of trailing spaces. Although it is not the culprit of this bug, it may become another troublemaker.
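As a quick way to spot terms like the padded one noted above, here is a minimal TypeScript sketch. The helper name `findPaddedTerms` is hypothetical and is not part of the web-monorepo or EdaDataService code; it only illustrates checking a vocabulary for leading or trailing whitespace.

```typescript
// Hypothetical helper (not from the real codebase): returns every vocabulary
// term that differs from its trimmed form, i.e. has leading/trailing whitespace.
function findPaddedTerms(vocabulary: string[]): string[] {
  return vocabulary.filter((term) => term !== term.trim());
}

// The Age vocabulary as listed in this issue, including the padded last term.
const ageVocabulary: string[] = [
  "2880",
  "F1 adults from field collected mosquitoes",
  "F0 adults emerged from field collected larvae",
  "F0 adults",
  "third instar larva;fourth instar larva            ",
];

const paddedTerms = findPaddedTerms(ageVocabulary);
```

Only the last term would be flagged here. A strict string comparison against the trimmed label would fail for it, which is why it could become a problem later.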

However, the overlayVariableDetails of the variable Age in the response is:

{
  "variableId": "OBI_0001167",
  "entityId": "EUPATH_0000609",
  "value": "__UNSELECTED__"            <-- strange value
}

So, to display the plot, overlayVariableDetails.value should be one of the five vocabulary terms, but instead it holds the strange string "__UNSELECTED__". As a result, the plot cannot interpret the data correctly. More precisely, the Histogram Viz uses reorderData() to sort the data into one series per vocabulary term so that the plot can show the overlays. Since the value does not match any vocabulary term, the data becomes null after going through reorderData().
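The failure mode can be sketched as follows. This is not the actual reorderData() implementation; `reorderByVocabulary`, the `Series` type, and the `counts` field are hypothetical stand-ins, assuming only that each series is matched to a vocabulary term by exact string equality on overlayVariableDetails.value.

```typescript
// Field names of overlayVariableDetails follow the response shown in this issue.
interface OverlayVariableDetails {
  entityId: string;
  variableId: string;
  value: string;
}

// Hypothetical series shape; `counts` is an illustrative payload field.
interface Series {
  overlayVariableDetails?: OverlayVariableDetails;
  counts: number[];
}

// Hypothetical stand-in for reorderData(): one slot per vocabulary term,
// in vocabulary order; null when no series carries that term.
function reorderByVocabulary(
  data: Series[],
  vocabulary: string[]
): (Series | null)[] {
  return vocabulary.map((term) => {
    const matches = data.filter(
      (s) => s.overlayVariableDetails?.value === term
    );
    return matches.length > 0 ? matches[0] : null;
  });
}

const vocabulary: string[] = [
  "2880",
  "F1 adults from field collected mosquitoes",
  "F0 adults emerged from field collected larvae",
  "F0 adults",
  "third instar larva;fourth instar larva            ",
];

// A response with a single "__UNSELECTED__" stratum, as observed for Age.
const responseData: Series[] = [
  {
    overlayVariableDetails: {
      variableId: "OBI_0001167",
      entityId: "EUPATH_0000609",
      value: "__UNSELECTED__",
    },
    counts: [3, 5, 2], // made-up numbers, for illustration only
  },
];

const reordered = reorderByVocabulary(responseData, vocabulary);
```

Under these assumptions every slot in `reordered` is null, because "__UNSELECTED__" equals none of the five terms, so the plot has nothing to draw even though the response contained data.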

It is worth noting that Overlay: None works just fine, and other overlay variables also work; for example, overlaying by Sample Type was fine. I also confirmed that this happens on the live site as well.

For this reason, I think there is either a metadata issue with the Age variable or the backend is mishandling it.

[Screenshot: SAM overlay age bug]

bobular commented 8 months ago

This is a bit strange! Thanks DK.

Since we merged VEuPathDB/web-monorepo#826 we get acceptable UX (no need to rush out a fix for b67):

[image]

The response from the back end is as you described: one stratum for __UNSELECTED__ (the back end representation of "All other values" when we have high-cardinality variables). This low-cardinality variable shouldn't have been treated this way.

Investigating further, I added a big filter on Age IN (<all values>) and the back end correctly responds with a 204. This is because there is no data with both blood meal host prevalence AND age, so the subset is empty.

Without that big filter, I think the back end should return an empty response (histogram.data = []). It shouldn't return an __UNSELECTED__ stratum if that stratum hasn't been requested in the request's config.overlayConfig.overlayValues.
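The proposed contract can be illustrated with a small TypeScript sketch. It is written client-side purely for illustration and is not the EdaDataService implementation; `dropUnrequestedUnselected` and the `Stratum` type are hypothetical names, and the only assumption is that a stratum is identified by overlayVariableDetails.value.

```typescript
const UNSELECTED = "__UNSELECTED__";

// Hypothetical minimal stratum shape; only the overlay value matters here.
interface Stratum {
  overlayVariableDetails?: { value: string };
}

// Keeps every stratum except an "__UNSELECTED__" one that was not explicitly
// listed in the request's config.overlayConfig.overlayValues.
function dropUnrequestedUnselected<T extends Stratum>(
  data: T[],
  overlayValues: string[] = []
): T[] {
  const unselectedRequested = overlayValues.indexOf(UNSELECTED) !== -1;
  return data.filter(
    (s) =>
      s.overlayVariableDetails?.value !== UNSELECTED || unselectedRequested
  );
}

// The situation in this issue: the only stratum returned is "__UNSELECTED__",
// while the request listed ordinary vocabulary terms.
const returned: Stratum[] = [
  { overlayVariableDetails: { value: UNSELECTED } },
];
const requestedValues: string[] = ["2880", "F0 adults"];

const filtered = dropUnrequestedUnselected(returned, requestedValues);
```

With the stratum dropped, `filtered` is empty, which corresponds to the histogram.data = [] response suggested above. If the request had listed "__UNSELECTED__" itself, the stratum would be kept.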

I'll move this to EdaDataService