VEuPathDB / EdaNewIssues

0 stars 0 forks source link

Restrict high cardinality variables from being selected as the main variable on the map viz tool #596

Closed danicahelb closed 1 year ago

danicahelb commented 1 year ago

ISSUE: for high cardinality variables, "7+other" legend colors change based on current map viewport

for the map viz in eda, we currently allow categorical variables with more than 8 unique values to be plotted by assigning colors to the "top 7" values in the viewport and grouping all other values in the viewport into an "other" category.

This is problematic as the legend colors are not fixed to a specific value, but rather get updated as the viewport changes. To see this in action, map "Occupation" in SCORE S. mansoni Cluster Randomized Trial

image image

the standalone map is introducing a new requirement to the backend, to allow users to specify values of categorical vars.

How should we address this in the EDA map viz? Some possible options:

  1. keep as is, colors assigned to each value will change based on viewport
  2. add the same restriction the other viz tools already have against high cardinality categorical vars (ie, only select categorical vars with 8 or fewer unique values). Users can utilize Derived Variables if they are interested in specific values for a variable with >8 categories
  3. Allow high cardinality variables to be plotted by implementing "7+ other" but do not change the legend colors or which values are being assigned to "other" based on the current viewport
  4. align EDA with SAM by allowing users to specify values of categorical variables.

For option 4, note that in SAM, a user chooses their overlay variable once while in EDA the user chooses their overlay variable for every viz.

danicahelb commented 1 year ago

For the next grant cycle, we will wait for derived vars functionality and reevaluate if we need to implement option 4. integrating SAM functionality into the EDA is NOT something we will implement in the next grant cycle.

where does this leave us w the constraints being revised? the legend changing w the viewport is pretty bad. this also means @d-callan needs to support two different ways to build a map in the backend (which is doable, but would be great to avoid if possible)

danicahelb commented 1 year ago

@d-callan the clinepi team is ok with options 2 or 3, with a preference for option 3. do you think we should discuss in an EDA UX meeting?

d-callan commented 1 year ago

I think I need to know more about option 3. I'd assume the colored values were the 7 most common in the dataset, less any values explicitly filtered against. True? Also, the inconsistency w other vizs makes me a bit uncomfortable tbh.. I can imagine a user wondering why they can't do the same elsewhere.

danicahelb commented 1 year ago

yes, 7 most common in the dataset. I think we want to implement the 7+other everywhere to match the map, instead of the other way around. We can talk about it in UX

danicahelb commented 1 year ago

Discussed in 3/10 EDA UX meeting

We will remove the "7+other" functionality to get around high cardinality variables.

Instead, we will add the same restriction against high cardinality categorical vars as the other viz tools to the map

bobular commented 1 year ago

Looks like this isn't yet rolled out onto the qa back end?

Here are the variable constraints returned by the back end for the pass-through map-markers-overlay endpoint on qa:

{
    "xAxisVariable": {
        "isRequired": true,
        "minNumVars": 1,
        "maxNumVars": 1,
        "description": "Categorical variables with more than 8 values will assign the top 7 values by count to their own categories and assign the additonal values into an 'other' category. Continuous variables will be binned into by default 8 categories."
    }
}

And here is the correct response from the dev back end:

{
    "xAxisVariable": {
        "isRequired": true,
        "minNumVars": 1,
        "maxNumVars": 1,
        "maxNumValues": 8,
        "description": "Variable must have 8 or fewer unique values."
    }
}

So this is still a deployment issue - I will move it back to "to be deployed" and we can look again when Dave's back.

danicahelb commented 1 year ago

this seems fixed, thanks

Image