DataBiosphere / data-explorer

BSD 3-Clause "New" or "Revised" License
10 stars 6 forks source link

SQL generated is incorrect sometimes for extra facets #377

Closed wnojopra closed 4 years ago

wnojopra commented 4 years ago

AMP PD admin reported: "I've noticed that the cohort query is saved incorrectly for filtering criteria entered in cards that are not shown by default, but added from a search. ... This appears to be random, as I've performed the exact same actions, clicks and keystrokes and received different results."

I believe the problem is only visible for time series facets in Extra Facets (that is, the facets that are not visible by default). The root cause is we are setting EXTRA_FACET_INFO in the current_app.config. EXTRA_FACET_INFO should be set per-session or per-request, but instead it is in a global dictionary that every user session touches.

Reproducible steps:

  1. Open two tabs, each open to https://time-series-data-explorer.appspot.com/
  2. In tab 1, search for "Angina" (an extra facet), add the Angina facet, and select any of the values in the angina facet.
  3. In tab 2, search for "BMI" (a different extra facet), add the BMI facet, and select any of the values in the BMI facet.
  4. Go back to tab 1, and save the cohort. After saving the cohort, you'll notice the SQL is incorrect. You'll get:
    SELECT DISTINCT t1.RANDID FROM ((SELECT RANDID FROM `verily-public-data.framingham_heart_study_teaching.framingham_heart_study_teaching.ANGINA` WHERE (1 = 1))) t1

    What I expected to get was

    SELECT DISTINCT t1.RANDID FROM ((SELECT RANDID FROM `verily-public-data.framingham_heart_study_teaching.framingham_heart_study_teaching` WHERE (ANGINA = 1 AND PERIOD = 1))) t1

I believe in step 2, we end up overwriting the EXTRA_FACET_INFO dictionary. In step 3, Angina no longer exists in the EXTRA_FACET_INFO dictionary, which is what we use to determine if the facet is a time series or not. Since it is not in the dictionary, we incorrectly determine that it is not a time series, and we produce the wrong SQL.

To fix, I believe we should only be reading and writing EXTRA_FACET_INFO to dictionary scoped to the request.