ExposuresProvider / icees-api

Sums incorrect for Race variable at Association to all Features endpoint #317

Closed karafecho closed 2 months ago

karafecho commented 2 months ago

This issue is to report that the sums are incorrect for the Race variable at the Association to all Features endpoint. I think this is due to the Unknown - ### issue and/or missing data.

Examples:

Column sums for races do not add up correctly, but row sums do.

Column and row sums for sexes do add up correctly (as do the sums for all other variables).

hyi commented 2 months ago

Looked into this issue and found that the numbers don't add up because the Other(2131-1) Race value is present in the patient data but is not specified for the Race feature variable in the all-features YAML file for PCD. This can be fixed in one of two ways:

1. Add Other(2131-1) as a valid value of the Race feature variable in the all-features YAML file.
2. Add Other as a valid value of the Race feature variable in the all-features YAML file, and update icees-db to do an inexact match so that Other(2131-1) in the patient data matches Other in the YAML file.
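To illustrate the inexact-match idea, here is a minimal sketch in Python. It is not the icees-api or icees-db code: the list of valid values, the sample records, and the helper names `normalize_race` and `count_by_race` are all hypothetical, and the sketch assumes that values missing from the valid-value list are simply left out of the counts.

```python
import re

# Hypothetical valid values; the real list lives in the all-features YAML file.
VALID_RACE_VALUES = ["Caucasian", "African American", "Asian", "Other"]

# Hypothetical (race, sex) records for illustration only; not real ICEES data.
PATIENTS = [
    ("Caucasian", "Male"),
    ("Caucasian", "Female"),
    ("African American", "Female"),
    ("Asian", "Male"),
    ("Other(2131-1)", "Male"),
    ("Other(2131-1)", "Female"),
]


def normalize_race(value: str) -> str:
    """Strip a trailing parenthesized code, e.g. 'Other(2131-1)' -> 'Other'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", value)


def count_by_race(patients, normalize=False):
    """Count patients per valid race value; values not in the list are dropped."""
    counts = {value: 0 for value in VALID_RACE_VALUES}
    for race, _sex in patients:
        key = normalize_race(race) if normalize else race
        if key in counts:
            counts[key] += 1
    return counts


exact = count_by_race(PATIENTS)                    # exact match only
inexact = count_by_race(PATIENTS, normalize=True)  # inexact match on 'Other'

# With exact matching, the two Other(2131-1) patients are missing from the sum.
print(sum(exact.values()), "of", len(PATIENTS), "patients counted (exact)")
print(sum(inexact.values()), "of", len(PATIENTS), "patients counted (inexact)")
```

With exact matching, the per-race counts in this sketch sum to less than the number of patients, which is the kind of shortfall reported above. With normalization, the totals agree.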

karafecho commented 2 months ago

Yeah, I remember the weird Other(2131-1) issue. (That's what I was incorrectly referring to as the "Unknown - ###" issue.) The last time we discussed this, do you recall if we made any decisions on how to resolve it? It seems like we could either update the YAML file (again!!!), which would mean keeping a confusing Other(2131-1) value, or implement a hacky fix in which we simply rename Other(2131-1) to Other in the db?

karafecho commented 2 months ago

I confirmed that Other(2131-1) is now being returned by ICEES. Closing issue as resolved ...