Consistent handling of missing data

NYCPlanning / data-engineering-qaqc

streamlit app for data engineering

https://edm-data-engineering.nycplanningdigital.com

Consistent handling of missing data #221

Open SashaWeinstein opened 2 years ago

SashaWeinstein commented 2 years ago

Each report should be able to handle two situations: when the data parameter is None and when it is an empty DataFrame. In both cases the report should catch the situation and tell the user what happened instead of raising an exception.

Not every empty DataFrame points to a problem on the backend. Sometimes no manual corrections have been applied, and the app is correctly showing that. The data should never be None, though; that is always a bug.

I think the best way to ensure this consistency is automated testing. We should iterate through all our reports, pass each one None and pd.DataFrame(), and make sure they exhibit the correct behavior. I don't know how to check which st elements have been called; hopefully there is a good way.
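The loop described above could look roughly like the sketch below. It sidesteps the question of inspecting st calls by assuming each report returns a status string; that convention, along with `data_status`, the two report functions, and the `REPORTS` list, is hypothetical and not taken from the repo.

```python
import pandas as pd


def data_status(data):
    """Classify a report's input as 'missing', 'empty', or 'ok'."""
    if data is None:
        return "missing"  # per the issue, None always means a bug upstream
    if data.empty:
        return "empty"  # can be legitimate, e.g. no manual corrections
    return "ok"


# Hypothetical reports; real ones would render Streamlit elements.
def corrections_report(data):
    status = data_status(data)
    if status == "ok":
        pass  # render the corrections table here
    return status


def row_count_report(data):
    status = data_status(data)
    if status == "ok":
        _ = len(data)  # placeholder for real rendering logic
    return status


REPORTS = [corrections_report, row_count_report]
BAD_INPUTS = {"missing": None, "empty": pd.DataFrame()}


def check_all_reports():
    """Return (report name, expected, actual) for every mismatch."""
    failures = []
    for report in REPORTS:
        for expected, data in BAD_INPUTS.items():
            try:
                actual = report(data)
            except Exception as exc:
                actual = f"raised {type(exc).__name__}"
            if actual != expected:
                failures.append((report.__name__, expected, actual))
    return failures
```

An empty list from `check_all_reports()` would mean every registered report handled both inputs without raising. A report that forgets the guard shows up as a tuple naming the report and the exception type.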

abrieff commented 2 years ago

I think this is basically what you're looking for

https://blog.streamlit.io/testing-streamlit-apps-using-seleniumbase/

I'll note that working with Selenium can really just be the worst. I'm not sure I'd recommend it as worth the trouble given the internal-only use case here. You might run into some trouble getting it to work right in GitHub Actions (I assume that's where you'd want it, to run on commits). The alternative to something like Selenium is to just mock out all calls to Streamlit. That's a real hassle in its own right, and it won't test what the page looks like, but you'll have fewer devops issues with it.

All of that being said, I think it's maybe a worthwhile ticket to try and set up this Selenium thing, but I would recommend timeboxing the effort. Things may be better than I remember; it's been years at this point since I worked with it.
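For the mocking alternative mentioned above, a minimal sketch might look like this. It assumes a report receives the Streamlit module as an argument so a `MagicMock` can stand in for it; `render_report` and `called_elements` are hypothetical names, and the real reports may be structured differently (in which case `unittest.mock.patch` on the module's `st` import would be the closer fit).

```python
from unittest.mock import MagicMock

import pandas as pd


def render_report(st, data):
    """Hypothetical report that takes the streamlit module as `st`."""
    if data is None:
        st.error("No data was loaded for this report. This is a bug.")
        return
    if data.empty:
        st.info("No records to display.")
        return
    st.dataframe(data)


def called_elements(data):
    """Run the report against a fake `st` and list the elements it called."""
    fake_st = MagicMock()
    render_report(fake_st, data)
    # Each recorded call is a (name, args, kwargs) tuple.
    return [name for name, _args, _kwargs in fake_st.method_calls]


def test_missing_data_shows_error():
    assert called_elements(None) == ["error"]


def test_empty_data_shows_info():
    assert called_elements(pd.DataFrame()) == ["info"]
```

This only checks which st elements were called, not how the page looks, which matches the trade-off described above.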

SashaWeinstein commented 2 years ago

Awesome thanks! This is exactly the feedback I'm looking for. What does "timeboxing" mean?

abrieff commented 2 years ago

Just like "I'm gonna spend a day, two days, etc. trying to get this to work, and if I can't figure it out by then I'll shelve it."

SashaWeinstein commented 2 years ago

Right, that makes sense. I think most of our tasks are timeboxed lol