cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License
12 stars 17 forks source link

refactor hospital admission to use delphi_utils create_export_csv #2032

Open aysim319 opened 1 month ago

aysim319 commented 1 month ago

Changelog

kept the existing method for testing/review purposes

Associated Issue(s)

aysim319 commented 2 weeks ago

I wish there werent so many formatting changes here, it makes it hard to see the real meat of the functionality changes. :(

Almost all the formatting changes is auto generated with darker, so that's out of my hand

Is this all worth it? It looks like there is a lot of stuff happening here just to shoehorn in the usage of a fairly simple ~45 line file utility method... Does this give us an efficiency boost? Do you have timing comparisons of runs of the old vs the new code? Did you consider creating a smaller diff by just replacing calls to (or within) write_to_csv() with calls to create_export_csv()?

The current create_export_csv is slower than writing in a for loop, but it's also not the main bottleneck that's making this indicator slower. The difference is about 60 seconds between the for loop and the create_export_csv as is, but there are quick changes (groupby date and not filtering per date) that decreases the difference to 30 seconds

cprofile_main.txt cprofile_create_export_csv.txt cprofile_create_export_csv_updated.txt

There are low hanging fruits that would negate the 30 second decrease and more. Similar to doctor visits there are multiple read and write of the csv that can be relative easily removed. I think it's easier in terms of maintainability and readability to refactor over to create_export_csv and also have some extra stuff that's already in create_export_csv like (missingness and remove nulls)

There are some logging lines that are being removed -- can we replace them with similar stuff in create_export_csv()? There's one that's the warning for se that I just added and added the number of rows in the export

dshemetov commented 2 weeks ago

Honestly my bad for not documenting the darker workflow better -- make your code changes, commit them, then do make format in the indicator you're changing, let it make its formatting changes, and then commit those separately. I didn't think this would be necessary, since its whole premise is that it only formats the lines you touch, but it turned out to be more aggressive about resorting our imports than I thought it would be.

aysim319 commented 2 weeks ago

Honestly my bad for not documenting the darker workflow better -- make your code changes, commit them, then do make format in the indicator you're changing, let it make its formatting changes, and then commit those separately. I didn't think this would be necessary, since its whole premise is that it only formats the lines you touch, but it turned out to be more aggressive about resorting our imports than I thought it would be.

I've been doing running the linter more frequently so some of the commit also have formatting changes, since the ci run checks for lints before the tests and I like to check that all the checks pass within my comments, but I can work on isolating the comments for functional changes and linting for future reference.

But in terms of the file changed page the default shows all the changes that was made and unless you manually filter out the lint commits. At least that's how I understand how the file changed page works... Is there a workaround that doesn't involve manually filtering?

dshemetov commented 2 weeks ago

Is there a workaround that doesn't involve manually filtering?

Not that I know of, but often times looking through the individual commits is good enough for me.