cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

added progress chunk to limit logging #2024

Closed aysim319 closed 1 month ago

aysim319 commented 1 month ago

Description

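The download progress callback was logging too much. This change logs progress only at fixed checkpoints (0, 25, 50, 75, and 100 percent), as in the sample output below.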

hospital-admission

{"time": "datetime.datetime(2024, 8, 14, 14, 37, 55, 187263)", "event": "starting download", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:37:55.187298Z"}
{"filename": "EDI_AGG_INPATIENT_20240813_1452CDT.csv.gz", "event": "File to download", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:37:58.514091Z"}
{"filename": "EDI_AGG_INPATIENT_20240814_0252CDT.csv.gz", "event": "File to download", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:37:58.514295Z"}
{"filename": "EDI_AGG_INPATIENT_20240813_1452CDT.csv.gz", "percent": 0, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:37:58.587617Z"}
{"filename": "EDI_AGG_INPATIENT_20240813_1452CDT.csv.gz", "percent": 25, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:02.373432Z"}
{"filename": "EDI_AGG_INPATIENT_20240813_1452CDT.csv.gz", "percent": 50, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:05.321412Z"}
{"filename": "EDI_AGG_INPATIENT_20240813_1452CDT.csv.gz", "percent": 75, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:07.928315Z"}
{"filename": "EDI_AGG_INPATIENT_20240813_1452CDT.csv.gz", "percent": 100, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:10.895372Z"}
{"filename": "EDI_AGG_INPATIENT_20240814_0252CDT.csv.gz", "percent": 0, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:10.985207Z"}
{"filename": "EDI_AGG_INPATIENT_20240814_0252CDT.csv.gz", "percent": 25, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:14.024649Z"}
{"filename": "EDI_AGG_INPATIENT_20240814_0252CDT.csv.gz", "percent": 50, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:17.388245Z"}
{"filename": "EDI_AGG_INPATIENT_20240814_0252CDT.csv.gz", "percent": 75, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:20.633262Z"}
{"filename": "EDI_AGG_INPATIENT_20240814_0252CDT.csv.gz", "percent": 100, "event": "Transfer in progress", "logger": "delphi_claims_hosp.run", "level": "info", "pid": 68373, "timestamp": "2024-08-14T18:38:23.601564Z"}

doctor-visits

{"time": "datetime.datetime(2024, 8, 14, 14, 39, 49, 406542)", "event": "starting download", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:39:49.406580Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240813_1452CDT.csv.gz", "event": "File to download", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:39:53.062629Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240814_0252CDT.csv.gz", "event": "File to download", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:39:53.062878Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240813_1452CDT.csv.gz", "percent": 0, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:39:53.182709Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240813_1452CDT.csv.gz", "percent": 25, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:40:09.275755Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240813_1452CDT.csv.gz", "percent": 50, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:40:19.720957Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240813_1452CDT.csv.gz", "percent": 75, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:40:30.740461Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240813_1452CDT.csv.gz", "percent": 100, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:40:41.629636Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240814_0252CDT.csv.gz", "percent": 0, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:40:41.721008Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240814_0252CDT.csv.gz", "percent": 25, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:40:53.522189Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240814_0252CDT.csv.gz", "percent": 50, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:41:04.152777Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240814_0252CDT.csv.gz", "percent": 75, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:41:16.296555Z"}
{"filename": "EDI_AGG_OUTPATIENT_20240814_0252CDT.csv.gz", "percent": 100, "event": "Transfer in progress", "logger": "delphi_doctor_visits.run", "level": "info", "pid": 68402, "timestamp": "2024-08-14T18:41:27.713898Z"}

Associated Issue(s)

aysim319 commented 1 month ago

Huh, do we really need progress logging? Download start and end seem like the most important bits of information to me, but I'm totally unfamiliar with this code. If we're committed to logging these, this seems fine to me, though it's going to be a bit subtle to understand that progress_chunks is being modified on every call after a few months away from this code. I thought a little about how to handle this with generators, but that doesn't seem much easier to read. I'd add a comment above the remove, something to the effect of "# Remove progress chunk, so it is not logged again".
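For readers without the diff open, here is a minimal sketch of the pattern being discussed. It is not the PR's actual code: `log_progress`, its argument order, and the `records` list (a stand-in for the real logger call) are all assumptions for illustration. Only the name `progress_chunks` and the suggested comment come from this thread.

```python
import functools


def log_progress(filename, progress_chunks, records, bytes_so_far, bytes_total):
    """Record each checkpoint the first time the transfer reaches it.

    `records` is a hypothetical stand-in for a logger; `progress_chunks`
    is a mutable list shared across calls and is modified in place.
    """
    percent = 100 * bytes_so_far // bytes_total
    # Iterate over a sorted copy so removing from the original list is safe.
    for chunk in sorted(progress_chunks):
        if percent >= chunk:
            records.append(
                {"filename": filename, "percent": chunk, "event": "Transfer in progress"}
            )
            # Remove progress chunk, so it is not logged again
            progress_chunks.remove(chunk)


records = []
progress_chunks = [0, 25, 50, 75, 100]
# Bind everything except the (bytes_so_far, bytes_total) pair that the
# transfer library is assumed to pass to the callback.
callback = functools.partial(log_progress, "example.csv.gz", progress_chunks, records)

# Simulate 101 callback invocations for a 1000-byte file.
for sent in range(0, 1001, 10):
    callback(sent, 1000)
```

In this simulation the callback fires 101 times but only five records are kept, one per checkpoint. The subtle part is exactly what the comment flags: the list bound into the partial shrinks on every hit.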

¯_(ツ)_/¯ Honestly, I'm not sure; it's a nice-to-have, I guess. If people don't care about having the progress, I agree with just logging a start and a finish. It does seem like a waste to keep invoking this callback, but it's a simple function, so there's not much harm.

But I did it this way on the assumption that we would prefer to have the progress. I tinkered with ways to make it less subtle, but since this is called as a callback, I couldn't think of a way to remember that a checkpoint had already been logged.

Matching on the exact percentage would basically only log 0 and 100, since the transfer rarely lands exactly on 25, 50, or 75; the intermediate values come out as fractions. Playing around with rounding didn't get me much further. And keeping a temp variable and comparing against it doesn't work (it would basically need to be defined non-locally, since otherwise the comparison would never be equal).
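To illustrate the two points above, here is a small simulation. It is only a sketch under assumptions: the 70-byte chunk size, `make_progress_callback`, and `emit` are made up for the example, and the closure is one possible way to keep state outside the callback body, not what this PR does.

```python
def make_progress_callback(filename, emit, step=25):
    """Return a callback that emits each `step` percent checkpoint once.

    The last emitted checkpoint lives in the enclosing scope, so it
    survives between calls without a shared mutable list.
    """
    last_logged = -step  # ensures the 0% checkpoint is emitted on the first call

    def callback(bytes_so_far, bytes_total):
        nonlocal last_logged
        percent = 100 * bytes_so_far // bytes_total
        checkpoint = (percent // step) * step  # round down to the nearest step
        if checkpoint > last_logged:
            last_logged = checkpoint
            emit(filename, checkpoint)

    return callback


total = 1000
# Hypothetical transfer: 70-byte chunks, then a final call at completion.
sent_values = list(range(0, total, 70)) + [total]

# Exact matching: which integer percentages coincide with a checkpoint?
percents = [100 * sent // total for sent in sent_values]
exact_hits = [p for p in percents if p in (0, 25, 50, 75, 100)]

# Closure-based callback: emits a checkpoint once it has been passed.
seen = []
callback = make_progress_callback("example.csv.gz", lambda name, pct: seen.append(pct))
for sent in sent_values:
    callback(sent, total)
```

With these made-up chunk sizes, `exact_hits` contains only 0 and 100, while `seen` contains all five checkpoints. Whether the closure reads any more clearly than removing entries from `progress_chunks` is a judgment call.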