NYCPlanning / db-facilities

🏭 🏢 🏬 🏣 🏤 🏥 🏦 🏨 🏪 🏫 🏩
https://nycplanning.github.io/db-facilities

`doe_lcgms` data pipeline broken #592

Closed mbh329 closed 1 year ago

mbh329 commented 1 year ago

See build here. There is an issue with the encoding of the source data, which I think requires us to revert to the old ETL for the `doe_lcgms` data.

github-actions[bot] commented 1 year ago

Stale issue message

fvankrieken commented 1 year ago

So this issue is "resolved" on the latest build. Do we want to leave it open as an issue for handling invalid UTF-8 bytes in the pipeline, or, if this happens infrequently enough, just close it and handle it manually as it comes up? For example, with the doe upk data this time, I manually removed the invalid byte after finding it in the emailed Excel file, to make sure it really was just extraneous.
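For reference, a minimal sketch of how the manual step of finding the invalid byte could be scripted. It assumes the source data is available as raw bytes; `find_invalid_utf8` is a hypothetical helper, not something that exists in db-facilities or data-library.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for each undecodable sequence in data.

    Hypothetical helper for illustration only.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means no further invalid bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            # exc.start / exc.end are relative to the slice, so shift by pos.
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end]))
            # Resume scanning just past the offending bytes.
            pos = end
    return problems
```

Running this over the file contents (for example `find_invalid_utf8(open(path, "rb").read())`, with `path` being whatever file is under inspection) lists each offending offset, so the byte can be reviewed before deciding whether it really is extraneous.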

damonmcc commented 1 year ago

It'd be really nice to handle invalid bytes automatically, but maybe that's asking a lot of db-facilities and/or data-library.

I'd vote to close this issue and if/when it happens again, we open a new one in data-library
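If automatic handling is ever picked up, one possible approach is to decode leniently with Python's built-in error handlers. This is only a sketch: `decode_lenient` is a hypothetical name, and whether silently dropping or replacing bytes is acceptable for this data is an open question.

```python
def decode_lenient(data: bytes, drop: bool = False) -> str:
    """Decode bytes as UTF-8 without raising on invalid sequences.

    Hypothetical helper for illustration only.
    With drop=False, each invalid sequence becomes U+FFFD so it stays visible.
    With drop=True, invalid sequences are removed entirely.
    """
    handler = "ignore" if drop else "replace"
    return data.decode("utf-8", errors=handler)
```

Replacing rather than dropping keeps a visible marker in the output, which makes it easier to spot affected records downstream.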