biglocalnews / warn-transformer

Consolidate, enrich and republish the data gathered by warn-scraper
https://warn-transformer.readthedocs.io
Apache License 2.0

Additions format doesn't allow further automation #171

Open stucka opened 1 year ago

stucka commented 1 year ago

Bots work off the additions file that looks like this:

hash_id,postal_code,company,location,notice_date,effective_date,jobs,is_temporary,is_closure,is_amendment
5d791bf6839704caea183ffc6948d0f9ba77c03214cd7b78bc98a0c3,CA,"Gillette Citrus, Inc.",10175 S. Anchor Avenue Dinuba CA 93618,2023-06-23,2023-09-01,93,,,False

That's missing some of the fields in the standardized version:

notice_date,effective_date,received_date,company,city,num_employees,layoff_or_closure,county,address,source_file

Notably, the county field doesn't make it into additions, meaning warn-bots cannot filter by county.
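To make the gap concrete, here is a rough sketch that compares the two headers quoted above by field name. It only does a literal name comparison. Some fields may be renamed rather than dropped (jobs and location look like counterparts of num_employees and address), but that mapping is a guess on my part, not something confirmed from the code.

```python
# Headers copied from the additions file and the standardized version above.
additions_header = (
    "hash_id,postal_code,company,location,notice_date,effective_date,"
    "jobs,is_temporary,is_closure,is_amendment"
)
standardized_header = (
    "notice_date,effective_date,received_date,company,city,num_employees,"
    "layoff_or_closure,county,address,source_file"
)

additions_fields = additions_header.split(",")
standardized_fields = standardized_header.split(",")

# Standardized fields with no same-named column in additions.
missing = [f for f in standardized_fields if f not in additions_fields]
print(missing)
```

Whatever the renames turn out to be, county has no counterpart in additions, which is what blocks the filtering.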

stucka commented 1 year ago

This probably comes from warn-transformer/warn_transformer/integrate.py, around line 116.

stucka commented 1 year ago

Need to double-check that any changes meet some criteria:

-- Don't break anything else in the Actions workflow

-- Don't mess up the data.

That includes making sure hashes don't change.
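One way to check the hash criterion might be to diff the hash_id column between an export made before the change and one made after. This is only a sketch: the function names are made up for illustration, and it assumes both exports are CSV text with a hash_id column like the additions file above.

```python
import csv
import io


def hash_ids(csv_text):
    """Collect the hash_id values from CSV text that has a hash_id column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["hash_id"] for row in reader}


def changed_hashes(before, after):
    """Return hash_ids present before the change but missing afterwards."""
    return hash_ids(before) - hash_ids(after)


# Tiny made-up exports: the second adds a county column but keeps the hashes.
before = "hash_id,company\nabc123,Example Co\n"
after = "hash_id,company,county\nabc123,Example Co,Example County\n"
print(changed_hashes(before, after))
```

An empty set would mean no existing hash disappeared. New rows showing up only in the later export would not be flagged by this check.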

See also https://github.com/biglocalnews/warn-bot/issues/13

stucka commented 2 months ago

There needs to be some significant patching to bring counties in.

Will every state transformer need to be patched to export county, or can schema be set up to work without it?

schema around line 14: Need to add county

schema around line 181: Need to adapt hash to build a hash off a copy of the row that excludes the county field.

integrate around line 170: Need to add county in correct order

consolidate might actually function as needed.

warn-bot might actually function as needed, albeit with patching added in.
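A minimal sketch of the hash idea from the schema note above: build the hash from a copy of the row with county left out, so adding the field does not change existing hash_id values. Everything here is an assumption for illustration: EXCLUDED_FROM_HASH and stable_hash are hypothetical names, and the SHA-224 over sorted JSON is a stand-in. The real patch would have to keep whatever serialization and digest schema.py uses today, or the hashes would change anyway.

```python
import hashlib
import json

# Hypothetical: fields left out of the hash so existing hash_ids stay stable.
EXCLUDED_FROM_HASH = {"county"}


def stable_hash(row):
    """Hash a copy of the row with the excluded fields removed."""
    hashable = {k: v for k, v in row.items() if k not in EXCLUDED_FROM_HASH}
    # Assumed serialization; not necessarily what warn-transformer does.
    payload = json.dumps(hashable, sort_keys=True).encode("utf-8")
    return hashlib.sha224(payload).hexdigest()


old_row = {"postal_code": "CA", "company": "Gillette Citrus, Inc.", "jobs": 93}
new_row = {**old_row, "county": "Example County"}  # placeholder county value

print(stable_hash(old_row) == stable_hash(new_row))
```

The row passed in is not modified, so county would still be available to write out in additions after hashing.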

stucka commented 2 months ago

Test CSVs will need to be amended if scrapers are patched to bring in more data.