Open zstumgoren opened 2 years ago
Left a voice message at media inquiry number (202) 671-1904 and sent email to does@dc.gov
DC has been removed from the list of states to scrape in Prefect, pending feedback from DC Dept of Employment Services and final implementation of bugfix.
Followed up with DOES contacts via phone and email and tried various other departments. No luck so far.
Discovered that DC has American Job Center locations. They don't appear to post data on the DC job center site, but they may be a better starting point for locating whoever manages the data for DC. See the links below for details and contact info:
Reached out to mayor's office. Person there took info and said they'd have someone from DC Comms dept reach out...
Got a callback from James Clopton in the Rapid Response department. He's in charge of notifying the site maintainers in the public affairs dept about new filings. They in turn maintain the pages. He said they just recently (in the last week) switched to a new site, and he didn't realize the pages were broken. He notified the public affairs folks about the breakage and said he'd pass my info along.
No fixes have been applied yet. Pinged James Clopton today. Awaiting response...
Our agency contact reached out to say they're finalizing updates to the pages prior to publishing. No official ETA, but sounds like we're getting closer...
In #385, while I was patching the CSV writing method, I added a simple hack to help this scraper work as we wait for a response.
https://github.com/biglocalnews/warn-scraper/blob/main/warn/scrapers/dc.py#L50-L64
DC data pages are now restored for 2012 through a newly posted 2022 page. However, 2014 still points to the 2018 page.
Also worth noting: The URL patterns remain all over the place (i.e. no regular pattern), so we'll need to scrape the links from the most recently available year's page.
I've notified the agency, but I think we could update the scraper to start from 2015 onward for now, until the 2014 issue is resolved.
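A rough sketch of what that could look like, in case it helps the discussion. This is not the code in `dc.py`. It assumes the latest year's page links to each earlier year with anchor text that contains the four-digit year, which we'd need to confirm against the live pages. The names `YearLinkParser`, `year_links`, `first_year`, and the sample HTML and URLs are all made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class YearLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose link text contains a four-digit year."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = {}  # year (int) -> absolute URL
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only gather text while inside an anchor that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            match = re.search(r"\b(19|20)\d{2}\b", "".join(self._text))
            if match:
                year = int(match.group(0))
                self.links[year] = urljoin(self.base_url, self._href)
            self._href = None


def year_links(html, base_url, first_year=2015):
    """Map each year >= first_year to the URL linked from the given page."""
    parser = YearLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return {y: u for y, u in sorted(parser.links.items()) if y >= first_year}


# Made-up markup standing in for the latest year's page (assumption)
SAMPLE_HTML = """
<ul>
  <li><a href="/page/warn-2022">2022</a></li>
  <li><a href="/node/123">2015</a></li>
  <li><a href="/page/warn-2018">2014</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

links = year_links(SAMPLE_HTML, "https://example.com/latest")
```

Because the URLs are read from the page rather than built from a pattern, irregular paths don't matter. The `first_year` cutoff skips 2014 until the agency fixes it, and it could be raised if more pages drop off.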
/cc @palewire
On a 2017 copy of the page in Archive.org, the URL that 2014 now points to redirects to a URL specific to 2014, but the page appears to be no longer available at that URL.
@chriszs Latest from DC contact:
It seems there may be more [pages?] dropping off. That [2014] page was lost in conversion and unrecoverable. I am waiting on feedback, but will follow up when I find out. Thanks and have a great weekend!
I'll pass along the Archive.org page you uncovered. Perhaps they can restore it from that page if they can confirm the accuracy of notices listed there...
The DC WARN pages appear to have changed. The scraper is raising the error below. The current status of the WARN pages listed on the 2021 page is:
2021-2018 work
2017 and earlier are broken, except for 2014, which appears to duplicate the 2018 data
[x] We should call the DC Dept. of Employment Services about the status of their WARN pages.
[ ] We'll need to update the scraper based on their response. If they do not respond, we should update the scraper to only pull data for 2018 to present.
[ ] Restore DC to list of states to scrape in Prefect settings for WARN Update project
Stacktrace