jsvine closed this issue 1 year ago.
Some observations:
- `INCIDENTID` numeric IDs do seem to reflect the order in which they are uploaded ... `INCIDENTID`. (The online search portal includes this disclaimer: "If the report link is not available for the report number(s) that you are looking for, please submit your request to HMRequest@dot.gov to get a copy of the full incident report.")
- Report numbers (e.g., `X-2022120195`) are available online, but seem not quite right for sorting. Although they appear possibly to increment by upload date, the numeric components appear to be assigned separately for forms submitted via XML (`X`), Web (`E`), and Paper (`I`), which makes a precise sort order seem impossible (although this could use more investigation).

Given the above (and because we don't necessarily know that the `INCIDENTID` schema will remain usable forever), I think the `git` history approach might be best.
This approach seems to be working well in https://github.com/data-liberation-project/fema-daily-ops-email-to-rss/blob/main/scripts/historify.py, but it would need some tweaking: rather than looking for the time a file was committed, we'll be looking for all the entries that were committed to any one of several files (i.e., the past ~6 month-files).
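A minimal sketch of that tweak, assuming the month-files are CSVs under a `data/fetched/` directory with an `INCIDENTID` column. The helper names (`first_seen`, `git_snapshots`, `_git`) are hypothetical and not taken from `historify.py`. The sketch walks every commit that touched those files, oldest first, and records the first commit timestamp at which each `INCIDENTID` appears in any of them.

```python
# Hypothetical helpers; assumes month-files are CSVs under data/fetched/
# with an INCIDENTID column, and that this runs from the repository root.
import csv
import io
import subprocess
from typing import Iterable, Iterator


def first_seen(
    snapshots: Iterable[tuple[str, str]], id_col: str = "INCIDENTID"
) -> dict[str, str]:
    """Map each report ID to the timestamp of the earliest snapshot containing it.

    `snapshots` must be ordered oldest-first; each item is a
    (commit_timestamp, csv_text) pair for one file at one commit.
    """
    seen: dict[str, str] = {}
    for timestamp, text in snapshots:
        for row in csv.DictReader(io.StringIO(text)):
            # setdefault never overwrites, so the earliest timestamp wins
            seen.setdefault(row[id_col], timestamp)
    return seen


def _git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout


def git_snapshots(pathspec: str = "data/fetched/") -> Iterator[tuple[str, str]]:
    """Yield (ISO commit timestamp, CSV text) for each CSV under `pathspec`
    that was added or modified in a commit, oldest commit first."""
    log = _git("log", "--reverse", "--format=%H %cI", "--", pathspec)
    for line in log.splitlines():
        sha, timestamp = line.split(" ", 1)
        # --root makes the very first commit list its files too;
        # --diff-filter=AM skips deletions, whose contents can't be shown
        changed = _git(
            "diff-tree", "--root", "--no-commit-id", "--name-only", "-r",
            "--diff-filter=AM", sha, "--", pathspec,
        ).splitlines()
        for path in changed:
            if path.endswith(".csv"):
                yield timestamp, _git("show", f"{sha}:{path}")
```

Calling `first_seen(git_snapshots())` from the repository root would give one timestamp per `INCIDENTID`. Because every committed version of every month-file is scanned, a report that later shows up in a different file keeps the date it first appeared.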
Looking at an example report (https://portal.phmsa.dot.gov/PDFGenerator/getPublicReport/OHMIR_5800-1?INCIDENTID=2129024), is the "Date" field listed under "PART VIII - CONTACT INFORMATION" the date the incident was reported by the contact listed there? If there's a way to get that out of the report document, then that might be an option. That said, it might not matter if you're only concerned with whether posts are new and should be published as new entries in the RSS feed.
Very interesting! Although the PHMSA portal says that this version of the data includes "All fields included in Form 5800.1", that particular field appears to be missing. (It does, however, have other fields from that section, e.g., `Contact Name`, `Contact Title`, `Contact Street`, `Contact City`, `Contact State`, `Contact Postal Code`, and `Preparer Of Incident Report`.)
It would theoretically be possible to grab those dates from the PDFs, but:
`INCIDENTID` above.)

Sounds reasonable. I'll proceed with the git history approach.
Astro can generate RSS feeds that can be updated on git push (deploying on netlify): https://docs.astro.build/en/guides/rss/
You could easily do this for all 50 states + national, too. Happy to help with that.
I've gotten a start on this, copying the basic form of the script @jsvine wrote here: https://github.com/data-liberation-project/fema-daily-ops-email-to-rss/blob/main/scripts/convert.py
I'm working on this in a version of the repo I forked over here: https://github.com/m-nolan/phmsa-hazmat-incident-reports
Several more steps are needed:
- The `MakeFile` and `.github/workflows/rss.yaml` files need checking to make sure they make sense. I'm flying blind here.
- The `fetch_feed()` method in `scripts/rss.py` needs to convert the entries in `data/fetched/*.csv` into a feed.
- `get_entry_attachment()`, `convert_entry()`, and `convert_feed()` need updating as appropriate. This should follow from the formatting of each entry pulled from the feed we create in the previous step. Note that much of the data in the `feed_attrs` dict in `convert_feed()` has been left empty for now.

@kylebutts, glad to get some help here. Looks like you've suggested a JavaScript package. I'm more familiar with Python, so I'm going with that route for now.
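The `fetch_feed()` and `convert_entry()` steps listed above come down to turning CSV rows into dated, de-duplicated entries. A minimal sketch, assuming a mapping from `INCIDENTID` to a first-seen ISO timestamp (for example, derived from the git history) is already available. The function names, the entry-dict shape, and the `Incident City`/`Incident State` column names are hypothetical stand-ins, not the actual code in `scripts/rss.py`; the report URL pattern is copied from the example link earlier in this thread.

```python
import csv
import glob
from datetime import datetime, timezone

# URL pattern copied from the example report link in this thread;
# treat it as an assumption about how the portal links reports.
REPORT_URL = (
    "https://portal.phmsa.dot.gov/PDFGenerator/getPublicReport/"
    "OHMIR_5800-1?INCIDENTID={incident_id}"
)


def load_rows(pattern: str = "data/fetched/*.csv") -> list[dict[str, str]]:
    """Read every fetched month-file into one list of row dicts."""
    rows: list[dict[str, str]] = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


def to_entry(row: dict[str, str], published: datetime) -> dict:
    """Turn one CSV row into a feed-entry dict (hypothetical shape)."""
    incident_id = row["INCIDENTID"]
    # "Incident City" / "Incident State" are hypothetical column names
    place = ", ".join(
        v for v in (row.get("Incident City"), row.get("Incident State")) if v
    )
    return {
        "id": incident_id,
        "title": f"Incident {incident_id}" + (f" ({place})" if place else ""),
        "link": REPORT_URL.format(incident_id=incident_id),
        "published": published,
    }


def build_entries(
    rows: list[dict[str, str]], first_seen: dict[str, str], limit: int = 100
) -> list[dict]:
    """Build one entry per ID (first row wins), newest first, skipping IDs
    without a known first-seen ISO timestamp."""
    entries: dict[str, dict] = {}
    for row in rows:
        incident_id = row["INCIDENTID"]
        if incident_id in first_seen and incident_id not in entries:
            # Normalize to UTC so timestamps with different offsets sort correctly
            published = datetime.fromisoformat(first_seen[incident_id]).astimezone(
                timezone.utc
            )
            entries[incident_id] = to_entry(row, published)
    return sorted(entries.values(), key=lambda e: e["published"], reverse=True)[:limit]
```

Capping the list with `limit` keeps the feed small; readers only need recent items, and older reports remain in the CSVs.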
Thanks, @m-nolan! Will follow up with you re. next steps.
And thanks too, @kylebutts. Through the `feedgen` library, Python has decent support for writing RSS feeds, and GitHub Pages will allow us to deploy here with no extra infrastructure dependencies. But I appreciate the heads-up re. Astro, which looks useful and which I'll consider for other/future projects 👍
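For a sense of what the RSS-writing step involves, here is a minimal sketch. It deliberately uses the standard library's `xml.etree.ElementTree` rather than `feedgen`, only so the example runs without dependencies; with `feedgen`, its `FeedGenerator` class would do the same job. The `render_rss` name and the entry-dict keys (`id`, `title`, `link`, `published`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def render_rss(title: str, link: str, description: str, entries: list[dict]) -> str:
    """Serialize entry dicts (keys: id, title, link, published) as RSS 2.0 XML."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # The report ID is a stable identifier but not a URL
        ET.SubElement(item, "guid", isPermaLink="false").text = entry["id"]
        published: datetime = entry["published"]
        # RSS 2.0 expects RFC 822 dates; emit everything in UTC
        ET.SubElement(item, "pubDate").text = format_datetime(
            published.astimezone(timezone.utc)
        )
    return ET.tostring(rss, encoding="unicode")
```

Using the incident ID as a non-permalink `guid` lets feed readers recognize an entry as already seen even if its title or link text changes later.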
Update: PR #9 adds RSS feed functionality for the whole dataset. We've discussed splitting the feed into state/region-specific feeds, but I think that should be a separate issue. Can we close this one for now?
Here's a first try for per-state feeds: https://github.com/m-nolan/phmsa-hazmat-incident-reports/commit/6f756ba215f435a77433fa781b322b240a1801cd
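Per-state feeds are mostly a grouping step on top of the national feed. A minimal sketch, assuming each entry dict carries a two-letter state code under a hypothetical `state` key (e.g., copied from the CSV's state column). This is an illustration, not the code in the linked commit.

```python
from typing import Iterable


def split_by_state(entries: Iterable[dict], state_key: str = "state") -> dict[str, list[dict]]:
    """Group entries into a 'national' feed plus one feed per state code.

    `state_key` is a hypothetical field name; entries without a state
    value appear only in the national feed.
    """
    entries = list(entries)
    feeds: dict[str, list[dict]] = {"national": entries}
    for entry in entries:
        # Normalize codes like " tx" to "TX" so they share one feed
        state = (entry.get(state_key) or "").strip().upper()
        if state:
            feeds.setdefault(state, []).append(entry)
    return feeds
```

Each list can then be written out as its own feed file, one per key, with `national` covering every report.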
Many thanks, @m-nolan. Merged, refactored, and now in `main`! Really appreciate you taking on this issue. I think the feeds will be quite helpful.
You're welcome! Happy to help.
Unfortunately, the portal data does not indicate the time that each report was posted. However, there are a few ways to resolve this:
- Use `git`'s history to identify when we first committed any given report

Ideally, we'd create a national RSS feed, plus one feed for each state.