data-liberation-project / phmsa-hazmat-incident-reports

Data from decades of PHMSA's "5800.1" hazardous material transportation incident reports
https://www.data-liberation-project.org/datasets/phmsa-hazmat-incident-reports/

Generate RSS feeds with the latest available incidents #3

Closed jsvine closed 1 year ago

jsvine commented 1 year ago

Unfortunately, the portal data does not indicate the time that the report was posted. However, there are a few ways to resolve this:

Ideally, we'd create a national RSS feed, plus one feed each for each state.

jsvine commented 1 year ago

Some observations:

Given the above (and because we don't necessarily know that the INCIDENTID schema will remain usable forever), I think the git history approach might be best.

This seems to be working well here — https://github.com/data-liberation-project/fema-daily-ops-email-to-rss/blob/main/scripts/historify.py — but would need some tweaking to adjust for the fact that we'll be looking not for the time a file was committed, but rather all the entries that were committed to one of several files (i.e., the past ~6 month-files).
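To make the idea concrete, here is a minimal sketch of the "first seen in git history" logic. It is not the code in historify.py. It assumes the rows of the relevant month-files have already been read out of each commit (for example with `git log` and `git show`), and the names `Snapshot`, `row_key`, and `first_seen` are hypothetical. Because INCIDENTID may not always be available, the sketch identifies a row by its full contents. That is also an assumption, not a settled choice.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Dict[str, str]
# (commit time, all rows present in the tracked files at that commit)
Snapshot = Tuple[datetime, List[Row]]


def row_key(row: Row) -> Hashable:
    """Hypothetical row identity: the row's full contents."""
    return tuple(sorted(row.items()))


def first_seen(
    snapshots: Iterable[Snapshot],
    key_fn: Callable[[Row], Hashable] = row_key,
) -> Dict[Hashable, datetime]:
    """Map each row to the time of the earliest commit that contains it."""
    seen: Dict[Hashable, datetime] = {}
    # Walk commits oldest-first so the first time we meet a row wins.
    for committed_at, rows in sorted(snapshots, key=lambda s: s[0]):
        for row in rows:
            seen.setdefault(key_fn(row), committed_at)
    return seen


# Made-up example data: row "A" appears on March 1, row "B" a day later.
march_1 = datetime(2023, 3, 1, tzinfo=timezone.utc)
march_2 = datetime(2023, 3, 2, tzinfo=timezone.utc)
seen = first_seen([
    (march_2, [{"report": "A"}, {"report": "B"}]),
    (march_1, [{"report": "A"}]),
])
```

The resulting timestamps could then serve as each entry's publication time in the feed. One caveat of keying on full contents: a row that is later corrected would show up as a new entry.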

m-nolan commented 1 year ago

Looking at an example report - https://portal.phmsa.dot.gov/PDFGenerator/getPublicReport/OHMIR_5800-1?INCIDENTID=2129024 - is the "Date" field listed under "PART VIII - CONTACT INFORMATION" the date the incident was reported by the contact listed there? If there's a way to get that out of the report doc, then that might be an option. That said, it might not matter if you're only concerned with which posts are new and should be published as new entries in the RSS feed.

jsvine commented 1 year ago

Very interesting! Although the PHMSA portal says that this version of the data includes "All fields included in Form 5800.1", that particular field appears to be missing. (It does, however, have other fields from that section, e.g., Contact Name, Contact Title, Contact Street, Contact City, Contact State, Contact Postal Code, Preparer Of Incident Report.)

It would theoretically be possible to grab those dates from the PDFs, but:

  1. Not all reports in the data are linked to a PDF. (I.e., the same problem noted above regarding INCIDENTID.)
  2. It would add another processing step (not entirely opposed to that, but given the above, not sure it's worth it).
  3. We're not yet sure that this is what that date field represents (although it's plausible and could be confirmed with an email/call to PHMSA).

m-nolan commented 1 year ago

Sounds reasonable. I'll proceed with the git history approach.

kylebutts commented 1 year ago

Astro can generate RSS feeds that can be updated on git push (deploying on netlify): https://docs.astro.build/en/guides/rss/

You could do this for all 50 states + national easily too. Happy to help with that

m-nolan commented 1 year ago

I've gotten a start on this - I've copied the basic form @jsvine wrote here: https://github.com/data-liberation-project/fema-daily-ops-email-to-rss/blob/main/scripts/convert.py

I'm working on this in a version of the repo I forked over here: https://github.com/m-nolan/phmsa-hazmat-incident-reports

Several more steps are needed:

@kylebutts glad to get some help here. Looks like you've suggested a JavaScript package. I'm more familiar with Python, so I'm going that route for now.

jsvine commented 1 year ago

Thanks, @m-nolan! Will follow up with you re. next steps.

And thanks too, @kylebutts. Through the feedgen library, Python has decent support for writing RSS feeds, and GitHub Pages will allow us to deploy here, with no extra infrastructure dependencies required. But I appreciate the heads-up re. Astro, which looks useful and which I'll consider for other/future projects 👍
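For readers unfamiliar with what such a feed contains, here is a rough sketch of an RSS 2.0 document built in Python. To stay dependency-free it uses the standard library's `xml.etree.ElementTree` as a stand-in. It does not use feedgen, and it is not the project's actual script. The `build_rss` helper, the entry dictionary keys, and the sample title and date are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, entries):
    """Return an RSS 2.0 document as a string.

    `entries` is a list of dicts with hypothetical keys
    "title", "link", and "published" (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "guid").text = entry["link"]
        # RSS expects RFC 822-style dates; format_datetime produces them.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["published"])
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_rss(
    title="PHMSA hazmat incident reports",
    link="https://www.data-liberation-project.org/datasets/phmsa-hazmat-incident-reports/",
    description="Latest available incidents",
    entries=[{
        "title": "Example incident (placeholder title)",
        "link": "https://portal.phmsa.dot.gov/PDFGenerator/getPublicReport/OHMIR_5800-1?INCIDENTID=2129024",
        "published": datetime(2023, 3, 1, tzinfo=timezone.utc),
    }],
)
```

The returned string could be written to a file in a directory served by GitHub Pages, which matches the deployment described above.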

m-nolan commented 1 year ago

Update: PR #9 adds RSS feed functionality for the whole dataset. We've discussed splitting the feed into state/region-specific feeds, but I think that should be a separate issue. Can we close this one for now?

m-nolan commented 1 year ago

Here's a first try for per-state feeds: https://github.com/m-nolan/phmsa-hazmat-incident-reports/commit/6f756ba215f435a77433fa781b322b240a1801cd
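As an illustration of the per-state split (not the code in the commit linked above), the core step is grouping rows by state before writing one feed per group. In this sketch the `state` field name, the `UNKNOWN` bucket, and the `group_by_state` helper are hypothetical, and the actual column name in the PHMSA data may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, str]


def group_by_state(rows: Iterable[Row], state_field: str = "state") -> Dict[str, List[Row]]:
    """Bucket rows by a normalized state code.

    Rows with a missing or blank state go into an "UNKNOWN" bucket
    so they are not silently dropped.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        state = (row.get(state_field) or "").strip().upper() or "UNKNOWN"
        groups[state].append(row)
    return dict(groups)


# Made-up rows for illustration only.
rows = [
    {"report": "A", "state": "TX"},
    {"report": "B", "state": "tx "},
    {"report": "C", "state": "OH"},
    {"report": "D", "state": ""},
]
by_state = group_by_state(rows)
```

Each group could then be written to its own feed file, alongside a national feed built from all rows.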

jsvine commented 1 year ago

Many thanks, @m-nolan. Merged, refactored, and now in main! Really appreciate you taking on this issue. I think the feeds will be quite helpful.

m-nolan commented 1 year ago

You're welcome! Happy to help.
