data-liberation-project / phmsa-hazmat-incident-reports

Data from decades of PHMSA's "5800.1" hazardous material transportation incident reports
https://www.data-liberation-project.org/datasets/phmsa-hazmat-incident-reports/

Figure out how to identify incident *updates* #4

Open jsvine opened 1 year ago

jsvine commented 1 year ago

The regulations concerning updates can be found here. How are those updates reflected in the data? Is it possible to identify them precisely?

gcappaert commented 1 year ago

Working on this. Do you know of any incidents that definitely did get updated? I've tried searching for different reports with the same incident date, and so far I haven't found any updates, just what look like duplicate reports.

jsvine commented 1 year ago

@gcappaert Thanks! That's a great question, and one I don't know the answer to. It's possible that, rather than reflecting the update in a second entry, PHMSA updates those incidents "in place".

One possible way to detect this would be to compare a recent version of the data in this repo (e.g., as of the most recent commit) to the earliest version stored in the repository. Do you see any incidents whose attributes in the data changed between then and now?

gcappaert commented 1 year ago

Doing a comparison, I don't see any incidents that changed attributes (https://github.com/data-liberation-project/phmsa-hazmat-incident-reports/compare/fd376e3..b0ac3ef), just a lot of additions and deletions of the same data in different places. There's probably a way to do a smarter diff that actually looks for small differences between the same entries, but I don't know how to do it.
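A rough sketch of what such a key-based diff might look like in Python, for anyone who wants to try it. It is not tested against the real files: the column names (`report_number`, `incident_date`, `contact_phone`) and the sample rows are made up for illustration, and it assumes each report has some stable identifier column. Rows are matched by that key rather than by position, so reordering between snapshots does not show up as a change.

```python
import csv
import io


def index_by_key(rows, key_fields):
    """Map each row's key tuple to the row. If a key repeats
    (e.g. duplicate reports), the last row seen wins."""
    return {tuple(row[f] for f in key_fields): row for row in rows}


def keyed_diff(old_rows, new_rows, key_fields):
    """Return (added keys, removed keys, {key: {column: (old, new)}})."""
    old = index_by_key(old_rows, key_fields)
    new = index_by_key(new_rows, key_fields)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for key in sorted(old.keys() & new.keys()):
        # Compare every column present in either version of the row.
        diffs = {
            col: (old[key].get(col), new[key].get(col))
            for col in old[key].keys() | new[key].keys()
            if old[key].get(col) != new[key].get(col)
        }
        if diffs:
            changed[key] = diffs
    return added, removed, changed


# Hypothetical snapshots; real column names will differ.
OLD_CSV = """report_number,incident_date,contact_phone
A-1,2020-01-01,555-0100
A-2,2020-01-02,555-0101
"""
NEW_CSV = """report_number,incident_date,contact_phone
A-2,2020-01-02,555-0199
A-1,2020-01-01,555-0100
A-3,2020-01-03,555-0102
"""

old_rows = list(csv.DictReader(io.StringIO(OLD_CSV)))
new_rows = list(csv.DictReader(io.StringIO(NEW_CSV)))
added, removed, changed = keyed_diff(old_rows, new_rows, ["report_number"])

print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

In this toy example, the reordered `A-1` row is ignored, `A-3` is reported as added, and `A-2` is reported as changed only in `contact_phone`. Whether the real data has a key stable enough for this is an open question.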

This seems like one where it actually might be quickest to call DOT and ask them how often reports get updated / where that's reflected.

jsvine commented 1 year ago

Yes, I think that's right. I did a quick check, and most of the intra-report changes between commits relate to contact information, which might even be something that PHMSA updates through a separate process from the main incident-update process. In the random-ish test I ran, I also found some changes to Identification Markings and things like that, but not many major-major revisions. Will keep an eye on this, and plan to reach out to PHMSA for more details.