data-liberation-project / epa-rmp-coordination

A repository to track efforts to publish the EPA RMP database and associated documentation.
1 stars 0 forks source link

Get confirmation/clarification re. "FacilityID" column #2

Closed jsvine closed 1 year ago

jsvine commented 1 year ago

See this note from the documentation draft:

The most important table in the RMP data appears to be RMPData:tblS1Facilities. The name appears to be a misnomer, as each row represents an RMP submission, not a facility. The two essential IDs appear to be these:

  • FacilityID: This also appears to be a misnomer, and provides an ID for each submission rather than facility. In tblS1Facilities, all of this column’s values are unique. Using this ID, you can connect each submission to the following tables in the RMPData file: tblS1Processes, tblS6AccidentHistory, tblS9EmergencyResponses, and tblExecutiveSummaries. In fact, it appears to be the only ID that allows for such joining.

  • EPAFacilityID: This appears to be the “real” facility ID. It is joinable to the column of the same name in the RMPFac:tblFacility table, which provides facility-level (rather than RMP-submission-level) information.

I need to contact the EPA about this.

jsvine commented 1 year ago

I've now emailed the person who seems to oversee the RMP data for the EPA, will update here when/if I receive a response 👍

jsvine commented 1 year ago

Still waiting to hear from the EPA, but realized that we already have some confirmation in the form of Appendix F of the RMP*Review User’s Guide, which includes several helpful relationship diagrams demonstrating the use of FacilityID as the main key joining tblS1Facilities to the rest of the data. E.g.:

Figure 2

I've updated the documentation draft noting that and pointing folks more prominently to that appendix.

jsvine commented 1 year ago

Good news 🎉 : We have clarity and confirmation from the EPA on this. Via email:

The assumption that FacilityID field under tblS1Facilities is used more like a RMP Submission Identifier is correct. The field is a random number assigned to each submission and is the same value as the “Plan Sequence Number” found in RMP Info Query results. [...] EPAFacilityID is the “true” unique ID for facilities.

EPA also confirms that the description for FacilityID in _tlkpRSFieldDesc is out of date and incorrect.

I've now updated the documentation draft and data dictionary to reflect this confirmation.