data-liberation-project / epa-rmp-coordination

A repository to track efforts to publish the EPA RMP database and associated documentation.

Get clarity on RMPSubmissionReasonCode usage #7

jsvine closed 1 year ago

jsvine commented 1 year ago

The tblS1Facilities table contains an RMPSubmissionReasonCode field, with a corresponding lookup table, tlkpSubmissionReasonCodes. Looking at the frequency (number of submissions) of each code by year (see table below) raises a couple of questions in need of clarification:

Here's the table — note that you may have to scroll horizontally to see all columns:

Count of RMP submissions with each RMPSubmissionReasonCode by ReceiptDate year.

Year (None) C01 C02 C03 C04 C05 C06 C07 C08 C09 R01 R02 R03 R04 R05 R06 R07 R08 R09
1999 16384 1 1 5
2000 2277 1 1 1 2
2001 1835 1 2 4 1 1
2002 2162 2 1 2 1 9
2003 1524 4 2 5 2 1 2 3 1
2004 739 248 143 167 61 16 108 260 11 52 119 11 33 40 61 22 9567 22 309
2005 321 140 127 144 156 57 302 21 1 5 47 12 43 50 43 11 1095 21 203
2006 352 85 110 210 314 47 335 14 4 2 35 14 35 74 46 23 626 21 273
2007 271 98 110 778 264 60 302 9 2 27 10 51 76 45 26 592 25 283
2008 354 78 98 214 185 52 314 6 4 6 23 9 55 80 61 33 637 21 373
2009 304 29 39 45 98 32 98 3 1 1 21 17 52 64 35 43 7234 31 439
2010 230 2 1 2 7 10 51 81 40 29 1391 42 373
2011 252 3 21 65 108 66 67 853 26 346
2012 285 3 18 71 118 66 40 842 43 443
2013 275 3 14 57 102 72 55 998 30 462
2014 326 5 17 62 104 58 64 5798 30 537
2015 281 4 16 59 138 40 52 1322 43 390
2016 248 3 18 54 153 63 315 1090 36 426
2017 264 4 25 63 101 66 48 1039 36 373
2018 245 2 24 45 100 51 36 1259 21 384
2019 222 8 17 62 124 51 38 4539 16 347
2020 187 6 15 42 97 44 34 1454 23 348
2021 157 5 14 38 83 29 266 1306 23 289
2022 25 2 5 10 10 3 211 4 54
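
For anyone wanting to reproduce a cross-tabulation like the one above, here is a rough sketch rather than the exact query used. It assumes the data has been loaded into SQLite, that ReceiptDate is stored as ISO-8601 text, and that a missing reason code is NULL; the sample rows are invented purely for illustration.

```python
import sqlite3
from collections import defaultdict

# Toy stand-in for tblS1Facilities; these rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblS1Facilities (ReceiptDate TEXT, RMPSubmissionReasonCode TEXT)"
)
conn.executemany(
    "INSERT INTO tblS1Facilities VALUES (?, ?)",
    [
        ("2004-06-21", "R07"),
        ("2004-06-22", "R07"),
        ("2004-07-01", "C03"),
        ("2004-07-02", None),
        ("2010-01-15", "R07"),
        ("2010-02-01", None),
    ],
)


def reason_counts_by_year(conn):
    """Return {year: {reason code or '(None)': number of submissions}}."""
    query = """
        SELECT
            strftime('%Y', ReceiptDate) AS year,
            COALESCE(RMPSubmissionReasonCode, '(None)') AS reason,
            COUNT(*) AS n
        FROM tblS1Facilities
        GROUP BY year, reason
        ORDER BY year, reason
    """
    counts = defaultdict(dict)
    for year, reason, n in conn.execute(query):
        counts[year][reason] = n
    return dict(counts)


reason_counts = reason_counts_by_year(conn)
for year, by_reason in reason_counts.items():
    print(year, by_reason)
```

Grouping NULL codes under a "(None)" label keeps the blank reasons visible as their own column instead of silently dropping them.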
jsvine commented 1 year ago

Ah, the blank reasons are less of a mystery — they almost all either:

jsvine commented 1 year ago

Ah, and some more clarity: SubmissionType=C submissions disappear entirely after 2009. So it makes sense that the C reasons would also disappear. But now the question is why C-type submissions were deprecated. Asking the EPA about this.

Year F R C
1999 14308 283 1846
2000 761 451 1041
2001 323 542 988
2002 330 831 1010
2003 334 747 471
2004 471 10335 1174
2005 324 1527 956
2006 350 1140 1116
2007 271 1132 1624
2008 355 1298 953
2009 305 7930 350
2010 224 2022
2011 252 1555
2012 285 1644
2013 275 1793
2014 322 6679
2015 281 2064
2016 247 2159
2017 264 1755
2018 245 1922
2019 222 5202
2020 187 2063
2021 157 2053
2022 25 299

(Note: Year is based on PostmarkDate, or ReceiptDate for a handful of the submissions that are missing the former.)
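
The year rule in the note above (PostmarkDate, falling back to ReceiptDate) can be sketched with COALESCE. This is a minimal illustration, assuming a SQLite copy of the table with dates stored as ISO-8601 text and missing postmarks stored as NULL; the sample rows are made up.

```python
import sqlite3

# Toy stand-in for tblS1Facilities; these rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblS1Facilities "
    "(SubmissionType TEXT, PostmarkDate TEXT, ReceiptDate TEXT)"
)
conn.executemany(
    "INSERT INTO tblS1Facilities VALUES (?, ?, ?)",
    [
        ("F", "1999-06-18", "1999-06-21"),
        ("C", "2008-12-30", "2009-01-05"),  # postmark year (2008) is used
        ("C", None, "2009-03-02"),          # no postmark: falls back to ReceiptDate
        ("R", "2009-06-19", "2009-06-22"),
        ("R", "2010-04-01", "2010-04-01"),
        ("F", None, "2010-05-10"),
    ],
)


def type_counts_by_year(conn):
    """Return {year: {SubmissionType: number of submissions}}."""
    query = """
        SELECT
            strftime('%Y', COALESCE(PostmarkDate, ReceiptDate)) AS year,
            SubmissionType,
            COUNT(*) AS n
        FROM tblS1Facilities
        GROUP BY year, SubmissionType
    """
    counts = {}
    for year, sub_type, n in conn.execute(query):
        counts.setdefault(year, {})[sub_type] = n
    return counts


type_counts = type_counts_by_year(conn)
# Latest year in which any C-type submission appears.
last_c_year = max(year for year, by_type in type_counts.items() if "C" in by_type)
print(type_counts)
print("Last year with C-type submissions:", last_c_year)
```

The same `last_c_year` check, run against the real table, is one way to confirm where the C-type submissions stop.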

jsvine commented 1 year ago

Getting closer to a clearer picture! Via email, the EPA provides an explanation that lines up neatly with what we're seeing in the data:

RMP*eSubmit started in 2009. Before that RMP*Submit 2004 was used. First time submissions, corrections, and resubmissions were all submitted on diskettes or CDs so each one was submitted independently. Now with RMP*eSubmit a correction is not a full submission and is a change made to an already existing first time submission or resubmission.

I've now added a section related to that in the documentation draft: "Understanding corrections to RMP submissions"

There still remain, at least for me, some aspects of this that require clarification. For instance: How are pre-2009 corrections reflected in the _ChangeHistory tables? And how is it that some pre-2009 changes in the _ChangeHistory tables do not appear to be accompanied by a correction-type RMP submission?

jsvine commented 1 year ago

I have received some helpful clarification from the EPA, and updated the relevant section of the documentation to reflect that. The short version is: Even in the earlier era, submitters could make minor corrections online (although only to Section 1 of the RMP) via a utility called "WebRC". This does appear to explain the phenomenon re. pre-2009 changes raised in the comment directly above.