Open ryparker opened 7 months ago
Sorry for the back-to-back issues. I've recently improved some validation logic in my work, and it has helped identify some interesting issues.
Possibly related to issue #211 (currently investigating).
Here's a list of all the duplicates that I've caught in the 118th Congress:
- hrpt-416
- hrpt-414
- hrpt-413
- hrpt-412
- hrpt-411
- hrpt-410
- hrpt-409
- hrpt-408
- hrpt-405
- hrpt-404
- hrpt-403
- hrpt-402
- hrpt-401
- hrpt-400
- hrpt-400
- hrpt-397
- hrpt-395
- hrpt-392
- hrpt-391
- hrpt-390
- hrpt-389
- hrpt-387
- hrpt-386
- hrpt-385
- hrpt-384
- hrpt-382
- hrpt-381
- hrpt-380
- hrpt-379
- hrpt-378
- hrpt-377
- hrpt-376
- hrpt-372
- hrpt-372
- hrpt-372
- hrpt-372
- hrpt-371
- hrpt-370
- hrpt-369
- hrpt-368
- hrpt-367
- hrpt-366
- hrpt-365
- hrpt-364
- hrpt-363
- hrpt-361
- hrpt-360
- hrpt-359
- hrpt-358
- hrpt-356
- hrpt-355
- hrpt-354
- hrpt-352
- hrpt-351
- hrpt-348
- hrpt-347
- hrpt-343
- hrpt-341
- hrpt-340
- hrpt-339
- hrpt-168
- hrpt-167
Hi @ryparker - can you confirm if you are no longer seeing these duplicates? Thanks!
Looks like all the previously mentioned reports for the 118th are fixed. However, I caught a few other duplicates in past congresses.
IDs are in the format {type}{number}-{part}, e.g. srpt39-1 is type: srpt, number: 39, part: 1.
117:
- srpt39-1
112:
- hrpt32-2
105:
- srpt167-5
- srpt167-4
103:
- hrpt615-2
- hrpt140-1
101:
- hrpt726-2
- hrpt485-2
- hrpt485-3
- hrpt485-4
- hrpt241-2
98:
- hrpt645-1
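For anyone reading the IDs above programmatically, here is a minimal sketch of a parser for the {type}{number}-{part} format. The names `ReportId` and `parse_report_id` are hypothetical helpers for illustration and are not part of the API.

```python
import re
from typing import NamedTuple


class ReportId(NamedTuple):
    """Hypothetical container for the pieces of an id such as srpt39-1."""
    type: str
    number: int
    part: int


# Letters for the type, digits for the number, a hyphen, digits for the part.
_ID_RE = re.compile(r"^([a-z]+)(\d+)-(\d+)$")


def parse_report_id(report_id: str) -> ReportId:
    """Split an id in the {type}{number}-{part} format into its fields."""
    match = _ID_RE.match(report_id.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised report id: {report_id!r}")
    kind, number, part = match.groups()
    return ReportId(kind, int(number), int(part))
```

With this, `parse_report_id("srpt39-1")` gives type `srpt`, number `39`, part `1`.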
@ryparker - Would you mind providing links to any calls where you see duplicates? I do not see duplicates in our database. What I do notice about these is that one report has Errata and other reports have parts. Thank you!
Here's a curl for the 117th duplicate (offset: 934):
curl --location 'https://api.congress.gov/v3/committee-report/117?format=json&offset=934&api_key=<API_KEY>'
Response (see "citation": "S. Rept. 117-39" and "citation": "S. Rept. 117-39,Errata"):
{
"pagination": {
"count": 1019,
"next": "https://api.congress.gov/v3/committee-report/117?offset=954&limit=20&format=json",
"prev": "https://api.congress.gov/v3/committee-report/117?offset=914&limit=20&format=json"
},
"reports": [
{
"chamber": "House",
"citation": "H. Rept. 117-39",
"congress": 117,
"number": 39,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:14+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/39?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-39",
"congress": 117,
"number": 39,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:36+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/39?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-39,Errata",
"congress": 117,
"number": 39,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:36+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/39?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-38",
"congress": 117,
"number": 38,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:13+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/38?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-38",
"congress": 117,
"number": 38,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:36+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/38?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-37",
"congress": 117,
"number": 37,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:13+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/37?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-37",
"congress": 117,
"number": 37,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/37?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-36",
"congress": 117,
"number": 36,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:13+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/36?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-36",
"congress": 117,
"number": 36,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/36?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-35",
"congress": 117,
"number": 35,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:12+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/35?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-35",
"congress": 117,
"number": 35,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/35?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-34",
"congress": 117,
"number": 34,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:12+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/34?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-34",
"congress": 117,
"number": 34,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/34?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-33",
"congress": 117,
"number": 33,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/33?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-33",
"congress": 117,
"number": 33,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:11+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/33?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-32",
"congress": 117,
"number": 32,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:10+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/32?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-32",
"congress": 117,
"number": 32,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/32?format=json"
},
{
"chamber": "Senate",
"citation": "S. Rept. 117-31",
"congress": 117,
"number": 31,
"part": 1,
"type": "SRPT",
"updateDate": "2024-04-17 23:43:35+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/SRPT/31?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-31",
"congress": 117,
"number": 31,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:09+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/31?format=json"
},
{
"chamber": "House",
"citation": "H. Rept. 117-30",
"congress": 117,
"number": 30,
"part": 1,
"type": "HRPT",
"updateDate": "2024-04-17 23:43:08+00:00",
"url": "https://api.congress.gov/v3/committee-report/117/HRPT/30?format=json"
}
],
"request": {
"congress": "117",
"contentType": "application/json",
"format": "json"
}
}
It does seem to be related to citations that end with "Errata". I haven't rechecked all the others, but I'll update this comment once I can confirm it's related to the "Errata" citation endings.
Because the Errata props are identical to their non-errata duplicates, this seems like a bug.
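For reference, this is roughly the kind of check that flags these entries. It is a minimal sketch that assumes the `reports` array from a list response has already been loaded into Python dicts; `find_duplicates` is a hypothetical helper name, and the choice of key fields is my own assumption about what should identify a report.

```python
from collections import defaultdict


def find_duplicates(reports):
    """Group list items by (congress, type, number, part) and return
    only the keys that occur more than once, with their citations."""
    groups = defaultdict(list)
    for report in reports:
        key = (
            report["congress"],
            report["type"],
            report["number"],
            report["part"],
        )
        groups[key].append(report["citation"])
    # A key with more than one citation means the list repeated that report.
    return {key: cites for key, cites in groups.items() if len(cites) > 1}
```

Run against the response above, the only key that would come back is `(117, "SRPT", 39, 1)`, with the plain and "Errata" citations.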
Errata are lists of errors in congressional publications. The corrections are printed on sheets, or pages. The errata sheets are usually tipped into the original document.
Congress.gov provides errata text on a tab within committee report texts (e.g., errata issued for 117 SRept 39) so that is why you are seeing these two instances.
I hope this helps!
That makes sense; however, from an API perspective this seems a bit odd. If it's a correction of SRPT 39, then it seems like something that should be part of the existing SRPT 39 list item rather than a separate entity.
Given that the list API provides the same props for the two SRPT 39 entries, both lead the dev to the same details API.
What's also odd is that the details API for SRPT 39 seems to contain data only for the Errata, whereas I would have expected both the original SRPT 39 (pre-errata) and the Errata.
{
"committeeReports": [
{
"associatedBill": [
{
"congress": 117,
"number": "2792",
"type": "S",
"url": "https://api.congress.gov/v3/bill/117/s/2792?format=json"
}
],
"chamber": "Senate",
"citation": "S. Rept. 117-39,Errata",
"committees": [],
"congress": 117,
"isConferenceReport": false,
"issueDate": null,
"number": 39,
"part": 1,
"reportType": "S.Rept.",
"sessionNumber": 1,
"text": {
"count": 6,
"url": "https://api.congress.gov/v3/committee-report/117/srpt/39/text?format=json"
},
"title": null,
"type": "SRPT",
"updateDate": "2024-04-17T23:43:36Z"
}
],
"request": {
"congress": "117",
"contentType": "application/json",
"format": "json",
"reportNumber": "39",
"reportType": "srpt"
}
}
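Until the list shape changes, a client-side workaround could collapse list items that point at the same details URL. The sketch below is only one possible approach: `collapse_by_url` and the merged `citations` field are hypothetical, and keeping the latest `updateDate` is my own assumption about what a consumer would want.

```python
def collapse_by_url(reports):
    """Merge list items that share a details url into a single entry,
    collecting every distinct citation under a 'citations' list."""
    merged = {}
    for report in reports:
        entry = merged.get(report["url"])
        if entry is None:
            # Copy everything except the single citation field.
            entry = {k: v for k, v in report.items() if k != "citation"}
            entry["citations"] = []
            merged[report["url"]] = entry
        if report["citation"] not in entry["citations"]:
            entry["citations"].append(report["citation"])
        # Assumption: timestamps share one format and offset, so plain
        # string comparison orders them chronologically.
        entry["updateDate"] = max(entry["updateDate"], report["updateDate"])
    return list(merged.values())
```

This keeps one entry per details endpoint while still recording that an "Errata" citation exists for it.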
The committee-report list response has duplicates where only updateDate has changed. Some of the duplicates have a slightly different citation.
Request committee-reports list with offset of 40.
Response:
Notice two sets of duplicates where only the updateDate and citation name has changed (e.g. H. Rept. 118-400 vs H. Rept. 118-400,Part 1).
Another instance:
Notice these H. Rept. 118-371 duplicates have the same citation but differing updateDate.
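To compare citations that differ only by a trailing qualifier such as ",Part 1" or ",Errata", something like the following could be used. It is a minimal sketch: `split_citation` is a hypothetical helper, and it assumes the base citation itself never contains a comma, which holds for the examples in this thread but is not something I have confirmed for every report.

```python
from typing import Optional, Tuple


def split_citation(citation: str) -> Tuple[str, Optional[str]]:
    """Split a citation into its base and an optional trailing qualifier.

    Assumes the first comma, if any, separates the base citation from
    a qualifier such as 'Errata' or 'Part 1'.
    """
    base, sep, suffix = citation.partition(",")
    return base.strip(), (suffix.strip() if sep else None)
```

Two list items can then be treated as the same report when their base citations match, regardless of the qualifier.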