LibraryOfCongress / api.congress.gov

congress.gov API
(committee-report list): duplicates with differing `updateDate` and `citation` #216

Open ryparker opened 3 months ago

ryparker commented 3 months ago

The committee-report list response contains duplicate entries that differ only in updateDate. Some of the duplicates also have a slightly different citation.

Request the committee-report list with an offset of 40:

curl --location 'https://api.congress.gov/v3/committee-report/118?format=json&offset=40&api_key=<API_KEY>'

Response:

{
    "pagination": {
        "count": 657,
        "next": "https://api.congress.gov/v3/committee-report/118?offset=60&limit=20&format=json",
        "prev": "https://api.congress.gov/v3/committee-report/118?offset=20&limit=20&format=json"
    },
    "reports": [
        {
            "chamber": "House",
            "citation": "H. Rept. 118-402",
            "congress": 118,
            "number": 402,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-04 23:41:14+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/402?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-401",
            "congress": 118,
            "number": 401,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:19:32+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/401?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-401",
            "congress": 118,
            "number": 401,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-11 21:26:15+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/401?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-400,Part 1",
            "congress": 118,
            "number": 400,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-02-26 14:41:20+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-400",
            "congress": 118,
            "number": 400,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:19:21+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-400",
            "congress": 118,
            "number": 400,
            "part": 2,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:19:21+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-400,Part 2",
            "congress": 118,
            "number": 400,
            "part": 2,
            "type": "HRPT",
            "updateDate": "2024-02-26 14:41:20+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-399",
            "congress": 118,
            "number": 399,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-02 19:26:17+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/399?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-398",
            "congress": 118,
            "number": 398,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-02 18:56:20+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/398?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-397",
            "congress": 118,
            "number": 397,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:19:22+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/397?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-397",
            "congress": 118,
            "number": 397,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-02 21:26:18+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/397?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-396",
            "congress": 118,
            "number": 396,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-02 21:26:18+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/396?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-395",
            "congress": 118,
            "number": 395,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-02-29 00:26:19+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/395?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-395",
            "congress": 118,
            "number": 395,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:18:58+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/395?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-394",
            "congress": 118,
            "number": 394,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-02-24 22:56:19+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/394?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-393",
            "congress": 118,
            "number": 393,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-05 01:56:16+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/393?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-392",
            "congress": 118,
            "number": 392,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-02-27 18:56:19+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/392?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-392",
            "congress": 118,
            "number": 392,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:18:23+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/392?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-391",
            "congress": 118,
            "number": 391,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-02-28 23:26:18+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/391?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-391",
            "congress": 118,
            "number": 391,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:18:16+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/391?format=json"
        }
    ],
    "request": {
        "congress": "118",
        "contentType": "application/json",
        "format": "json"
    }
}

Notice the two sets of duplicates where only the updateDate and citation have changed (e.g. H. Rept. 118-400 vs H. Rept. 118-400,Part 1):

{
    "chamber": "House",
    "citation": "H. Rept. 118-400,Part 1",
    "congress": 118,
    "number": 400,
    "part": 1,
    "type": "HRPT",
    "updateDate": "2024-02-26 14:41:20+00:00",
    "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
  },
  {
    "chamber": "House",
    "citation": "H. Rept. 118-400",
    "congress": 118,
    "number": 400,
    "part": 2,
    "type": "HRPT",
    "updateDate": "2024-03-18 22:19:21+00:00",
    "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
  },
  {
    "chamber": "House",
    "citation": "H. Rept. 118-400",
    "congress": 118,
    "number": 400,
    "part": 1,
    "type": "HRPT",
    "updateDate": "2024-03-18 22:19:21+00:00",
    "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
  },
  {
    "chamber": "House",
    "citation": "H. Rept. 118-400,Part 2",
    "congress": 118,
    "number": 400,
    "part": 2,
    "type": "HRPT",
    "updateDate": "2024-02-26 14:41:20+00:00",
    "url": "https://api.congress.gov/v3/committee-report/118/HRPT/400?format=json"
  },
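A minimal sketch of a check that surfaces entries like these, assuming that (congress, type, number, part) is meant to identify one report in the list response. That key is an assumption for illustration, not something the API documents here, and `find_duplicate_keys` is a hypothetical helper name. The sample rows are trimmed from the response above.

```python
from collections import Counter

def find_duplicate_keys(reports):
    """Return keys that appear more than once, in first-seen order.

    Assumption: (congress, type, number, part) identifies a single report.
    """
    counts = Counter(
        (r["congress"], r["type"], r["number"], r["part"]) for r in reports
    )
    return [key for key, n in counts.items() if n > 1]

# Trimmed rows from the offset=40 response for the 118th Congress.
sample = [
    {"congress": 118, "type": "HRPT", "number": 402, "part": 1, "citation": "H. Rept. 118-402"},
    {"congress": 118, "type": "HRPT", "number": 401, "part": 1, "citation": "H. Rept. 118-401"},
    {"congress": 118, "type": "HRPT", "number": 401, "part": 1, "citation": "H. Rept. 118-401"},
    {"congress": 118, "type": "HRPT", "number": 400, "part": 1, "citation": "H. Rept. 118-400,Part 1"},
    {"congress": 118, "type": "HRPT", "number": 400, "part": 1, "citation": "H. Rept. 118-400"},
    {"congress": 118, "type": "HRPT", "number": 400, "part": 2, "citation": "H. Rept. 118-400"},
    {"congress": 118, "type": "HRPT", "number": 400, "part": 2, "citation": "H. Rept. 118-400,Part 2"},
]

for key in find_duplicate_keys(sample):
    print(key)
```

On this sample the check flags report 401 part 1 and both parts of report 400, and leaves report 402 alone.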

Another instance:

curl --location 'https://api.congress.gov/v3/committee-report/118?format=json&limit=100&api_key=<API_KEY>'

Notice these H. Rept. 118-371 duplicates have the same citation but differing updateDate.

{
            "chamber": "House",
            "citation": "H. Rept. 118-371",
            "congress": 118,
            "number": 371,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-02-14 02:41:21+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/371?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 118-371",
            "congress": 118,
            "number": 371,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-03-18 22:12:21+00:00",
            "url": "https://api.congress.gov/v3/committee-report/118/HRPT/371?format=json"
        },
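As a client-side workaround, one option is to keep only the most recently updated entry per report. This is a sketch under the assumption that the entry with the later updateDate is the one to keep; the issue itself does not establish that, and `keep_latest` is a hypothetical helper. The timestamps are the two H. Rept. 118-371 values shown above.

```python
from datetime import datetime

def keep_latest(reports):
    """Collapse entries sharing (congress, type, number, part).

    Assumption: when duplicates exist, the newest updateDate wins.
    """
    latest = {}
    for report in reports:
        key = (report["congress"], report["type"], report["number"], report["part"])
        current = latest.get(key)
        # List responses use "YYYY-MM-DD HH:MM:SS+00:00", which
        # datetime.fromisoformat parses into an aware datetime.
        if current is None or (
            datetime.fromisoformat(report["updateDate"])
            > datetime.fromisoformat(current["updateDate"])
        ):
            latest[key] = report
    return list(latest.values())

sample = [
    {"congress": 118, "type": "HRPT", "number": 371, "part": 1,
     "citation": "H. Rept. 118-371", "updateDate": "2024-02-14 02:41:21+00:00"},
    {"congress": 118, "type": "HRPT", "number": 371, "part": 1,
     "citation": "H. Rept. 118-371", "updateDate": "2024-03-18 22:12:21+00:00"},
]

for report in keep_latest(sample):
    print(report["citation"], report["updateDate"])
```
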
ryparker commented 3 months ago

Sorry for the back-to-back issues. I've recently improved some validation logic in my work, and it has helped identify some interesting issues.

apreiter18 commented 3 months ago

Possibly related to issue #211 (currently investigating).

ryparker commented 3 months ago

Here's a list of all the duplicates that I've caught in the 118th Congress:

- hrpt-416
- hrpt-414
- hrpt-413
- hrpt-412
- hrpt-411
- hrpt-410
- hrpt-409
- hrpt-408
- hrpt-405
- hrpt-404
- hrpt-403
- hrpt-402
- hrpt-401
- hrpt-400
- hrpt-400
- hrpt-397
- hrpt-395
- hrpt-392
- hrpt-391
- hrpt-390
- hrpt-389
- hrpt-387
- hrpt-386
- hrpt-385
- hrpt-384
- hrpt-382
- hrpt-381
- hrpt-380
- hrpt-379
- hrpt-378
- hrpt-377
- hrpt-376
- hrpt-372
- hrpt-372
- hrpt-372
- hrpt-372
- hrpt-371
- hrpt-370
- hrpt-369
- hrpt-368
- hrpt-367
- hrpt-366
- hrpt-365
- hrpt-364
- hrpt-363
- hrpt-361
- hrpt-360
- hrpt-359
- hrpt-358
- hrpt-356
- hrpt-355
- hrpt-354
- hrpt-352
- hrpt-351
- hrpt-348
- hrpt-347
- hrpt-343
- hrpt-341
- hrpt-340
- hrpt-339
- hrpt-168
- hrpt-167
apreiter18 commented 3 days ago

Hi @ryparker - can you confirm if you are no longer seeing these duplicates? Thanks!

ryparker commented 2 days ago

Looks like all the previously mentioned reports for the 118th are fixed. However, I caught a few other duplicates in past congresses:

IDs are in the format {type}{number}-{part}, e.g. srpt39-1 is type: srpt, number: 39, part: 1.

117

srpt39-1

112

hrpt32-2

105

srpt167-5
srpt167-4

103

hrpt615-2
hrpt140-1

101

hrpt726-2
hrpt485-2
hrpt485-3
hrpt485-4
hrpt241-2

98

hrpt645-1
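
The ID format described above ({type}{number}-{part}) can be split back into its fields with a small parser. This is only a sketch: `parse_report_id` is a hypothetical helper, and the pattern assumes the type is purely alphabetic and the number and part are plain integers, as in the IDs listed here.

```python
import re

# {type}{number}-{part}, e.g. "srpt39-1"
ID_PATTERN = re.compile(r"^([a-z]+)(\d+)-(\d+)$")

def parse_report_id(report_id):
    """Split an ID such as 'srpt39-1' into type, number, and part."""
    match = ID_PATTERN.match(report_id.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized report id: {report_id!r}")
    report_type, number, part = match.groups()
    return {"type": report_type, "number": int(number), "part": int(part)}

for report_id in ["srpt39-1", "hrpt32-2", "srpt167-5", "hrpt645-1"]:
    print(report_id, parse_report_id(report_id))
```
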
apreiter18 commented 2 days ago

@ryparker - Would you mind providing links to any calls where you see duplicates? I do not see duplicates in our database. What I do notice about these is that one report has Errata and other reports have parts. Thank you!

ryparker commented 2 days ago

Here's a curl for the 117th duplicate. (offset: 934)

curl --location 'https://api.congress.gov/v3/committee-report/117?format=json&offset=934&api_key=<API_KEY>'

Response (see "citation": "S. Rept. 117-39", "citation": "S. Rept. 117-39,Errata")

{
    "pagination": {
        "count": 1019,
        "next": "https://api.congress.gov/v3/committee-report/117?offset=954&limit=20&format=json",
        "prev": "https://api.congress.gov/v3/committee-report/117?offset=914&limit=20&format=json"
    },
    "reports": [
        {
            "chamber": "House",
            "citation": "H. Rept. 117-39",
            "congress": 117,
            "number": 39,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:14+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/39?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-39",
            "congress": 117,
            "number": 39,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:36+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/39?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-39,Errata",
            "congress": 117,
            "number": 39,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:36+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/39?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-38",
            "congress": 117,
            "number": 38,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:13+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/38?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-38",
            "congress": 117,
            "number": 38,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:36+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/38?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-37",
            "congress": 117,
            "number": 37,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:13+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/37?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-37",
            "congress": 117,
            "number": 37,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/37?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-36",
            "congress": 117,
            "number": 36,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:13+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/36?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-36",
            "congress": 117,
            "number": 36,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/36?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-35",
            "congress": 117,
            "number": 35,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:12+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/35?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-35",
            "congress": 117,
            "number": 35,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/35?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-34",
            "congress": 117,
            "number": 34,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:12+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/34?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-34",
            "congress": 117,
            "number": 34,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/34?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-33",
            "congress": 117,
            "number": 33,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/33?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-33",
            "congress": 117,
            "number": 33,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:11+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/33?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-32",
            "congress": 117,
            "number": 32,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:10+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/32?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-32",
            "congress": 117,
            "number": 32,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/32?format=json"
        },
        {
            "chamber": "Senate",
            "citation": "S. Rept. 117-31",
            "congress": 117,
            "number": 31,
            "part": 1,
            "type": "SRPT",
            "updateDate": "2024-04-17 23:43:35+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/SRPT/31?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-31",
            "congress": 117,
            "number": 31,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:09+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/31?format=json"
        },
        {
            "chamber": "House",
            "citation": "H. Rept. 117-30",
            "congress": 117,
            "number": 30,
            "part": 1,
            "type": "HRPT",
            "updateDate": "2024-04-17 23:43:08+00:00",
            "url": "https://api.congress.gov/v3/committee-report/117/HRPT/30?format=json"
        }
    ],
    "request": {
        "congress": "117",
        "contentType": "application/json",
        "format": "json"
    }
}

It does seem to be related to citations that end with "Errata". I haven't rechecked all the others, but I'll update this comment once I can confirm they're related to the "Errata" citation endings.

Because the Errata entries' props, other than citation, are identical to those of their non-errata duplicates, this seems like a bug.
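
To tell these entries apart on the client side, the citation suffix can be split off. A minimal sketch, assuming the comma only ever separates the base citation from a suffix such as "Errata" or "Part 1"; that holds for the citations quoted in this thread but is not confirmed for the whole dataset. `split_citation` and `is_errata` are hypothetical helper names.

```python
def split_citation(citation):
    """Split 'S. Rept. 117-39,Errata' into ('S. Rept. 117-39', 'Errata').

    Returns (base, None) when the citation has no comma suffix.
    """
    base, _, suffix = citation.partition(",")
    suffix = suffix.strip()
    return base.strip(), (suffix or None)

def is_errata(citation):
    """True when the citation suffix is 'Errata' (case-insensitive)."""
    _, suffix = split_citation(citation)
    return suffix is not None and suffix.lower() == "errata"

for citation in ["S. Rept. 117-39", "S. Rept. 117-39,Errata", "H. Rept. 118-400,Part 1"]:
    print(split_citation(citation), is_errata(citation))
```
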

apreiter18 commented 2 days ago

Errata are lists of errors in congressional publications. The corrections are printed on sheets, or pages. The errata sheets are usually tipped into the original document.

Congress.gov provides errata text on a tab within committee report texts (e.g., errata issued for 117 SRept 39), which is why you are seeing these two instances.

I hope this helps!

ryparker commented 2 days ago

That makes sense; however, from an API perspective this seems a bit odd. If it's a correction of SRPT 39, then it seems like something that should be part of the existing SRPT 39 list item and not separated out as its own entity.

Given that the list API provides the same props (including the same url) for the two SRPT 39 entries, both lead the dev to the same details API.

What's also odd is that the details API response for SRPT 39 seems to contain data only for the Errata, whereas I would have expected both the original SRPT 39 (pre-errata) and the Errata:

{
    "committeeReports": [
        {
            "associatedBill": [
                {
                    "congress": 117,
                    "number": "2792",
                    "type": "S",
                    "url": "https://api.congress.gov/v3/bill/117/s/2792?format=json"
                }
            ],
            "chamber": "Senate",
            "citation": "S. Rept. 117-39,Errata",
            "committees": [],
            "congress": 117,
            "isConferenceReport": false,
            "issueDate": null,
            "number": 39,
            "part": 1,
            "reportType": "S.Rept.",
            "sessionNumber": 1,
            "text": {
                "count": 6,
                "url": "https://api.congress.gov/v3/committee-report/117/srpt/39/text?format=json"
            },
            "title": null,
            "type": "SRPT",
            "updateDate": "2024-04-17T23:43:36Z"
        }
    ],
    "request": {
        "congress": "117",
        "contentType": "application/json",
        "format": "json",
        "reportNumber": "39",
        "reportType": "srpt"
    }
}
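
To illustrate the point about both list entries leading to the same details API, here is a sketch that groups list entries by their url and collects the citations behind each one. It is a client-side view only, not a description of how the API models errata, and `group_citations_by_url` is a hypothetical helper. The sample rows are trimmed from the 117th Congress response above.

```python
def group_citations_by_url(reports):
    """Map each details url to the list of citations that point at it."""
    grouped = {}
    for report in reports:
        grouped.setdefault(report["url"], []).append(report["citation"])
    return grouped

BASE = "https://api.congress.gov/v3/committee-report/117"
sample = [
    {"citation": "H. Rept. 117-39", "url": f"{BASE}/HRPT/39?format=json"},
    {"citation": "S. Rept. 117-39", "url": f"{BASE}/SRPT/39?format=json"},
    {"citation": "S. Rept. 117-39,Errata", "url": f"{BASE}/SRPT/39?format=json"},
]

for url, citations in group_citations_by_url(sample).items():
    if len(citations) > 1:
        print(url, citations)
```

Any url that maps to more than one citation marks a pair of list entries that cannot be distinguished through the details request alone.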