LibraryOfCongress / api.congress.gov

congress.gov API

Missing committeeMeetings data for sessions prior to 114th? #178

Closed jose-m-maciasiii closed 9 months ago

jose-m-maciasiii commented 10 months ago

Hello!

I am working on a research project in which I am trying to bulk download hearing transcripts. My first step is to understand what is available, so I am starting by visiting https://api.congress.gov/v3/committee-meeting/113/house?api_key=THEAPIKEY&format=json&limit=250&offset=0

but it is returning:

{ "committeeMeetings": [], "pagination": { "count": 0 }, "request": { "chamber": "house", "congress": "113", "contentType": "application/json", "format": "json" } }

whereas the 114th and newer return something like:

{
    "committeeMeetings": [
        {
            "chamber": "House",
            "congress": 114,
            "eventId": "104419",
            "updateDate": "2021-01-13 16:34:08+00:00",
            "url": "https://api.congress.gov/v3/committee-meeting/114/house/104419?format=json"
        },
        {
            "chamber": "House",
            "congress": 114,
            "eventId": "104420",
            "updateDate": "2021-01-13 16:34:16+00:00",
            "url": "https://api.congress.gov/v3/committee-meeting/114/house/104420?format=json"
        },
        {
            "chamber": "House",
            "congress": 114,
            "eventId": "104421",
            "updateDate": "2021-01-13 16:34:18+00:00",
            "url": "https://api.congress.gov/v3/committee-meeting/114/house/104421?format=json"
        },
        {
            "chamber": "House",
            "congress": 114,
            "eventId": "104422",
            "updateDate": "2021-01-13 16:34:22+00:00",
            "url": "https://api.congress.gov/v3/committee-meeting/114/house/104422?format=json"
        },
        {
            "chamber": "House",
            "congress": 114,
            "eventId": "103836",
            "updateDate": "2021-01-13 16:19:26+00:00",
            "url": "https://api.congress.gov/v3/committee-meeting/114/house/103836?format=json"
        },

and so on and so forth until the bottom:

     {
        "chamber": "House",
        "congress": 114,
        "eventId": "102976",
        "updateDate": "2021-01-13 16:02:01+00:00",
        "url": "https://api.congress.gov/v3/committee-meeting/114/house/102976?format=json"
    }
],
"pagination": {
    "count": 777,
    "next": "https://api.congress.gov/v3/committee-meeting/114/house?offset=250&limit=250&format=json"
},
"request": {
    "chamber": "house",
    "congress": "114",
    "contentType": "application/json",
    "format": "json"
}

}

I usually capture the information provided as a starting point for finding committee hearings and their transcripts. I would appreciate any help finding out why this issue occurs with the 113th and older Congresses. If there's a better way to get transcripts based on committee ID/systemID, I am also open to trying it.
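For anyone else collecting these lists in bulk, here is a minimal Python sketch of following the `pagination.next` link shown in the response above. Note that the `next` URL in that response does not carry the `api_key` parameter, so the sketch adds it back on every request. The helper names (`with_api_key`, `fetch_json`, `iter_items`) are made up for illustration and are not part of the API; rate limiting and error handling are left out.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_api_key(url, api_key):
    """Return url with api_key set (added or replaced) in its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["api_key"] = api_key
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_json(url):
    """Download a URL and decode the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_items(first_url, api_key, list_key, fetch=fetch_json):
    """Yield every entry under list_key, following pagination.next.

    list_key is the top-level array name, e.g. "committeeMeetings" in the
    response quoted above. fetch can be swapped out for testing.
    """
    url = first_url
    while url:
        page = fetch(with_api_key(url, api_key))
        yield from page.get(list_key, [])
        # The last page has no "next" entry, which ends the loop.
        url = page.get("pagination", {}).get("next")
```

Calling `list(iter_items("https://api.congress.gov/v3/committee-meeting/114/house?format=json&limit=250", "THEAPIKEY", "committeeMeetings"))` should then walk all pages rather than only the first 250 entries.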

Gmanicus commented 10 months ago

According to the collections coverage dates, committee meetings have unfortunately only been recorded since Congress 115 for the House and Congress 116 for the Senate.

Committee hearing transcripts themselves, though, appear to date back to Congress 103.

jose-m-maciasiii commented 10 months ago

Hi @Gmanicus, would you happen to know where to find endpoints that link committee information with the transcripts? I pulled committee meetings as my starting point: I grab the url of each meeting, query those meeting URLs, and then take the transcript URLs from the responses to query the item-level data.

Gmanicus commented 10 months ago

Hey Jose, it looks like you can work in reverse from the /hearings endpoint. If you drill down into the hearing details, the response links to the associated committee, committee meeting, and the hearing transcript(s).
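As a rough way to see which links a hearing detail response carries, the sketch below walks any decoded JSON document and lists every URL together with the path where it was found. The exact field names in the hearing details are not shown in this thread, so the code deliberately assumes none of them; `collect_urls` is a hypothetical helper name, and the sample document in the usage note is invented for illustration.

```python
def collect_urls(node, path=""):
    """Yield (path, url) for every http(s) string found in decoded JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from collect_urls(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from collect_urls(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield path, node
```

Running `for path, url in collect_urls(detail): print(path, url)` on a decoded hearing detail should show where the committee, committee meeting, and transcript links sit, after which they can be requested directly.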