LibraryOfCongress / api.congress.gov

congress.gov API

Amendment Text Endpoint Returns Webpage Links Instead of Actual Text #206

Closed tomasen closed 3 months ago

tomasen commented 6 months ago

Description

When I access the endpoint /amendment/{congress}/{amendmentType}/{amendmentNumber}/text on api.congress.gov, it returns links to webpages containing the text versions of the amendment instead of the actual text content.

Expected Behavior

The endpoint should return a link to the actual text of each amendment text version, allowing direct access to the content without navigating through webpages.

Steps to Reproduce

  1. Make a request to the endpoint /amendment/{congress}/{amendmentType}/{amendmentNumber}/text.
  2. Observe that the response contains links to webpages rather than the text content.
  3. Note that all the links are identical even though they are labeled as different formats. Here is an example response:
    {
      "pagination": {
        "count": 1
      },
      "request": {
        "amendmentNumber": "1690",
        "amendmentType": "samdt",
        "amendmentUrl": "https://api.congress.gov/v3/amendment/118/samdt/1690?format=json",
        "congress": "118",
        "contentType": "application/json",
        "format": "json"
      },
      "textVersions": [
        {
          "date": "2024-03-07T05:00:00Z",
          "formats": [
            {
              "type": "HTML",
              "url": "https://www.congress.gov/amendment/118th-congress/senate-amendment/1690/text/submitted/2261385"
            },
            {
              "type": "PDF",
              "url": "https://www.congress.gov/amendment/118th-congress/senate-amendment/1690/text/submitted/2261385"
            }
          ],
          "type": "Submitted"
        }
      ]
    }
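The duplicated links described in step 3 can be checked programmatically. Below is a minimal sketch, assuming the response has already been parsed into a Python dict; the helper name `duplicated_format_urls` is hypothetical and not part of the API.

```python
from collections import defaultdict


def duplicated_format_urls(response: dict) -> list[dict]:
    """Hypothetical helper: report URLs that are shared by more than one format."""
    flagged = []
    for version in response.get("textVersions", []):
        types_by_url = defaultdict(list)
        for fmt in version.get("formats", []):
            types_by_url[fmt["url"]].append(fmt["type"])
        for url, types in types_by_url.items():
            if len(types) > 1:  # e.g. HTML and PDF pointing at the same page
                flagged.append(
                    {"version": version.get("type"), "url": url, "formats": types}
                )
    return flagged


# Trimmed copy of the example response from this issue.
PAGE = "https://www.congress.gov/amendment/118th-congress/senate-amendment/1690/text/submitted/2261385"
sample = {
    "textVersions": [
        {
            "date": "2024-03-07T05:00:00Z",
            "formats": [
                {"type": "HTML", "url": PAGE},
                {"type": "PDF", "url": PAGE},
            ],
            "type": "Submitted",
        }
    ]
}

for item in duplicated_format_urls(sample):
    print(item["version"], item["formats"], item["url"])
```

An empty result would mean every format has its own URL, which is the behavior requested here.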

Possible Solution

Modify the endpoint to extract and return the actual text content from the linked webpages, or provide an additional endpoint specifically for accessing the text content directly.

Additional Context

This issue affects users who require direct access to the amendment text for analysis or integration purposes, as it adds an extra step of having to navigate to and parse the content from the returned webpages.

rbram commented 6 months ago

Thank you for your feedback. We will create a ticket to address this.

tomasen commented 3 months ago

Hi, not sure if you noticed, but in older records there are four 'w's in the domain instead of three.

Those records also still return links to webpages containing the text versions of the amendment instead of the actual content.

Example Response:

 {
    "textVersions": [
        {
            "date": "2022-07-14T06:20:29Z",
            "formats": [
                {
                    "type": "PDF",
                    "url": "https://wwww.congress.gov/amendment/117th-congress/house-amendment/287/text/offered/2255810"
                },
                {
                    "type": "Formatted XML",
                    "url": "https://wwww.congress.gov/amendment/117th-congress/house-amendment/287/text/offered/2255810"
                }
            ],
            "type": "Offered"
        }
    ]
 }
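For anyone who wants to detect the extra 'w' on the client side, here is a minimal sketch. It assumes `www.congress.gov` is the intended host; the function names are hypothetical, and rewriting the host is only a workaround, with no guarantee that the rewritten URL resolves to the expected content.

```python
from urllib.parse import urlsplit

# Assumption: this is the host the links are meant to use.
EXPECTED_HOST = "www.congress.gov"


def find_unexpected_hosts(response: dict, expected_host: str = EXPECTED_HOST) -> list[str]:
    """Hypothetical helper: list format URLs whose host differs from the expected one."""
    bad = []
    for version in response.get("textVersions", []):
        for fmt in version.get("formats", []):
            if urlsplit(fmt["url"]).hostname != expected_host:
                bad.append(fmt["url"])
    return bad


def with_expected_host(url: str, expected_host: str = EXPECTED_HOST) -> str:
    """Hypothetical workaround: swap in the expected host, keeping the rest of the URL."""
    return urlsplit(url)._replace(netloc=expected_host).geturl()


# Trimmed copy of the example response above.
BAD_URL = "https://wwww.congress.gov/amendment/117th-congress/house-amendment/287/text/offered/2255810"
sample = {
    "textVersions": [
        {
            "formats": [
                {"type": "PDF", "url": BAD_URL},
                {"type": "Formatted XML", "url": BAD_URL},
            ],
            "type": "Offered",
        }
    ]
}

for url in find_unexpected_hosts(sample):
    print(url, "->", with_expected_host(url))
```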

For newer records, it does return links to the Congressional Record (CREC) that may contain the amendment text, but that is still a little bit off from the text "for a specified amendment" that this endpoint is described as returning.

{
  "pagination": {
    "count": 1
  },
  "request": {
    "amendmentNumber": "111",
    "amendmentType": "hamdt",
    "amendmentUrl": "https://api.congress.gov/v3/amendment/118/hamdt/111?format=json",
    "congress": "118",
    "contentType": "application/json",
    "format": "json"
  },
  "textVersions": [
    {
      "date": "2023-03-23T20:30:09Z",
      "formats": [
        {
          "type": "HTML",
          "url": "https://www.congress.gov/118/crec/2023/03/23/169/53/modified/CREC-2023-03-23-pt1-PgH1348.htm"
        },
        {
          "type": "PDF",
          "url": "https://www.congress.gov/118/crec/2023/03/23/169/53/CREC-2023-03-23-pt1-PgH1348.pdf"
        }
      ],
      "type": "Offered"
    }
  ]
}
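To tell the two kinds of links apart on the client side, something like the following sketch could work. The classification rule is a heuristic inferred only from the URL shapes shown in this thread, not from any documented contract, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def classify_link(url: str) -> str:
    """Heuristic (assumption): classify a link by the segments of its URL path."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if "crec" in segments:
        return "congressional-record"  # page or PDF of the Congressional Record
    if segments and segments[0] == "amendment":
        return "amendment-page"  # Congress.gov webpage for the amendment text
    return "other"


def links_by_format(version: dict) -> dict[str, str]:
    """Map each format type (e.g. "HTML", "PDF") to its URL for one text version."""
    return {fmt["type"]: fmt["url"] for fmt in version.get("formats", [])}


# The single text version from the HAmdt 111 response above.
version = {
    "date": "2023-03-23T20:30:09Z",
    "formats": [
        {
            "type": "HTML",
            "url": "https://www.congress.gov/118/crec/2023/03/23/169/53/modified/CREC-2023-03-23-pt1-PgH1348.htm",
        },
        {
            "type": "PDF",
            "url": "https://www.congress.gov/118/crec/2023/03/23/169/53/CREC-2023-03-23-pt1-PgH1348.pdf",
        },
    ],
    "type": "Offered",
}

for fmt_type, url in links_by_format(version).items():
    print(fmt_type, classify_link(url))
```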

apreiter18 commented 3 months ago

@tomasen - Thank you for following up with these comments and questions. I believe that you are referring to the example on api.congress.gov. We are in the process of updating this example so that it shows the updated URLs. When actually performing a call, you will see that the URLs are updated.

For example, you reference 117 HAmdt 287 (https://api.congress.gov/v3/amendment/117/hamdt/287/text?api_key=INSERT_YOUR_KEY) and when I perform a call for this amendment I get the following: [image]

I also verified that amendments from pre-118th Congresses had the correct URLs such as:

  1. https://api.congress.gov/v3/amendment/111/hamdt/22/text?api_key=INSERT_KEY
  2. https://api.congress.gov/v3/amendment/114/samdt/899/text?api_key=INSERT_KEY
  3. https://api.congress.gov/v3/amendment/115/samdt/1245/text?api_key=INSERT_KEY

They are populating the correct URLs when performing calls. If you do notice any that are not correct, we always appreciate users creating a GitHub issue for us to investigate.
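For readers who want to repeat these calls, here is a minimal sketch that builds the same request URLs and fetches the text versions with the standard library. The function names are hypothetical, the `format=json` parameter mirrors the `amendmentUrl` values shown earlier in this thread, and `INSERT_KEY` is a placeholder for a real API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.congress.gov/v3"


def amendment_text_url(congress: int, amendment_type: str, number: int, api_key: str) -> str:
    """Build the amendment text endpoint URL for one amendment."""
    query = urlencode({"format": "json", "api_key": api_key})
    return f"{BASE_URL}/amendment/{congress}/{amendment_type.lower()}/{number}/text?{query}"


def fetch_text_versions(congress: int, amendment_type: str, number: int, api_key: str) -> list:
    """Perform the call and return the textVersions array (requires a valid key)."""
    url = amendment_text_url(congress, amendment_type, number, api_key)
    with urlopen(url, timeout=30) as resp:
        return json.load(resp).get("textVersions", [])


# The three amendments listed above; only the URLs are built here, no request is sent.
for congress, amendment_type, number in [(111, "hamdt", 22), (114, "samdt", 899), (115, "samdt", 1245)]:
    print(amendment_text_url(congress, amendment_type, number, "INSERT_KEY"))
```

Calling `fetch_text_versions(117, "hamdt", 287, api_key)` with a real key would show whether the returned URLs use the expected domain.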

Your comment that the response is "a little bit off" from the text "for a specified amendment" can be explained by the following note in the Amendment endpoint documentation: "Full text of Senate submitted amendments is displayed and searchable on Congress.gov for the 117th Congress forward. Links to text in the Congressional Record are provided for Senate amendments prior to the 117th Congress and for House amendments. See About the Congressional Record to learn more about searching the Congressional Record. Not all House amendments from the 117th Congress forward have text granules available at this time."

The Amendment text endpoint is consistent with other text level endpoints in the Congress.gov API, such as the Bill text endpoint. We are always looking for ways to analyze the feasibility of enhancement requests and greatly appreciate these comments. They inspire us to think of ways to improve the amendment text endpoint in the future. Thank you.

tomasen commented 3 months ago

Thanks for the clarification. The documentation about the Congressional Record helps.

There may be another issue that I would like to bring to your attention. Here are the details:

For example, take HAmdt 996 of the 118th Congress.

When I make a request to the following API endpoint:

https://api.congress.gov/v3/amendment/118/hamdt/996/text?api_key=INSERT_KEY

I receive an empty array in the response:

{
  "pagination": {
    "count": 0
  },
  "request": {
    "amendmentNumber": "996",
    "amendmentType": "hamdt",
    "amendmentUrl": "https://api.congress.gov/v3/amendment/118/hamdt/996?format=json",
    "congress": "118",
    "contentType": "application/json",
    "format": "json"
  },
  "textVersions": []
}

That doesn't look right. Or is it?

apreiter18 commented 3 months ago

We provide this helpful information in the documentation for the Congress.gov amendment endpoint:

Note: Full text of Senate submitted amendments is displayed and searchable on Congress.gov for the 117th Congress forward. Links to text in the Congressional Record are provided for Senate amendments prior to the 117th Congress and for House amendments. See About the Congressional Record to learn more about searching the Congressional Record. Not all House amendments from the 117th Congress forward have text granules available at this time.

When you look at HAmdt 996, you will see the following: [image]

The Congress.gov API can only display information that is ingested and on Congress.gov, thus no information is displayed for this call. I hope this helps a bit. We are always looking for ways to improve our endpoints and documentation, so your feedback is appreciated. Thank you!
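Given that an empty `textVersions` array is a valid response, client code may want to handle it explicitly rather than treat it as an error. A minimal sketch, using a hypothetical helper name and a trimmed copy of the HAmdt 996 response from this thread:

```python
def describe_text_availability(response: dict) -> str:
    """Hypothetical helper: summarize whether any text versions were returned."""
    versions = response.get("textVersions", [])
    if not versions:
        # Valid case: no text has been ingested for this amendment.
        return "no text versions available"
    link_count = sum(len(v.get("formats", [])) for v in versions)
    return f"{len(versions)} text version(s), {link_count} format link(s)"


# Trimmed copy of the HAmdt 996 response shown earlier.
empty_response = {
    "pagination": {"count": 0},
    "request": {"amendmentNumber": "996", "amendmentType": "hamdt", "congress": "118"},
    "textVersions": [],
}

print(describe_text_availability(empty_response))
```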