JustZack / audit-congress

A website collecting & displaying information about the activities of congress.

Supplement some bill information from the congress.gov API when bulk data isn't great #92

Open JustZack opened 5 months ago

JustZack commented 5 months ago

In the bulk bill pull, not all bills (congresses 93 through 118) have the same data quality. Create a means to decide when data needs to be fetched.

Text versions are very inconsistent in quality. In the bulk bill pull, anything past congress 113 usually has an XML format. Via the congress.gov API, I know there's usually at least an HTM and a PDF version going back to congress 93 as well. In some cases, all we know is that a text version exists, but NO data is provided.
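For reference, a minimal sketch of building the congress.gov API request that lists a bill's text versions (the `bill/{congress}/{billType}/{billNumber}/text` route). The helper name `bill_text_url` is hypothetical, and passing the key as an `api_key` query parameter is one option (a request header is another):

```python
from urllib.parse import urlencode

API_ROOT = "https://api.congress.gov/v3"


def bill_text_url(congress: int, bill_type: str, bill_number: int, api_key: str) -> str:
    """Build the URL for a bill's text versions (hypothetical helper).

    The API expects lowercase bill types such as "hr", "s", or "hjres".
    """
    query = urlencode({"api_key": api_key, "format": "json"})
    return f"{API_ROOT}/bill/{congress}/{bill_type.lower()}/{bill_number}/text?{query}"
```

For example, `bill_text_url(93, "HR", 1, "KEY")` yields the request for the text versions of H.R. 1 in the 93rd congress.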

Amendments range from only knowing the ID to having a full data model spanning multiple tables. The congress.gov API has a dedicated amendment route that accepts the known ID.
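A minimal sketch of turning a known amendment ID into a request against the amendment route (`amendment/{congress}/{amendmentType}/{amendmentNumber}`). It assumes the ID has already been split into congress, type, and number; the helper name `amendment_url` is hypothetical:

```python
from urllib.parse import urlencode

API_ROOT = "https://api.congress.gov/v3"
# Amendment types accepted by the congress.gov amendment route.
AMENDMENT_TYPES = {"hamdt", "samdt", "suamdt"}


def amendment_url(congress: int, amendment_type: str, number: int, api_key: str) -> str:
    """Build the URL for one amendment (hypothetical helper)."""
    amendment_type = amendment_type.lower()
    if amendment_type not in AMENDMENT_TYPES:
        # Fail early instead of spending an API call on a malformed ID.
        raise ValueError(f"unknown amendment type: {amendment_type!r}")
    query = urlencode({"api_key": api_key, "format": "json"})
    return f"{API_ROOT}/amendment/{congress}/{amendment_type}/{number}?{query}"
```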

For both of these, we need a means to track whether and when data was fetched from the congress.gov API, and possibly also a cache invalidation time.
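One way this tracking could look, as a sketch: a per-item record holding the last fetch time plus an invalidation window. The `FetchRecord` name and the 7-day default TTL are assumptions, not decisions made in this issue:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FetchRecord:
    """Hypothetical bookkeeping for one item fetched from the congress.gov API."""
    fetched_at: Optional[datetime] = None  # None means never fetched from the API
    ttl: timedelta = timedelta(days=7)     # assumed cache invalidation window

    def is_stale(self, now: Optional[datetime] = None) -> bool:
        """True if the item was never fetched or its TTL has elapsed."""
        if self.fetched_at is None:
            return True
        now = now if now is not None else datetime.now(timezone.utc)
        return now - self.fetched_at >= self.ttl

    def mark_fetched(self, now: Optional[datetime] = None) -> None:
        """Record a successful API fetch at `now` (defaults to the current UTC time)."""
        self.fetched_at = now if now is not None else datetime.now(timezone.utc)
```

Storing `fetched_at` as a nullable column keeps "never fetched" and "fetched but expired" distinguishable.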

For text versions, treat anything NOT from the congress.gov API as metadata, i.e. only used to display a count before fetching the real data. For amendments, anything from congress 113 onwards is well-formed, but anything before that is just metadata, used the same way.
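The rule above can be sketched as a single predicate. The `Kind` enum and function name are hypothetical; the congress-113 cutoff for amendments comes from this issue:

```python
from enum import Enum


class Kind(Enum):
    TEXT_VERSIONS = "text_versions"
    AMENDMENTS = "amendments"


# From this issue: bulk amendment data is well-formed from congress 113 onwards.
FIRST_WELL_FORMED_AMENDMENT_CONGRESS = 113


def is_metadata_only(kind: Kind, congress: int, from_api: bool) -> bool:
    """True if stored data should only drive counts, not be shown as real data."""
    if from_api:
        return False  # congress.gov API data is authoritative
    if kind is Kind.TEXT_VERSIONS:
        return True   # bulk text versions are always treated as metadata
    return congress < FIRST_WELL_FORMED_AMENDMENT_CONGRESS
```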

The goal here is to avoid congress.gov API calls until the last possible moment, and to ensure the cached data is updated as needed.
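A minimal sketch of "fetch at the last possible moment": nothing is requested until `get` is called, and a cached value is reused until its TTL expires. `LazyApiCache` and the injected `fetch` callable are hypothetical stand-ins for the real API client and database layer:

```python
import time
from typing import Any, Callable, Dict, Tuple


class LazyApiCache:
    """Hypothetical read-through cache that defers API calls until data is requested."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch    # e.g. a function that calls the congress.gov API
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # fresh cached value: no API call
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

Injecting the clock and fetch function keeps the caching policy testable without hitting the real API.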