fecgov / openFEC

The first RESTful API for the Federal Election Commission. We're aiming to make campaign finance more accessible for journalists, academics, developers, and other transparency seekers.
https://api.open.fec.gov/developers

Consider increasing cache length for non-schedule data #4745

Closed lbeaufort closed 3 years ago

lbeaufort commented 3 years ago

What we're after: We should consider increasing the cache length for non-schedule data (data that is refreshed once per day with materialized views, or MVs). One disadvantage is that we'd need to bust the cache in the (rare) cases where we manually remove offensive or incorrect data and need the change to appear immediately. In those cases, we don't always refresh the MVs; sometimes we wait for the changes to show up the next day.

https://github.com/fecgov/openFEC/blob/251a49efda44ea2c8a05f287c47b9e25f83dda9c/webservices/rest.py#L182-L212

Action items:

Completion criteria:

lbeaufort commented 3 years ago

Another issue: I think we might be running out of memory in the API Umbrella cache mid-day (I haven't seen an Age header over ~3 hours for the receipts datatable), so taking up more space with non-schedule endpoints would decrease the memory available for Schedule A endpoints.
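To make the memory concern concrete, here is a toy sketch of a fixed-capacity cache that evicts the least recently used entry. This is only an illustration: the thread does not say how the API Umbrella cache actually evicts entries, so the LRU policy, the `TinyLRUCache` class, and the capacity of 3 are all assumptions.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy fixed-capacity cache that evicts the least recently used key.

    Hypothetical stand-in for a shared response cache; not API Umbrella code.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = TinyLRUCache(capacity=3)
cache.put("/v1/schedules/schedule_a/?page=1", "receipts page 1")
cache.put("/v1/schedules/schedule_a/?page=2", "receipts page 2")
# Newly cached non-schedule responses compete for the same slots.
cache.put("/v1/committee/C00302695/totals/", "committee totals")
cache.put("/v1/candidate/P80000722/totals/", "candidate totals")

# The oldest Schedule A entry has been pushed out by the totals entries.
schedule_a_evicted = cache.get("/v1/schedules/schedule_a/?page=1") is None
```

If the real cache behaves anything like this, every additional non-schedule response kept in memory shortens how long Schedule A responses survive.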

Ideas:

JonellaCulmer commented 3 years ago

Does this relate to https://github.com/fecgov/openfec/issues/4785?

pkfec commented 3 years ago

Discuss with the team which endpoints we would like to cache. Endpoints with the most hits from API Umbrella as of 03/23 are listed below:

Path Hits
api.open.fec.gov/v1/schedules/ 5592429
api.open.fec.gov/v1/committee/ 1494983
api.open.fec.gov/v1/filings 1207724
api.open.fec.gov/v1/candidate/ 1199630
api.open.fec.gov/v1/names/ 837336
api.open.fec.gov/v1/calendar-dates 620884
api.open.fec.gov/v1/candidates 521375
api.open.fec.gov/v1/efile/ 453088
api.open.fec.gov/v1/candidates/ 373056
api.open.fec.gov/v1/committees 215723
api.open.fec.gov/v1/elections/ 167720
api.open.fec.gov/v1/download/ 128462
api.open.fec.gov/v1/legal/ 99128
api.open.fec.gov/v1/elections 59818
api.open.fec.gov/v1/audit-case 43224
api.open.fec.gov/v1/calendar-dates/ 42767
api.open.fec.gov/v1/electioneering/ 38797
api.open.fec.gov/v1/presidential/ 29875
api.open.fec.gov/v1/communication_costs/ 29357
api.open.fec.gov/v1/election-dates 26329
api.open.fec.gov/v1/totals/ 19775
api.open.fec.gov/v1/state-election-office 16750
api.open.fec.gov/v1/reports/ 11641
api.open.fec.gov/v1/rad-analyst 5337
api.open.fec.gov/v1/electioneering 4256
api.open.fec.gov/v1/operations-log 3980
api.open.fec.gov/v1/audit-category 3193
api.open.fec.gov/v1/reporting-dates 2846
api.open.fec.gov/v1/audit-primary-category 2841
api.open.fec.gov/v1/communication_costs 224
api.open.fec.gov/v1/auditcommitteess 11
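Several paths in the table appear twice, with and without a trailing slash (for example `/v1/candidates` and `/v1/candidates/`). A minimal sketch for combining such pairs when ranking endpoints follows. It assumes the two variants refer to the same endpoint, which the table itself does not confirm (the trailing-slash rows may be path prefixes); `merge_trailing_slash` is a hypothetical helper name, and only a few rows from the table are used.

```python
from collections import defaultdict


def merge_trailing_slash(rows):
    """Sum hit counts for paths that differ only by a trailing slash."""
    totals = defaultdict(int)
    for path, hits in rows:
        totals[path.rstrip("/")] += hits
    # Highest hit count first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few rows copied from the table above
rows = [
    ("api.open.fec.gov/v1/candidates", 521375),
    ("api.open.fec.gov/v1/candidates/", 373056),
    ("api.open.fec.gov/v1/calendar-dates", 620884),
    ("api.open.fec.gov/v1/calendar-dates/", 42767),
    ("api.open.fec.gov/v1/elections/", 167720),
    ("api.open.fec.gov/v1/elections", 59818),
]

merged = merge_trailing_slash(rows)
# merged[0] is ("api.open.fec.gov/v1/candidates", 894431)
```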
pkfec commented 3 years ago

Candidate and committee totals are the most visited endpoints. After discussing with @lbeaufort and @fec-jli, we all agreed to cache the endpoints below during peak hours.

  1. /committee/{committee_id}/totals/
  2. /candidate/{candidate_id}/totals/

pkfec commented 3 years ago

Tested caching committee totals endpoint on local env.

Terminal log:

INFO:werkzeug:127.0.0.1 - - [07/Apr/2021 04:48:38] "GET /v1/committee/C00302695/candidates/history/2020?election_full=False&api_key=NICAR16 HTTP/1.1" 200 -
$$$$ LONG_CACHE /v1/filings
$$$$ peak_hours_start_time UTC  13:00:00
$$$$ peak_hours_end_time UTC 23:30:00
$$$$ peak_hours_expiration_time GMT Wed, 07 Apr 2021 23:30:00 GMT
INFO:werkzeug:127.0.0.1 - - [07/Apr/2021 04:48:39] "GET /v1/filings?form_category=REPORT&most_recent=True&committee_id=C00302695&cycle=2020&per_page=1&sort_hide_null=True&api_key=NICAR16 HTTP/1.1" 200 -
$$$$ LONG_CACHE /v1/committee/C00302695/totals
$$$$ peak_hours_start_time UTC  13:00:00
$$$$ peak_hours_end_time UTC 23:30:00
$$$$ peak_hours_expiration_time GMT Wed, 07 Apr 2021 23:30:00 GMT
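The log values above (a peak window of 13:00:00 to 23:30:00 UTC, and an expiration of 23:30:00 GMT on the same day) suggest that responses cached during peak hours expire when the window closes. A minimal sketch of that calculation follows; it is not the code from the feature branch. The names `PEAK_START_UTC`, `PEAK_END_UTC`, and `peak_hours_expiration` are hypothetical, and returning `None` outside the window is an assumption about off-peak behavior.

```python
from datetime import datetime, time, timezone
from email.utils import format_datetime

# Window boundaries taken from the log above; the constant names are hypothetical.
PEAK_START_UTC = time(13, 0, 0)
PEAK_END_UTC = time(23, 30, 0)


def peak_hours_expiration(now):
    """Return an HTTP-date for the end of peak hours, or None when off-peak.

    `now` should be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    if not (PEAK_START_UTC <= now.time() < PEAK_END_UTC):
        return None  # assumption: no long cache outside the peak window
    end = datetime.combine(now.date(), PEAK_END_UTC, tzinfo=timezone.utc)
    return format_datetime(end, usegmt=True)


in_peak = peak_hours_expiration(datetime(2021, 4, 7, 14, 0, tzinfo=timezone.utc))
off_peak = peak_hours_expiration(datetime(2021, 4, 7, 4, 48, tzinfo=timezone.utc))
# in_peak is "Wed, 07 Apr 2021 23:30:00 GMT"; off_peak is None
```

The returned string has the format used by the HTTP `Expires` header, which is the shape shown in the `peak_hours_expiration_time` log line.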

Code changes can be tracked in the feature branch: feature/4745-cache-cmte-totals_endpoint

pkfec commented 3 years ago

  1. /legal/ endpoint: cached for 5 minutes (max-age=300). URL: http://localhost:8000/data/legal/advisory-opinions/2020-05/ Terminal log:

    $$$$$$ legal /v1/legal/search
    $$$$$$ In LEGAL Header: Cache-Control public, max-age=300sec
    INFO:werkzeug:127.0.0.1 - - [29/Apr/2021 10:38:02] "GET /v1/legal/search?api_key=NICAR16&type=advisory_op
  2. /efile/ endpoint: we don't cache this endpoint (max-age=0).

URL: http://localhost:8000/data/receipts/?data_type=efiling Terminal log:

$$$$$ efile /v1/schedules/schedule_a/efile/
$$$$$ In EFIING Header: Cache-Control public, **max-age=0**
INFO:werkzeug:127.0.0.1 - - [29/Apr/2021 10:44:25] "GET /v1/schedules/schedule_a/efile/?api_key=NICAR16&sort_hide_null=false&sort_nulls_last=false&data_type=efiling&sort=-contribution_receipt_date&per_page=30&page=1 HTTP/1.1" 200 -
  3. /calendar/ endpoint: cached for 5 minutes (max-age=300). URL: http://localhost:8000/calendar/?calendar_category_id=36 Terminal log:
    $$$$$$ calendar-dates /v1/calendar-dates/
    $$$$$$ In CALENDAR Header: Cache-Control public, **max-age=300**
    INFO:werkzeug:127.0.0.1 - - [29/Apr/2021 10:54:18] "GET /v1/calendar-dates/?api_key=NICAR16&per_page=500&calendar_category_id=36&min_start_date=2021-04-01&max_start_date=2021-05-01&_=1619708057341 HTTP/1.1" 200
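The three cases above amount to a small lookup from request path to `Cache-Control` value. Here is a minimal sketch of that idea, not the code in `webservices/rest.py`: the `RULES` table, the substring matching, and the `cache_control_for` name are assumptions for illustration. Note that the `max-age` directive takes a plain integer number of seconds, so the value is written as `300` rather than `300sec`.

```python
# Hypothetical rules mirroring the three cases tested above:
# a path marker paired with a max-age in seconds.
RULES = [
    ("/efile/", 0),             # efiling data: not cached
    ("/legal/", 300),           # legal search: 5 minutes
    ("/calendar-dates/", 300),  # calendar dates: 5 minutes
]


def cache_control_for(path):
    """Return a Cache-Control value for `path`, or None if no rule matches."""
    for marker, max_age in RULES:
        if marker in path:
            return f"public, max-age={max_age}"
    return None


legal = cache_control_for("/v1/legal/search")
efile = cache_control_for("/v1/schedules/schedule_a/efile/")
calendar = cache_control_for("/v1/calendar-dates/")
other = cache_control_for("/v1/filings")
```

Keeping the rules in one table makes it easy to see at a glance which endpoints get which cache length, and paths without a rule fall through to whatever default the application already uses.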

http://localhost:8000/data/loans/

pkfec commented 3 years ago

efile: http://localhost:8000/data/receipts/?data_type=efiling

calendar: http://localhost:8000/calendar/?calendar_category_id=36

legal: http://localhost:8000/data/legal/advisory-opinions/2020-03/

committee/totals: http://127.0.0.1:8000/data/committee/C00302695/

candidate totals: http://localhost:8000/data/candidate/P80000722/

pkfec commented 3 years ago

Tested caching the committee and candidate totals endpoints in the stage environment. From the app server logs, it appears that the /totals endpoints are cached during peak hours, like the /schedules/ endpoint:

cache committee totals endpoint: curl -svo /dev/null "https://api-stage.open.fec.gov/v1/committee/C00302695/totals/?per_page=20&page=1&sort_hide_null=false&sort_nulls_last=false&sort_null_only=false&sort=-cycle&api_key=DEMO_KEY"

[Screenshot taken 2021-05-03 at 1:21:53 PM; image not preserved]

cache candidate totals endpoint: curl -svo /dev/null "https://api-stage.open.fec.gov/v1/candidate/H8NJ01077/totals?election_full=True&cycle=1998&api_key=DEMO_KEY"

[Screenshot taken 2021-05-03 at 1:28:57 PM; image not preserved]
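Since the screenshots are not available, here is a rough sketch of how the cache-related headers could be pulled out of the verbose output of the `curl -sv` commands above, where response header lines are prefixed with `< `. The `cache_headers_from_curl` helper is hypothetical, and the sample text is made up for illustration; it is not captured output from api-stage.

```python
# Standard HTTP headers that indicate caching behavior
CACHE_HEADERS = {"age", "cache-control", "expires"}


def cache_headers_from_curl(verbose_output):
    """Extract cache-related response headers from `curl -v` output."""
    found = {}
    for line in verbose_output.splitlines():
        # Response headers are prefixed with "< " in curl's verbose output.
        if not line.startswith("< ") or ":" not in line:
            continue
        name, _, value = line[2:].partition(":")
        name = name.strip().lower()
        if name in CACHE_HEADERS:
            found[name] = value.strip()
    return found


# Made-up sample in the shape of curl's verbose output
sample = """\
> GET /v1/committee/C00302695/totals/ HTTP/1.1
< HTTP/1.1 200 OK
< Cache-Control: public, max-age=3600
< Age: 120
< Content-Type: application/json
"""

headers = cache_headers_from_curl(sample)
```

A nonzero `Age` value on a repeated request is the usual sign that the response came from the cache rather than the application.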
pkfec commented 3 years ago

The code changes to cache the candidate and committee /totals endpoints (in API Umbrella) can be tracked in PR https://github.com/fecgov/openFEC/pull/4845