Closed lbeaufort closed 3 years ago
Another issue is that I think we might be running out of memory in the API Umbrella cache mid-day (I haven't seen an Age header over ~3 hours for the receipts datatable), so taking up more space with non-schedule endpoints would decrease the amount of memory available for Schedule A endpoints.
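One rough way to make the Age observation above repeatable is to convert the standard `Age` response header (a number of seconds) into hours. This is a minimal sketch; the helper name `age_hours` is hypothetical and not part of openFEC or API Umbrella.

```python
from typing import Mapping, Optional


def age_hours(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Age response header in hours, or None if absent or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("age")
    if raw is None:
        return None
    try:
        seconds = int(raw.strip())
    except ValueError:
        return None
    return seconds / 3600
```

For example, `age_hours({"Age": "10800"})` returns `3.0`, which is about the largest age mentioned above for the receipts datatable.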
Ideas:
Does this relate to https://github.com/fecgov/openfec/issues/4785?
Discuss with the team which endpoints we would like to cache. Endpoints with the most hits from API Umbrella as of 03/23 are below:
Path | Hits |
---|---|
api.open.fec.gov/v1/schedules/ | 5592429 |
api.open.fec.gov/v1/committee/ | 1494983 |
api.open.fec.gov/v1/filings | 1207724 |
api.open.fec.gov/v1/candidate/ | 1199630 |
api.open.fec.gov/v1/names/ | 837336 |
api.open.fec.gov/v1/calendar-dates | 620884 |
api.open.fec.gov/v1/candidates | 521375 |
api.open.fec.gov/v1/efile/ | 453088 |
api.open.fec.gov/v1/candidates/ | 373056 |
api.open.fec.gov/v1/committees | 215723 |
api.open.fec.gov/v1/elections/ | 167720 |
api.open.fec.gov/v1/download/ | 128462 |
api.open.fec.gov/v1/legal/ | 99128 |
api.open.fec.gov/v1/elections | 59818 |
api.open.fec.gov/v1/audit-case | 43224 |
api.open.fec.gov/v1/calendar-dates/ | 42767 |
api.open.fec.gov/v1/electioneering/ | 38797 |
api.open.fec.gov/v1/presidential/ | 29875 |
api.open.fec.gov/v1/communication_costs/ | 29357 |
api.open.fec.gov/v1/election-dates | 26329 |
api.open.fec.gov/v1/totals/ | 19775 |
api.open.fec.gov/v1/state-election-office | 16750 |
api.open.fec.gov/v1/reports/ | 11641 |
api.open.fec.gov/v1/rad-analyst | 5337 |
api.open.fec.gov/v1/electioneering | 4256 |
api.open.fec.gov/v1/operations-log | 3980 |
api.open.fec.gov/v1/audit-category | 3193 |
api.open.fec.gov/v1/reporting-dates | 2846 |
api.open.fec.gov/v1/audit-primary-category | 2841 |
api.open.fec.gov/v1/communication_costs | 224 |
api.open.fec.gov/v1/auditcommitteess | 11 |
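Several paths in the table appear twice, with and without a trailing slash. The sketch below sums a few of those rows per endpoint. Treating `x` and `x/` as the same endpoint is an assumption about how API Umbrella groups paths, and `hits_by_endpoint` is a hypothetical helper.

```python
from collections import Counter

# A few rows copied from the table above (path, hits).
ROWS = [
    ("api.open.fec.gov/v1/schedules/", 5592429),
    ("api.open.fec.gov/v1/committee/", 1494983),
    ("api.open.fec.gov/v1/candidates", 521375),
    ("api.open.fec.gov/v1/candidates/", 373056),
    ("api.open.fec.gov/v1/calendar-dates", 620884),
    ("api.open.fec.gov/v1/calendar-dates/", 42767),
]


def hits_by_endpoint(rows):
    """Sum hits per path, treating 'x' and 'x/' as one endpoint (an assumption)."""
    totals = Counter()
    for path, hits in rows:
        totals[path.rstrip("/")] += hits
    return totals


totals = hits_by_endpoint(ROWS)
# Print endpoints from most to least hits.
for path, hits in totals.most_common():
    print(f"{path}: {hits}")
```

Under that assumption, `candidates` comes to 894431 hits and `calendar-dates` to 663651.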
Candidate and committee totals are the most visited endpoints. After discussing with @lbeaufort and @fec-jli, we agreed to cache the endpoints below during peak hours.
Tested caching committee totals endpoint on local env.
Terminal log:
INFO:werkzeug:127.0.0.1 - - [07/Apr/2021 04:48:38] "GET /v1/committee/C00302695/candidates/history/2020?election_full=False&api_key=NICAR16 HTTP/1.1" 200 -
$$$$ LONG_CACHE /v1/filings
$$$$ peak_hours_start_time UTC 13:00:00
$$$$ peak_hours_end_time UTC 23:30:00
$$$$ peak_hours_expiration_time GMT Wed, 07 Apr 2021 23:30:00 GMT
INFO:werkzeug:127.0.0.1 - - [07/Apr/2021 04:48:39] "GET /v1/filings?form_category=REPORT&most_recent=True&committee_id=C00302695&cycle=2020&per_page=1&sort_hide_null=True&api_key=NICAR16 HTTP/1.1" 200 -
$$$$ LONG_CACHE /v1/committee/C00302695/totals
$$$$ peak_hours_start_time UTC 13:00:00
$$$$ peak_hours_end_time UTC 23:30:00
$$$$ peak_hours_expiration_time GMT Wed, 07 Apr 2021 23:30:00 GMT
Code changes can be tracked in the feature branch feature/4745-cache-cmte-totals_endpoint
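The peak-hours values in the log above (13:00:00 to 23:30:00 UTC, expiring at 23:30:00 GMT) can be sketched as follows. This is not the code in the feature branch. The function name is hypothetical, and it assumes the expiration is only applied while the current time is inside the window, which the log does not confirm.

```python
from datetime import datetime, time, timezone
from email.utils import format_datetime
from typing import Optional

# Peak-hours window in UTC, taken from the log output above.
PEAK_START = time(13, 0, 0)
PEAK_END = time(23, 30, 0)


def peak_hours_expiration(now: datetime) -> Optional[str]:
    """Return an HTTP-date for the end of today's peak window, or None outside it."""
    now = now.astimezone(timezone.utc)
    if not (PEAK_START <= now.time() < PEAK_END):
        return None
    expires = datetime.combine(now.date(), PEAK_END, tzinfo=timezone.utc)
    # usegmt=True renders the zone as "GMT", as HTTP dates require.
    return format_datetime(expires, usegmt=True)
```

Called with 14:00 UTC on 7 April 2021, this returns `Wed, 07 Apr 2021 23:30:00 GMT`, matching the `peak_hours_expiration_time` line in the log.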
/legal/ endpoint - cached for 5 minutes (max-age=300)
URL: http://localhost:8000/data/legal/advisory-opinions/2020-05/ Terminal log:
$$$$$$ legal /v1/legal/search
$$$$$$ In LEGAL Header: Cache-Control public, max-age=300sec
INFO:werkzeug:127.0.0.1 - - [29/Apr/2021 10:38:02] "GET /v1/legal/search?api_key=NICAR16&type=advisory_op
/efile/ endpoint - We don't cache this endpoint (max-age=0)
URL: http://localhost:8000/data/receipts/?data_type=efiling Terminal log:
$$$$$ efile /v1/schedules/schedule_a/efile/
$$$$$ In EFIING Header: Cache-Control public, **max-age=0**
INFO:werkzeug:127.0.0.1 - - [29/Apr/2021 10:44:25] "GET /v1/schedules/schedule_a/efile/?api_key=NICAR16&sort_hide_null=false&sort_nulls_last=false&data_type=efiling&sort=-contribution_receipt_date&per_page=30&page=1 HTTP/1.1" 200 -
/calendar-dates/ endpoint - cached for 5 minutes (max-age=300)
URL: http://localhost:8000/calendar/?calendar_category_id=36 Terminal log:
$$$$$$ calendar-dates /v1/calendar-dates/
$$$$$$ In CALENDAR Header: Cache-Control public, **max-age=300**
INFO:werkzeug:127.0.0.1 - - [29/Apr/2021 10:54:18] "GET /v1/calendar-dates/?api_key=NICAR16&per_page=500&calendar_category_id=36&min_start_date=2021-04-01&max_start_date=2021-05-01&_=1619708057341 HTTP/1.1" 200
efile: http://localhost:8000/data/receipts/?data_type=efiling
calendar: http://localhost:8000/calendar/?calendar_category_id=36
legal: http://localhost:8000/data/legal/advisory-opinions/2020-03/
committee/totals: http://127.0.0.1:8000/data/committee/C00302695/
candidate totals: http://localhost:8000/data/candidate/P80000722/
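The per-endpoint `Cache-Control` values seen in the logs above (`max-age=300` for legal and calendar-dates, `max-age=0` for efile) could be expressed as a small lookup. This is a sketch only: the substring matching and the `cache_control_for` name are assumptions, and the actual logic in `webservices/rest.py` may differ.

```python
from typing import Optional

# Path fragments and max-age values (seconds) taken from the logs above.
# The matching order and the substring approach are assumptions.
MAX_AGE_BY_FRAGMENT = [
    ("/efile/", 0),
    ("/legal/", 300),
    ("/calendar-dates", 300),
]


def cache_control_for(path: str) -> Optional[str]:
    """Return a Cache-Control value for known paths, or None to leave the header alone."""
    for fragment, max_age in MAX_AGE_BY_FRAGMENT:
        if fragment in path:
            return f"public, max-age={max_age}"
    return None
```

For instance, `cache_control_for("/v1/schedules/schedule_a/efile/")` gives `public, max-age=0`, so efiling data is never served from cache.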
Tested caching committee/candidate totals in the stage env. From the app server logs, it appears that the /totals endpoint is being cached during peak hours, like the /schedules/ endpoint:
cache committee totals endpoint: curl -svo /dev/null "https://api-stage.open.fec.gov/v1/committee/C00302695/totals/?per_page=20&page=1&sort_hide_null=false&sort_nulls_last=false&sort_null_only=false&sort=-cycle&api_key=DEMO_KEY"
cache candidate totals endpoint: curl -svo /dev/null "https://api-stage.open.fec.gov/v1/candidate/H8NJ01077/totals?election_full=True&cycle=1998&api_key=DEMO_KEY"
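When reading the verbose `curl` output from the commands above, the `Cache-Control` response header is the line to check. A minimal sketch for splitting its value into directives follows; `parse_cache_control` is a hypothetical helper, not something in openFEC.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # Directives are either bare ("public") or name=value ("max-age=300").
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For example, `parse_cache_control("public, max-age=300")` returns `{"public": None, "max-age": "300"}`.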
The code changes to cache (API Umbrella) the candidate and committee /totals endpoints can be tracked in PR https://github.com/fecgov/openFEC/pull/4845
What we're after: We should consider increasing cache length for non-schedule data (data that is refreshed once per day with MVs). One disadvantage is that we'd need to bust the cache in the (rare) cases where we need to manually remove offensive or incorrect data and have the change appear immediately. In these cases, we don't always refresh the MVs and sometimes wait for the changes to show up the next day.
https://github.com/fecgov/openFEC/blob/251a49efda44ea2c8a05f287c47b9e25f83dda9c/webservices/rest.py#L182-L212
Action items:
Completion criteria: