CouncilDataProject / cdp-backend

Data storage utilities and processing pipelines used by CDP instances.
https://councildataproject.org/cdp-backend
Mozilla Public License 2.0
22 stars 26 forks source link

bugfix/index_chunks_not_updated #216

Closed conantp closed 1 year ago

conantp commented 1 year ago

Link to Relevant Issue

This PR relates to this discussion on Slack:

https://councildataproject.slack.com/archives/CMMNC8Y7P/p1664648853737539?thread_ts=1663622963.782599&cid=CMMNC8Y7P

Description of Changes

For the Asheville CDP instance, we encountered an issue where our event index / search was not being updated for new events. After investigating, I realized that the GCS index-chunks files were not being updates.

The generate_event_index_pipeline.py routine calls chunk_index, which in turn calls fs_functions.upload_file.

The upload_file function checks if the file exists on the remote resource - if so, it just returns the existing URI. As a result, there is no ability to overwrite existing index-chunk files.

https://github.com/CouncilDataProject/cdp-backend/blob/e8835080d2f563f850a346baba17f26da67fe822/cdp_backend/file_store/functions.py#L64

This PR adds an optional parameter, overwrite to the upload_file function. It modifies generate_event_index_pipeline.py to set overrwrite=true when upload_file is called.

I attempted to add tests for this modification in behavior, included cases to confirm the same URI is returned if the file does / does not exist and overwrite=True.

In test_functions.py, I modified the EXISTING_FILE_URI variable - I felt that it indicated a URI that was not in alignment with what the upload_file function should be expected to produce. If the given SAVE_NAME is included in the call to upload_file, the EXISTING_FILE_URI would include that as part of the URI.

I'm not super familiar with how the mock.patch calls work - but I believe they are changing the behavior of "cdp_backend.file_store.functions.get_file_uri" in a way that caused the tests to pass previously - though they should have failed. For example, previously, EXISTING_FILE_URI = "gs://bucket/test_file.json". However, the test parameters indicated that there was an existing file with URI = EXISTING_FILE_URI, but the requested upload was for SAVE_NAME ("fakeSaveName"). The resulting URI should be "gs://bucket/fakeSaveName"

I'm not completely familiar with writing tests in Python — any review or feedback is appreciated!

codecov[bot] commented 1 year ago

Codecov Report

Merging #216 (4888131) into main (e883508) will increase coverage by 0.05%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #216      +/-   ##
==========================================
+ Coverage   72.51%   72.57%   +0.05%     
==========================================
  Files          64       64              
  Lines        3493     3493              
==========================================
+ Hits         2533     2535       +2     
+ Misses        960      958       -2     
Impacted Files Coverage Δ
..._backend/pipeline/generate_event_index_pipeline.py 97.29% <ø> (ø)
cdp_backend/file_store/functions.py 80.85% <100.00%> (ø)
cdp_backend/tests/file_store/test_functions.py 100.00% <100.00%> (ø)
cdp_backend/tests/test_utils.py 100.00% <0.00%> (+25.00%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.