CouncilDataProject / cdp-backend

Data storage utilities and processing pipelines used by CDP instances.
https://councildataproject.org/cdp-backend
Mozilla Public License 2.0

feature/reduce-and-fan-ngram-index #189

Closed: evamaxfield closed this 2 years ago

evamaxfield commented 2 years ago

Description of Changes


This splits our indexing pipeline into two separate functions / bin scripts.

The first generates many small index parquet chunks. The second uploads a single chunk.

This is intended for use with GitHub Actions: the first script runs and uploads all the index chunks as artifact files, and the workflow then spawns a new GitHub Actions runner for each chunk, with each runner uploading only its own chunk.

Gather -> Process -> Store -> Fan -> Upload
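
To make the two-stage flow concrete, here is a minimal sketch of the split, not the code in this PR. All names (`generate_index_chunks`, `upload_index_chunk`, the `index-chunk-{i}.json` file naming, the row fields) are hypothetical. JSON files stand in for parquet and a plain list stands in for the database, so the example needs only the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator, List

# Hypothetical row type: one n-gram index entry.
Row = Dict[str, object]


def chunk_rows(rows: List[Row], chunk_size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start : start + chunk_size]


def generate_index_chunks(
    rows: List[Row], out_dir: Path, chunk_size: int
) -> List[Path]:
    """Script 1 (Gather -> Process -> Store): write many small chunk files.

    Assumption: JSON is used here instead of parquet to avoid extra dependencies.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunk_rows(rows, chunk_size)):
        path = out_dir / f"index-chunk-{i}.json"
        path.write_text(json.dumps(chunk))
        paths.append(path)
    return paths


def upload_index_chunk(chunk_path: Path, store: List[Row]) -> int:
    """Script 2 (Upload): read exactly one chunk and push its rows to the store.

    Assumption: `store` is an in-memory stand-in for the real database.
    """
    rows = json.loads(chunk_path.read_text())
    store.extend(rows)
    return len(rows)


if __name__ == "__main__":
    rows = [
        {"ngram": f"term-{i}", "event_id": "event-a", "value": i}
        for i in range(10)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        chunk_paths = generate_index_chunks(rows, Path(tmp), chunk_size=4)
        store: List[Row] = []
        # Fan: in CI, each iteration would run on its own runner.
        for chunk_path in chunk_paths:
            upload_index_chunk(chunk_path, store)
        print(len(chunk_paths), len(store))
```

Because the upload function takes a single chunk path, each call is independent of the others, which is what allows one runner per chunk.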

codecov[bot] commented 2 years ago

Codecov Report

Merging #189 (3f8c9bf) into main (c8b6b57) will decrease coverage by 1.23%. The diff coverage is 34.09%.

@@            Coverage Diff             @@
##             main     #189      +/-   ##
==========================================
- Coverage   94.60%   93.36%   -1.24%     
==========================================
  Files          50       51       +1     
  Lines        2632     2669      +37     
==========================================
+ Hits         2490     2492       +2     
- Misses        142      177      +35     
Impacted Files                                         Coverage Δ
...end/pipeline/process_event_index_chunk_pipeline.py  0.00% <0.00%> (ø)
cdp_backend/file_store/functions.py                    88.09% <25.00%> (-6.65%)
..._backend/pipeline/generate_event_index_pipeline.py  97.33% <88.23%> (ø)
cdp_backend/pipeline/pipeline_config.py                100.00% <100.00%> (ø)
...ackend/tests/pipeline/test_event_index_pipeline.py  100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c8b6b57...3f8c9bf.

evamaxfield commented 2 years ago

This is ready for review! There is a related cookiecutter-cdp-deployment PR here: https://github.com/CouncilDataProject/cookiecutter-cdp-deployment/pull/108

You can see this pipeline in action here: https://github.com/JacksonMaxfield/cdp-dev/actions/runs/2491590958