Closed 1jamesthompson1 closed 1 week ago
Given the changing capabilities of the project, I have purged a lot of older modules to keep the space fresh.
Furthermore, the structure of the project should really change.
Currently it is Gather_Wrangle -> Extract_Analyze.
Really it should be gather -> extract -> analyze.
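To make the proposed layout concrete, here is a minimal sketch of three separate stages chained together. The function names, record fields, and stubbed bodies are illustrative assumptions, not the engine's actual modules.

```python
# Hypothetical stage functions; the names and record shapes are illustrative only.
def gather(sources):
    """Collect raw report text for each source (stubbed here)."""
    return [{"report_id": s, "text": f"raw text of {s}"} for s in sources]


def extract(reports):
    """Pull structured fields out of each gathered report."""
    return [{"report_id": r["report_id"], "n_chars": len(r["text"])} for r in reports]


def analyze(extracted):
    """Summarise the extracted records."""
    return {
        "n_reports": len(extracted),
        "total_chars": sum(e["n_chars"] for e in extracted),
    }


def run_pipeline(sources, stages=(gather, extract, analyze)):
    """Feed the output of each stage into the next one, in order."""
    data = sources
    for stage in stages:
        data = stage(data)
    return data


summary = run_pipeline(["report_a", "report_b"])
```

Keeping each stage as its own function means any one of them can be rerun or tested without the others.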
The updating of the previous modules has been completed. They now all follow #153. However, not all follow #61.
Now an embedding class needs to be created.
The last step of database download and upload should wait until what deployment looks like has been decided.
State
The engine is supposed to be a pipeline that takes the PDF reports and outputs useful datasets.
Problem
The work conducted in the notebooks is well out of sync with the engine. Development has gone fast as a result, but the pipeline is broken and does not work end to end as it should.
Solution
I need to fix it so that it can complete the whole pipeline.
These three steps should result in a few dataframe files in the form of pickles or something similar.
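One way each step could write its output to a pickle is sketched below. The helper names and the one-file-per-stage layout are assumptions. It uses plain Python objects and the standard `pickle` module; if the outputs are pandas dataframes, `DataFrame.to_pickle` and `pandas.read_pickle` would fill the same role.

```python
import pickle
import tempfile
from pathlib import Path


def save_stage_output(data, stage_name, output_dir):
    """Write one stage's output to <output_dir>/<stage_name>.pkl (hypothetical layout)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{stage_name}.pkl"
    with path.open("wb") as f:
        pickle.dump(data, f)
    return path


def load_stage_output(stage_name, output_dir):
    """Read a stage's output back so a later stage can start from it."""
    with (Path(output_dir) / f"{stage_name}.pkl").open("rb") as f:
        return pickle.load(f)


# Round trip in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    save_stage_output([{"report_id": "report_a"}], "gather", tmp)
    restored = load_stage_output("gather", tmp)
```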
Then from there two more small parts need to be added. However, as they involve deployment, they might be left until another issue, #172, is closer to being solved.
step 0: get all the current data from the databases so that we don't constantly repeat the same work
step n+1: upload the newly calculated datasets to the database
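A rough sketch of how step 0 and step n+1 could wrap the pipeline follows. The database is stood in for by a plain dict, and `run_incremental` and `process` are hypothetical names, since the real storage depends on the deployment decision in #172.

```python
def run_incremental(report_ids, database, process):
    """Process only reports the database does not already hold, then store them.

    `database` is a plain dict standing in for the real store (an assumption).
    """
    # step 0: take a snapshot of what has already been calculated
    existing = dict(database)
    todo = [rid for rid in report_ids if rid not in existing]

    # steps 1..n: run the pipeline only on the reports that are new
    new_results = {rid: process(rid) for rid in todo}

    # step n+1: upload the newly calculated results
    database.update(new_results)
    return new_results


fake_db = {"report_a": 8}
processed = run_incremental(["report_a", "report_b"], fake_db, len)
```

Here `report_a` is skipped because it is already stored, so only `report_b` is processed and uploaded.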
Lastly it is worth noting that this can also be a chance to refactor and make the experience of running it smoother with better logs.
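For the logging side, something along these lines might be enough: a small wrapper that logs the start, duration, and any failure of each stage. The logger name `engine` and the `run_stage` helper are assumptions for illustration.

```python
import logging
import time

logger = logging.getLogger("engine")  # logger name is an assumption


def run_stage(name, func, data):
    """Run one pipeline stage, logging when it starts, finishes, or fails."""
    logger.info("starting %s", name)
    start = time.perf_counter()
    try:
        result = func(data)
    except Exception:
        # logger.exception records the traceback along with the message
        logger.exception("stage %s failed", name)
        raise
    logger.info("finished %s in %.2fs", name, time.perf_counter() - start)
    return result


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
doubled = run_stage("extract", lambda xs: [x * 2 for x in xs], [1, 2, 3])
```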
Related issues
153
59
61