SocialFinanceDigitalLabs / sf-fons-platform

https://github.com/SocialFinanceDigitalLabs/sf-fons

Data completion analysis #39

Open dotloadmovie opened 1 year ago

dotloadmovie commented 1 year ago

OUT OF SCOPE FOR ORIGINAL BUDGET: Great idea, but there is no money to do this right now. If we get additional money, it will be used to reduce ongoing running costs and NOT to develop new functionality/features.

Report for Rashid and CA on how much data is held and how complete it is. This is not a one-off: they want a process in place ON the platform so that they can see the progress and completeness of each of their LAs, how far along the upload each has got, and for which data sets.

MagicMiranda commented 12 months ago

@dotloadmovie could you add some basic detail to the Description box above too please? Thanks

MichaelHanksSF commented 6 months ago

Once pan-agg has been created, a log is created for the hub with a summary of the dataset by file, by year, by LA, that tells the hub, for each file/year/LA combo:

MichaelHanksSF commented 1 month ago

Add requirement from additional draft card:

MichaelHanksSF commented 1 month ago

Add requirement from additional draft card:

patrick-troy commented 3 weeks ago

Potential fix to show the missing headers in the log file and save LA analysts time in rectifying this issue:

In the pipeline, if we are unable to identify the headers, we could output a list of just the missing headers to the log file. We could store the matches for each table, identify which table has the most matches, and assume that is the table they're trying to upload. e.g.

One issue I can foresee with this is that other tables have similar headers and would therefore also match, e.g.

As you can see, there would be 3 matches for both the UASC and Header files. Using the % of matched headers and picking the highest % might help mitigate this, but that won't always work (e.g. if the Header file contained the same number of columns as UASC).
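
The matching idea above could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: the `EXPECTED_HEADERS` table and column lists, and the names `score_tables` and `candidates`, are assumptions made up for the example. It scores each table by matched headers, ranks by percentage, lists the missing headers that could go into the log, and returns every table tied for first place so that an ambiguous upload is not silently assigned to one table.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative column lists only; the real schema is not shown in this thread.
EXPECTED_HEADERS = {
    "Header": ["CHILD", "SEX", "DOB", "ETHNIC", "UPN"],
    "UASC": ["CHILD", "SEX", "DOB", "DUC"],
}


@dataclass
class TableMatch:
    table: str      # name of the candidate table
    matched: int    # number of expected headers found in the upload
    percent: float  # matched headers as a % of the table's expected headers
    missing: list   # expected headers that were not found (for the log)


def score_tables(uploaded_headers, expected=EXPECTED_HEADERS):
    """Score every known table against the uploaded headers, best first."""
    uploaded = {h.strip().upper() for h in uploaded_headers}
    results = []
    for table, columns in expected.items():
        missing = [c for c in columns if c.upper() not in uploaded]
        matched = len(columns) - len(missing)
        percent = 100 * matched / len(columns)
        results.append(TableMatch(table, matched, percent, missing))
    # Rank by percentage first, then by raw match count.
    return sorted(results, key=lambda m: (m.percent, m.matched), reverse=True)


def candidates(uploaded_headers, expected=EXPECTED_HEADERS):
    """Return all tables tied for the best score (more than one means ambiguous)."""
    ranked = score_tables(uploaded_headers, expected)
    top = ranked[0]
    return [m for m in ranked if (m.percent, m.matched) == (top.percent, top.matched)]


if __name__ == "__main__":
    for match in score_tables(["CHILD", "SEX", "DOB"]):
        print(f"{match.table}: {match.matched} matched "
              f"({match.percent:.0f}%), missing {match.missing}")
```

With these assumed column lists, an upload containing only `CHILD`, `SEX` and `DOB` matches 3 headers in both tables, and the percentage breaks the tie. If both tables had the same number of expected columns, `candidates` would return both, and the log could list the missing headers for each rather than guessing.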