OpenFn / ConSoSci

ConSoSci jobs to integrate: BNS and NRGT Kobo Toolbox forms and other WCS/Partner Kobo form integrations.
https://openfn.github.io/ConSoSci/
4 stars 3 forks source link

New job to map Kobo Arcadia survey to Mssql #92

Closed aleksa-krolls closed 3 years ago

aleksa-krolls commented 3 years ago

Background, context, and business value

WCS needs help developing a new job to collect information on partners' data collection plans across different research sites. In a call with them, we drafted a rough concept of the job here but this job will have additional tables and link to several existing reference tables of data (i.e., master catalogs of available data collection options) that already exist in their WCSPROGRAMS db.

Note that all the Mssql DB configuration is not complete (we've left several comments in the mapping spec where they might need to add/change tables and columns). So please build to the mapping spec, and feel free to suggest other changes as you go... WCS is on a tight timeline, so we're going to draft as much as they can, and leave it to them to complete.

The specific request, in as few words as possible

Draft a new job here: https://github.com/OpenFn/ConSoSci/blob/master/arcadia/arcadiaSiteDataCollection.js See here for the mapping specifications drafted in partnership with WCS.

state.json

See LP for WCS Central Database

Use sample_data/arcadiaSiteSurvey.json

adaptor

language-mssql

expression.js

Draft a new job here: https://github.com/OpenFn/ConSoSci/blob/master/arcadia/arcadiaSiteDataCollection.js

See here where job is configured on prod: https://www.openfn.org/projects/p5x4g4/jobs/jvbbzm/edit

output.json

For every Kobo survey, insert a ProjectAnnualDataPlan and related records...

  1. Upsert 1 WCSPROGRAMS_ProjectAnnualDataPlan record

1.1. For every 1 surveys_planned, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanSurvey child record

1.2. For every 1 surveys_planned_001, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanSurvey child record

  1. For every 1 group_qp5by62/Which_of_the_followi_ata_you_will_collect, upsert 1 WCSPROGRAMS_ProjectPlanCameraTrap record (child to WCSPROGRAMS_ProjectAnnualDataPlan)

For every dataset related to a plan, insert a ProjectAnnualDataPlanDataSet and related records...

  1. For every 1 repeat group entry in "datasets":[...], upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanDataSet record (child to WCSPROGRAMS_ProjectAnnualDataPlan)

3.1. For every dataset[*].data_collection_tool, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanDataSetDataTool child record

3.2. For every dataset[*].data_management_tool, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanDataSetDataToolrecord

3.3. For every dataset[*].data_analysis_tool, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanDataSetDataTool record

3.4. For every dataset[*].challenge, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanDataSetDataChallenge record

3.5. For every dataset[*].data_management_help, upsert 1 WCSPROGRAMS_ProjectAnnualDataPlanDataSetDataAssistance record

aleksa-krolls commented 3 years ago

Thank @lakhassane, looking great :) . Pending client feedback/ config changes

aleksa-krolls commented 3 years ago

@lakhassane With more client info, I've updated several lookup tables and minor mapping changes in the job itself. Can you please help me with the following to finish the job? (This is high priority today, so keep me posted how I can help test or answer questions!)

1. Change how we map DataTools See comments in L431-L482 and see updated mapping specs - in yellow.

2. Test & make modifications so that this job runs successfully against state See state above, or feel free to run against messages in prod. Currently when I run in prod, I encounter RangeError: Invalid array length, so I want to make sure we're handling for all scenarios: https://www.openfn.org/projects/p5x4g4/runs/r7x47e5k

3. Link related records using custom external ids For every record we upsert, there will be a custom uuid DataSetUUIDID we set. (We learned that we should be setting this custom external Id field, rather than mapping the uuid to any primary key Id field - this pk is autogenerated in the db.)

This DataSetUUIDID is how we know how to relate records... so as we create m:m tables, we need to use this id to match child m:m records to the parent WCSPROGRAMS_ProjectAnnualDataPlan or WCSPROGRAMS_ProjectAnnualDataSet records. See my TODO comments throughout the code re: how to set the Foreign Keys (FK). https://github.com/OpenFn/ConSoSci/blob/35e989ffdbc85f46345bcdcbc16805d062099e8a/arcadia/arcadiaSiteDataCollection.js#L511-L524

aleksa-krolls commented 3 years ago

@lakhassane You now have IP access, so please resume work on this - but keep me posted so that I can help to more quickly answer any questions. Also note that I made some changes to the surveys mappings to reflect the correct db column names. When I run it on production now, I encounter this error... so it looks ready for you to implement that pattern we discussed where....

  1. Query SELECT WCSPROGRAMS_ProjectAnnualDataPlanID FROM WCSPROGRAMS_ProjectAnnualDataPlan WHERE DataSetUUIDID = dataValue(body._id) or SELECT WCSPROGRAMS_ProjectAnnualDataPlanDataSetID FROM WCSPROGRAMS_ProjectAnnualDataSet WHERE DataSetUUIDID = ’${dataValue(‘body._id’) + dataset[‘datasets/survey_type’] to get the ID of the parent table
  2. Then when inserting the child table, use this ID to populate the FK columns. For example, see L449: WCSPROGRAMS_ProjectAnnualDataPlanID: queryId, //FK to WCSPROGRAMS_ProjectAnnualDataPlan
lakhassane commented 3 years ago

@aleksa-krolls the mapping is done. However note that Im not sure about some table names and columns in here. If you could flag the recent updates in the sheet I can verify and change what needs to be.

aleksa-krolls commented 3 years ago

@lakhassane I've gone through and updated most issues with db config. Now we're getting an undefined error: https://www.openfn.org/projects/p5x4g4/runs/rx8bnewr

aleksa-krolls commented 3 years ago

@lakhassane I'm getting green checks (!) ... but it doesn't look like L575 onwards in the job is being executed/ evaluated? https://github.com/OpenFn/ConSoSci/blob/master/arcadia/arcadiaSiteDataCollection.js#L575-L759

See successful run where no WCSPROGRAMS_ProjectDataSetDataTool records are upserted even though multiple data tools are inserted. https://www.openfn.org/projects/p5x4g4/runs/r9w6ak9k

lakhassane commented 3 years ago

@aleksa-krolls previous issue was solved. We are having another missing table in here: https://www.openfn.org/projects/p5x4g4/runs/rn5kajdn

Invalid object name 'WCSPROGRAMS_ProjectAnnualDataSet'.

aleksa-krolls commented 3 years ago

@lakhassane Some feedback as now we're able to test across more surveys...

  1. TypeError [Error]: Cannot read property 'split' of undefined --> now getting errors wherever we use split.(' ') and there happens to be no data... can you add in error-handling that checks for values wherever we have this split? https://www.openfn.org/projects/p5x4g4/runs/rbgyyyk6

  2. If surveys_planned or surveys_planed_001 = 'none' (see L330 and L366), then do NOT insert a record for the table WCSPROGRAMS_ProjectAnnualDataPlanDataSetSurveyType. Currently when we have none value, it throws this error because this of course doesn't exist in the look-up mapping table: https://www.openfn.org/projects/p5x4g4/runs/rpd444kx

aleksa-krolls commented 3 years ago

@lakhassane This is #1 priority on Monday, please ping me when done so I can test. Thanks!

lakhassane commented 3 years ago

@aleksa-krolls it's all done as well. Corrected the issue and re-run the failed runs on prod.

aleksa-krolls commented 3 years ago

@lakhassane you're a star, thank you! will let you know if any further feedback, but looking good so far