OpenFn / wcs-wildmeat

Wildlife Conservation Society ConSocSci Project
1 stars 0 forks source link

Write meta-job for OpenFn, which creates/updates jobs in OpenFn #12

Closed taylordowns2000 closed 3 years ago

taylordowns2000 commented 4 years ago

WCS Postgres Table Management Jobs

Background

WCS collects research data across several sites and surveys, which are managed by a number of staff. To ensure that all Kobo survey data can be securely and centrally captured for easier access and analysis, WCS would like to automate the process of integrating Kobo surveys and creating database tables to centrally store the data.

The specific request

Goal is to identify an automated or semi-automated process for going from a new or updated Kobo form to a Postgres set of holding tables. This will require an automated metadata and data sync between Kobo Toolbox and Postgresql.

state.json

Source: Kobo

Destination: Postgresql

adaptor

Assume the destination database will be Postgres - use language-postgresql.

expression.js

See original design notes for initial solution outline and questions. In sum, we'll need the following jobs:

(1) Metadata sync (upserts tables and columns)

  1. Job 2 (triggered by Job 1) creates/updates N jobs in OpenFn (/api/jobs/new)

(2) Data sync (upserts records)

  1. Job 3…N pulls data for specific forms and upserts those data to Postgres

(2) Data Integration

Jobs to automatically create and update the mappings from the Kobo form to the db table created in the above flow.

  1. Run this job after the Metadata Integration flow completes to ensure all destination db tables have been created/ updated. This Job 2 (triggered by Job 1) creates/updates N jobs in OpenFn (/api/jobs/new).
  2. The `N' jobs created should upsert the Kobo survey data in the db.

Data mapping rules:

  1. Use Kobo question names to map surveys to the relevant destination column.
  2. If a uuid is needed for upserting Kobo survey data, use _id + _xform_id_string. This will ensure uniqueness across Kobo servers if WCS is hoping to connect multiple accounts._
  3. Map all repeat group questions to the child table created in job 1.
  4. Map the entire raw Message payload to the kobo_payload column created in job 1.

output.json

Fetching this Kobo Rural Consumption survey for the first time should:

  1. Create 2 Postgres destination tables: the parent tbl_wcs_rural_consumption_test and child tbl_wcs_rural_consumption_test_char for capturing repeat group data
  2. Create 1 job to map the Kobo survey data to the Postgres tables created

Assume OpenFn will fetch all Kobo survey forms and submissions on a cron-timer (i.e., every 3 hours).

Toggl

Please log your time to the WCS - Automated Kobo-Postgres Sync project.

taylordowns2000 commented 4 years ago

@lakhassane , given the result of the previous WCS job (where we have parsed a Kobo form and extracted the question names, then determined the postgresql column names from those question names) I think that our main functional challenge here is to create a new object with key value pairs generated from an array:

  1. each key must be the name of the column in postgres.
  2. each value must be the path to the question in kobo.

the final output here should be a post request to an openfn api endpoint which contains a JSON body for the job, with a text string for the expression attribute of the job: upsert('some_table_name', 'some_uuid, { k: v, k:v, ... })

lakhassane commented 4 years ago

1st blocker: The main issue is I have a for loop that I'm trying to get rid of for (var idx = 0; idx < state.kobo_columns.length; idx++) {...} kobo_columns contains two elements so this loop is executed twice. Which is the same number of time each('$.tablesToBeCreated[*]') is executed.

2nd blocker: DescribeTable('table_name') does not return the list of columns in the same order than it is on database.

lakhassane commented 4 years ago

@aleksa-krolls what do you think should be considered as a uuid to do the job upsert?

One option would be to use metadata in kobo form as primary key that we would insert in the table structure that is created by the 1st job. It would then be used by the upsert job.

aleksa-krolls commented 4 years ago

@lakhassane Good question. I think we'd need the client to decide here... for example all forms with the same form_name should map to the same table. We need to push WCS to decide what are these assumptions we can make/ how do they plan to version survey forms. I raised this already to Jonathan on our last call, but I will bump him again.

I think it will be something like the combination of form name and maybe a tag. Any ideas you might recommend to the client to consider? It would need to be something consistent across survey types (even if that survey was deployed multiple times for different projects).

lakhassane commented 4 years ago

One option @aleksa-krolls would be using the formId. It stays identical during the whole process and it's easy to fetch as a primary keyfor the tables to be created + no extra development/processing is needed.

aleksa-krolls commented 4 years ago

@lakhassane that won't work because formId is unique to the instance of a form. Here's an example real-world situation:

  1. WCS creates a template BNS Survey with 40 questions that it would like to implement across 20 sites. It wants to capture data for all 40 of these questions in one data table (e.g., tbl_bns_survey) in its central postgres DB.
  2. Sites may use this template survey as is, or is they might customize the survey to add their own question or two. Sites are free to add their own questions, but the core 40 questions must still map to the same central data table for analysis of the core questions across sites.
  3. Once a site has chosen how to modify the BNS Survey template, they will upload it to Kobo Toolbox where it will be assigned a unique formId. The only way we will know that this newly uploaded survey is a version of the BNS Survey template and therefore should map to same tbl_bns_survey is if we tell the site to include "BNS Survey" in the form name and/ or indicate this in some other way in the form metadata (i.e., add a tag with string "BNS Survey"). I think we can advise WCS on the best approach here... so what do you think would be easiest from a development perspective? Should this survey type at the very least be in the form definition itself, or would it be okay if WCS included it a hidden data value embedded in the form (i.e., if there was a Kobo question in every survey called "survey_type": "bns_survey"? Let me know if easiest to discuss this one over a call later today.
lakhassane commented 4 years ago

@aleksa-krolls hmm I understand. Yes we can chat about it when you can from now to 3pm or after 430

aleksa-krolls commented 4 years ago

@lakhassane As discussed, let's assume tag (i.e., "bns_survey_v1") will be the same across all forms of the same survey type. We can therefore then use tag to define the database table name. For example, all forms with the "tag": "bns_survey_v1" should map to the Postgres table tbl_bns_survey_v1.