Closed taylordowns2000 closed 3 years ago
@lakhassane , given the result of the previous WCS job (where we have parsed a Kobo form and extracted the question names, then determined the postgresql column names from those question names) I think that our main functional challenge here is to create a new object with key value pairs generated from an array:
the final output here should be a post request to an openfn api endpoint which contains a JSON body for the job
, with a text string for the expression
attribute of the job
: upsert('some_table_name', 'some_uuid, { k: v, k:v, ... })
1st blocker:
The main issue is I have a for loop that I'm trying to get rid of for (var idx = 0; idx < state.kobo_columns.length; idx++) {...}
kobo_columns contains two elements so this loop is executed twice. Which is the same number of time each('$.tablesToBeCreated[*]')
is executed.
2nd blocker:
DescribeTable('table_name')
does not return the list of columns in the same order than it is on database.
@aleksa-krolls what do you think should be considered as a uuid to do the job upsert?
One option would be to use metadata in kobo form as primary key that we would insert in the table structure that is created by the 1st job. It would then be used by the upsert job.
@lakhassane Good question. I think we'd need the client to decide here... for example all forms with the same form_name
should map to the same table. We need to push WCS to decide what are these assumptions we can make/ how do they plan to version survey forms. I raised this already to Jonathan on our last call, but I will bump him again.
I think it will be something like the combination of form name
and maybe a tag
. Any ideas you might recommend to the client to consider? It would need to be something consistent across survey types (even if that survey was deployed multiple times for different projects).
One option @aleksa-krolls would be using the formId
. It stays identical during the whole process and it's easy to fetch as a primary key
for the tables to be created + no extra development/processing is needed.
@lakhassane that won't work because formId
is unique to the instance of a form. Here's an example real-world situation:
BNS Survey
with 40 questions that it would like to implement across 20 sites. It wants to capture data for all 40 of these questions in one data table (e.g., tbl_bns_survey
) in its central postgres DB. BNS Survey
template, they will upload it to Kobo Toolbox where it will be assigned a unique formId
.
The only way we will know that this newly uploaded survey is a version of the BNS Survey
template and therefore should map to same tbl_bns_survey
is if we tell the site to include "BNS Survey" in the form name and/ or indicate this in some other way in the form metadata (i.e., add a tag
with string "BNS Survey"
). I think we can advise WCS on the best approach here... so what do you think would be easiest from a development perspective? Should this survey type at the very least be in the form definition itself, or would it be okay if WCS included it a hidden data value embedded in the form (i.e., if there was a Kobo question in every survey called "survey_type": "bns_survey"
?
Let me know if easiest to discuss this one over a call later today. @aleksa-krolls hmm I understand. Yes we can chat about it when you can from now to 3pm or after 430
@lakhassane As discussed, let's assume tag
(i.e., "bns_survey_v1"
) will be the same across all forms of the same survey type. We can therefore then use tag
to define the database table name. For example, all forms with the "tag": "bns_survey_v1"
should map to the Postgres table tbl_bns_survey_v1
.
WCS Postgres Table Management Jobs
Background
WCS collects research data across several sites and surveys, which are managed by a number of staff. To ensure that all Kobo survey data can be securely and centrally captured for easier access and analysis, WCS would like to automate the process of integrating Kobo surveys and creating database tables to centrally store the data.
The specific request
Goal is to identify an automated or semi-automated process for going from a new or updated Kobo form to a Postgres set of holding tables. This will require an automated metadata and data sync between Kobo Toolbox and Postgresql.
state.json
Source: Kobo
openfn_kobo
in LPDestination: Postgresql
adaptor
Assume the destination database will be Postgres - use language-postgresql.
expression.js
See original design notes for initial solution outline and questions. In sum, we'll need the following jobs:
(1) Metadata sync (upserts tables and columns)
N
jobs in OpenFn (/api/jobs/new)(2) Data sync (upserts records)
(2) Data Integration
Jobs to automatically create and update the mappings from the Kobo form to the db table created in the above flow.
N
jobs in OpenFn (/api/jobs/new).Data mapping rules:
uuid
is needed for upserting Kobo survey data, use_id + _xform_id_string
. This will ensure uniqueness across Kobo servers if WCS is hoping to connect multiple accounts._kobo_payload
column created in job 1.output.json
Fetching this Kobo Rural Consumption survey for the first time should:
tbl_wcs_rural_consumption_test
and childtbl_wcs_rural_consumption_test_char
for capturing repeat group dataAssume OpenFn will fetch all Kobo survey forms and submissions on a cron-timer (i.e., every 3 hours).
Toggl
Please log your time to the WCS - Automated Kobo-Postgres Sync project.