OpenFn / openfn-lime-pilot

MSF LIME Project - OpenFn Workflows for Iraq Pilot
1 stars 0 forks source link

new workflow to autogenerate all mapping lists based on XLS file #47

Closed aleksa-krolls closed 1 month ago

aleksa-krolls commented 1 month ago

Background, context, and business value

The current implementation of MSF workflows (WF1 and WF2) have some of the mappings hardcoded in the job codes. The source for these hardcoded mappings is the Metadata file that we get from MSF. This Metadata file is going to get updated by the MSF team periodically. In order to avoid having to edit the hardcoded mapping in our job code, we need to find a way to read the Metadata file using an API, extract data that we are interested in, format it in a useable format and save it to a JSON file where WF1 and WF2 will reference it.

The specific request, in as few words as possible

Image

Create a workflow with three steps:

Step 1: Get Metadata file from share point:

Note: When we do get access to MSF sharepoint, we will be dynamically getting these json objects that we are stubbing in this step. As a proof of concept, we are only doing two json objects on this job, but the number of sheets fetched will be more than two when we fetch it dynamically.

Step 2: Map metadata file to option-set Json format

Use the state.option_sets_xls_data to create the following json objects:

  1. For state.option_sets_xls_data: employ the same mapping logic as was done in WF3
  2. For state.f01_mphss_baseline_xls: Create a JSON with key-value pair where the key is the value under column E with header External ID and the label is column BA labeled DHIS2 DE UID. We want to exclude all rows under DHIS2 DE UID that are NA

Create one main JSON object, state.metadata_mapping_json by merging the above json objects as:

{
"option_sets":  {...contents of state.option_sets_xls_data...},
"f01_mphss_baseline": {...contents of state.f01_mphss_baseline_xls...}
}

Step 3: Save option-set json to github

@aleksa-krolls This is the upsert implementation im thinking, let me know your thoughts.

Expected data volumes

The size of the Metadata file that we are accessing on sharepoint - 2.2MB

Toggl

Name of the Toggle project to log work

AishaHassen commented 1 month ago

@aleksa-krolls I've started spec-ing this issue, but not yet done. Still have a few more information I need from you. We can sync on this on Monday.

FYI @mtuchi

AishaHassen commented 1 month ago

@aleksa-krolls this is ready for your review.

mtuchi commented 1 month ago

EOD Update - #47

mtuchi commented 1 month ago

EOD Update I have manage to get the file update in GitHub using Classic Token and i have updated the first step to fetch files from openfnorg.sharepoint.com. I need to update the first job to fetch both files from sharepoint before handing this over for code review

AishaHassen commented 1 month ago

hey @mtuchi ,

When mapping f01_mphss_baseline the source data is missing DHIS2 Option Code and Answers column. So these data are currently not mapped

On this comment, for f01_mphss_baseline we don't need to use DHIS2 Option Code and Answers columns, we just need to map External ID with DHIS2 DE UID. This step is outlined on the issue spec as follows:

For state.f01_mphss_baseline_xls: Create a JSON with key-value pair where the key is the value under column E with header External ID and the label is column BA labeled DHIS2 DE UID. We want to exclude all rows under DHIS2 DE UID that are NA

This is what the output of the mapping should look like:

{
CXS4qAJH2qD: 'encounter-date', //encounterDate
  dfdv3SkeXKe: 'a6c5188c-29f0-4d3d-8cf5-7852998df86f', 
  hWMBCCA2yy1: 'abede172-ba87-4ebe-8054-3afadb181ea3', 
  TWuCY5r2wx7: 'ccc4f06c-b76a-440d-9b7e-c48ba2c4a0ab',
  QHrIUMhjZlO: 'd516de07-979b-411c-b7e4-bd09cf7d9d91', 
  H1fMCaOzr8F: '3e97c2d0-15c1-4cfd-884f-7a4721079217', 
  yCwuZ0htrlH: '5f6e245c-83fc-421b-8d46-061ac773ae71'
  }
mtuchi commented 1 month ago

@aleksa-krolls this is ready for testing, The PR were approved and i have updated the workflow on app cc @AishaHassen