new workflow to autogenerate all mapping lists based on XLS file

aleksa-krolls commented 1 month ago

Background, context, and business value

The current implementation of MSF workflows (WF1 and WF2) have some of the mappings hardcoded in the job codes. The source for these hardcoded mappings is the Metadata file that we get from MSF. This Metadata file is going to get updated by the MSF team periodically. In order to avoid having to edit the hardcoded mapping in our job code, we need to find a way to read the Metadata file using an API, extract data that we are interested in, format it in a useable format and save it to a JSON file where WF1 and WF2 will reference it.

The specific request, in as few words as possible

Create a workflow with three steps:

Step 1: Get Metadata file from share point:

We are yet to get access from MSF for this step, so we will be stubbing the output of the following simulated get file request to Sharepoint -
1. OptionSets output sample
2. F01 MHPSS Baseline Output Sample
Save these json file in state.option_sets_xls_data and state.f01_mphss_baseline_xls respectively

Note: When we do get access to MSF sharepoint, we will be dynamically getting these json objects that we are stubbing in this step. As a proof of concept, we are only doing two json objects on this job, but the number of sheets fetched will be more than two when we fetch it dynamically.

Step 2: Map metadata file to option-set Json format

Use the state.option_sets_xls_data to create the following json objects:

For state.option_sets_xls_data: employ the same mapping logic as was done in WF3
For state.f01_mphss_baseline_xls: Create a JSON with key-value pair where the key is the value under column E with header External ID and the label is column BA labeled DHIS2 DE UID. We want to exclude all rows under DHIS2 DE UID that are NA

Create one main JSON object, state.metadata_mapping_json by merging the above json objects as:

{
"option_sets":  {...contents of state.option_sets_xls_data...},
"f01_mphss_baseline": {...contents of state.f01_mphss_baseline_xls...}
}

Step 3: Save option-set json to github

Create a .json file using this state.metadata_mapping_json object and make a post request to upsert the metadata_mapping.json file in the metadata folder on Github.
This job will always be upserting the metadata_mapping.json file so it should:
1. First check if the file exists. (If yes, delete it)
2. Then, Create the file containing state.metadata_mapping_json and name it as metadata_mapping.json

@aleksa-krolls This is the upsert implementation im thinking, let me know your thoughts.

Expected data volumes

The size of the Metadata file that we are accessing on sharepoint - 2.2MB

Toggl

Name of the Toggle project to log work

AishaHassen commented 1 month ago

@aleksa-krolls I've started spec-ing this issue, but not yet done. Still have a few more information I need from you. We can sync on this on Monday.

FYI @mtuchi

AishaHassen commented 1 month ago

@aleksa-krolls this is ready for your review.

mtuchi commented 1 month ago

EOD Update - #47

When mapping f01_mphss_baseline the source data is missing DHIS2 Option Code and Answers column. So these data are currently not mapped
On updating the metadata.json file, We need GitHub access token that has read-write permission on OpenFn/openfn-lime-pilot repo. See docs here

mtuchi commented 1 month ago

EOD Update I have manage to get the file update in GitHub using Classic Token and i have updated the first step to fetch files from openfnorg.sharepoint.com. I need to update the first job to fetch both files from sharepoint before handing this over for code review

AishaHassen commented 1 month ago

hey @mtuchi ,

When mapping f01_mphss_baseline the source data is missing DHIS2 Option Code and Answers column. So these data are currently not mapped

On this comment, for f01_mphss_baseline we don't need to use DHIS2 Option Code and Answers columns, we just need to map External ID with DHIS2 DE UID. This step is outlined on the issue spec as follows:

For state.f01_mphss_baseline_xls: Create a JSON with key-value pair where the key is the value under column E with header External ID and the label is column BA labeled DHIS2 DE UID. We want to exclude all rows under DHIS2 DE UID that are NA

This is what the output of the mapping should look like:

{
CXS4qAJH2qD: 'encounter-date', //encounterDate
  dfdv3SkeXKe: 'a6c5188c-29f0-4d3d-8cf5-7852998df86f', 
  hWMBCCA2yy1: 'abede172-ba87-4ebe-8054-3afadb181ea3', 
  TWuCY5r2wx7: 'ccc4f06c-b76a-440d-9b7e-c48ba2c4a0ab',
  QHrIUMhjZlO: 'd516de07-979b-411c-b7e4-bd09cf7d9d91', 
  H1fMCaOzr8F: '3e97c2d0-15c1-4cfd-884f-7a4721079217', 
  yCwuZ0htrlH: '5f6e245c-83fc-421b-8d46-061ac773ae71'
  }

mtuchi commented 1 month ago

@aleksa-krolls this is ready for testing, The PR were approved and i have updated the workflow on app cc @AishaHassen

OpenFn / openfn-lime-pilot