OpenFn / lwala


Get CommCare cases in bulk #93

Closed aleksa-krolls closed 1 year ago

aleksa-krolls commented 1 year ago

Background

This is related to the bulkification work we are doing - see issue #89 for more details. To feed the bulk Salesforce jobs, we need to get data in bulk from CommCare.

Request

We need to develop a job that will GET cases in bulk from the CommCare API. The curl request looks something like this:

curl --request GET 'https://www.commcarehq.org/a/APP-NAME/api/v0.5/case/?limit=5000&type=Household&indexed_on_start=2023-01-01T00:00:00' \
--header 'Content-Type: application/json'

Query parameters to include:

  1. limit=5000 (this is the max # of records we can get at a time, so we will have to send multiple requests if the record count exceeds 5k)
  2. type=Household (to get only Household cases)
  3. indexed_on_start=${YYYY-MM-DDThh:mm:ss} (timestamp to filter on - this should be a dynamic cursor equal to the last time the job synced OR a manual cursor, e.g. 2023-01-01T00:00:00)
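
As a rough illustration of how the three query parameters above could be assembled into the request URL, here is a minimal sketch. `buildCaseUrl` and its argument names are hypothetical and are not taken from getCases-HH.js; the base URL mirrors the curl example, with APP-NAME left as a placeholder.

```javascript
// Sketch only: `buildCaseUrl` and its parameter names are hypothetical.
// It builds the case-list URL from the query parameters listed above.
function buildCaseUrl(appName, cursor, { limit = 5000, type = 'Household' } = {}) {
  const url = new URL(`https://www.commcarehq.org/a/${appName}/api/v0.5/case/`);
  url.searchParams.set('limit', String(limit)); // max records per request
  url.searchParams.set('type', type); // only Household cases
  url.searchParams.set('indexed_on_start', cursor); // YYYY-MM-DDThh:mm:ss
  return url.toString();
}

// Example: manual cursor from the issue description.
const firstUrl = buildCaseUrl('APP-NAME', '2023-01-01T00:00:00');
```

Note that `URLSearchParams` percent-encodes the colons in the timestamp; the server should decode them back to the same value.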

See here for the relevant CommCare API docs: https://confluence.dimagi.com/pages/viewpage.action?pageId=12224287

NOTE: If the record count for this query exceeds 5000, the response will include a metadata object that tells us how many pages there are. We should then send another request for each page to get ALL records available for that query.
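
One possible way to handle the paging described in the note is sketched below. It assumes each response has the shape `{ meta: { next }, objects: [...] }`, where `meta.next` is the query string for the following page and is null on the last page; please confirm this against the CommCare API docs linked above. `fetchAllCases` and `fetchPage` are hypothetical names, and the HTTP call is injected so the sketch does not depend on any particular adaptor function.

```javascript
// Sketch only. Assumed response shape: { meta: { next }, objects: [...] },
// with meta.next null (or missing) on the last page.
// `fetchPage` is any async function that takes a full URL and resolves to
// the parsed JSON body of one page.
async function fetchAllCases(fetchPage, baseUrl, firstQuery) {
  const cases = [];
  let query = firstQuery;
  while (query) {
    const page = await fetchPage(`${baseUrl}${query}`);
    for (const item of page.objects) cases.push(item);
    query = page.meta ? page.meta.next : null; // stop when there is no next page
  }
  return cases;
}
```

A loop is used here instead of recursion, but the idea is the same as the existing recursiveGet: keep requesting until the response no longer points at another page.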

expression

Please make changes to this job on the ⚠️ bulk-testing branch ⚠️. https://github.com/OpenFn/lwala/blob/bulk-testing/bulk/getCases-HH.js

Note that I copied over code from another job bulk-fetch.js we previously wrote to get CommCare forms in bulk. For this job, we need to get cases, not forms, but maybe you can reuse the recursiveGet code in order to handle paging and sending multiple requests if the record count exceeds 5000.

trigger

This job will run on a cron schedule every day (but later the client may decide to increase the frequency).

adaptor

http

state

"configuration": { SEE LP OpenFn CommCareHQ (Lwala) - PROD },
"data": {},
"cursor": { dynamic OR manual cursor - see comments above re: 'indexed_on_start' }
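
To make the "dynamic OR manual cursor" idea concrete, here is a small sketch of one way to pick the cursor and record the next one. The state key `lastRunDateTime` and all function names are hypothetical, and the timestamp is formatted in UTC on the assumption that this is what `indexed_on_start` expects.

```javascript
// Sketch only: `lastRunDateTime` and these helper names are hypothetical.
const MANUAL_CURSOR = '2023-01-01T00:00:00';

// Format a Date as YYYY-MM-DDThh:mm:ss (UTC, no milliseconds or zone suffix).
function formatCursor(date) {
  return date.toISOString().slice(0, 19);
}

// Use the timestamp saved by the previous run, else fall back to the manual cursor.
function resolveCursor(state, manualCursor = MANUAL_CURSOR) {
  return state.lastRunDateTime || manualCursor;
}

// Return a copy of state with the cursor the next run should use.
function withNextCursor(state, now = new Date()) {
  return { ...state, lastRunDateTime: formatCursor(now) };
}
```

If something like this is used, it is probably safest to capture `now` before the first request is sent, so cases indexed while the job is running are picked up by the next run.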

output

This job should return a final state like this that can be passed onto the bulk Salesforce jobs: https://github.com/OpenFn/lwala/blob/bulk-testing/sample_data/arrayOfCases-HH.json

mtuchi commented 1 year ago

@aleksa-krolls the PR for this issue is ready for review

aleksa-krolls commented 1 year ago

hey @mtuchi so the recursive GET seems to work well. (FYI, I adjusted the limit of the request on the bulk-testing branch to allow for easier platform testing.)

Now I am trying to configure this as a flow job so that when this recursive GET job finishes, it will trigger upsert_household_and_household_visit.js. However, it doesn't look like the final state output by this GET job will work for this bulk upsert job.

  1. See successful run of this recursive GET job: https://www.openfn.org/projects/staging-lwala-chw-support/runs/0643f7b8-78c7-7f9a-b15a-c0b8ddf2c548
  2. See failed run of upsert_household_and_household_visit.js job: https://www.openfn.org/projects/staging-lwala-chw-support/runs/0643f7b8-a8e0-7980-b139-d6352bde7e91

Can you help me make sure this works as a flow job as well? Check out what I've configured on platform:

  1. Job 1 - Bulk get HH cases: https://www.openfn.org/projects/staging-lwala-chw-support/jobs/IkFOXK
  2. Job 2 - Bulk upsert HHs: https://www.openfn.org/projects/staging-lwala-chw-support/jobs/5ow3q5

mtuchi commented 1 year ago

@aleksa-krolls I have updated commcare-salesforce-jobs/upsert_household_and_household_visit.js to work with the payload from bulk/getCases-HH.js. The job is passing now