OpenFn / lwala


[Advice] Connection timeout error when upserting to Salesforce #86

Closed ritazagoni closed 1 year ago

ritazagoni commented 1 year ago

Describe the bug

When upserting to Salesforce, we regularly get an Error: connect ETIMEDOUT error. The run completes successfully when re-run.

Here's an example.

It could be caused by us sometimes receiving 100+ messages from CommCare, each of which triggers a Salesforce upsert job. You might not be able to replicate the error locally by executing a single run. It happens with all Salesforce jobs; here's one example: https://github.com/OpenFn/lwala/blob/master/commcare-salesforce-jobs/upsert_household_and_household_visit.js

Here's a period to check, when several of these happened within an hour: https://openfn.org/projects/lwala-chw-support/runs?dateRange=custom&endDate=2022-09-27T12%3A31%3A00.000Z&jobId=&page=1&perPage=10&searchString=&startDate=2022-09-26T12%3A31%3A00.000Z&success=1

Note that concurrency = 1 on the project.

Questions

The connection timeout happens on the Salesforce side, but we would like to know what options we have to avoid hitting the API with a large number of requests in a short time. What approach would you recommend to reduce the chance of this happening?

Could we implement an auto-retry for these failed jobs? Some of these jobs include multiple upserts; should we break them into multiple jobs? Or should we fetch cases from CommCare instead of relying on a webhook, and bulk upsert to Salesforce?
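To make the auto-retry question concrete, here is a minimal sketch of retrying an operation with exponential backoff when it fails with ETIMEDOUT. It is plain JavaScript, not an OpenFn or language-salesforce feature: `withRetry`, its options, and the `upsertToSalesforce` name in the usage comment are all hypothetical, and whether something like this can be wired into a job is an open question.

```javascript
// Hypothetical helper, not part of OpenFn or language-salesforce.
// Retries an async operation when it fails with a retryable error code.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withRetry(
  operation,
  { retries = 3, baseDelayMs = 500, retryOn = ['ETIMEDOUT'] } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable = retryOn.includes(error.code);
      // Give up on non-retryable errors, or once the retry budget is spent.
      if (!retryable || attempt >= retries) throw error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage (upsertToSalesforce is a placeholder for whatever performs the upsert):
// await withRetry(() => upsertToSalesforce(record), { retries: 3 });
```

Only errors whose `code` matches are retried, so a failure such as a TypeError in the job code would still surface immediately instead of being retried.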

state

See LP for Lwala Salesforce Sandbox (MOTG - Implementation user). Sample data: https://github.com/OpenFn/lwala/blob/master/sample_data/upsert_household_household_visit.json

adaptor

language-salesforce

Assigning to @mtuchi, would be great to have your eyes on it too @taylordowns2000

ritazagoni commented 1 year ago

hey @mtuchi how did you go with this? not very urgent, but would be good to have a few suggestions soon. thanks!

mtuchi commented 1 year ago

@ritazagoni I am investigating the timeout issue, and I have noticed that there is a TypeError at the end of the log, e.g. https://openfn.org/projects/lwala-chw-support/runs/06332e65-f80c-7118-9793-79230d262a31. Could that be the reason for the timeout error?

Error: connect ETIMEDOUT *.*.*.*:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1159:16) {
  errno: -110,
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '******',
  port: 443
}
TypeError [Error]: Cannot set property 'cleanChoice' of undefined
    at vm.js:5:23
    at Object.base.apply (/home/app/assets/node_modules/vm2/lib/contextify.js:246:34)
    at /home/app/priv/language_packs/node_modules/@openfn/language-salesforce-v2.10.0/node_modules/@openfn/language-common/lib/Adaptor.js:102:12
    at processTicksAndRejections (internal/process/task_queues.js:95:5)

ritazagoni commented 1 year ago

@mtuchi It's possible, although I don't understand how it could be happening. Does this imply that state is undefined at that point? Could you take a look at why this may be happening?
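For context on the question above: in JavaScript this TypeError is thrown when the object on the left of `.cleanChoice` is undefined, which could be state itself or a nested object. The sketch below reproduces the situation with a guard that fails with a clearer message. It is an illustration only: `attachCleanChoice` and the body of `cleanChoice` are placeholders, not the code from the failing job.

```javascript
// Hypothetical stand-in for the job step that sets `cleanChoice`.
// The real job's logic is not shown in this thread.
function attachCleanChoice(state) {
  // Without this guard, assigning a property on undefined throws the
  // TypeError seen in the log.
  if (state === undefined || state === null) {
    throw new Error('Expected a state object but received ' + state);
  }
  // Placeholder implementation: returns the value unchanged.
  state.cleanChoice = choice => choice;
  return state;
}

// Reproducing the unguarded failure:
function unguardedAttach(state) {
  state.cleanChoice = choice => choice; // throws TypeError if state is undefined
  return state;
}
```

A guard like this does not fix the underlying cause, but it makes it obvious in the log which object was missing when the step ran.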

aleksa-krolls commented 1 year ago

@mtuchi @ritazagoni can you please schedule a call to work on this together? Lmk if you need me to join

mtuchi commented 1 year ago

TypeError [Error]: Cannot set property 'cleanChoice' of undefined

@ritazagoni I have noticed that the job that is failing is TEST_upsert_person_and_person_visit.js, not upsert_household_and_household_visit.js.

The state is the message body: https://openfn.org/projects/lwala-chw-support/messages/06332e65-ed9e-7a4d-9090-bae2b0b067f4

mtuchi commented 1 year ago

100+ messages from CommCare

@ritazagoni have you had a chance to investigate this?