Remove special characters from CommCare message before loading to Salesforce

daissatou2 commented 9 months ago

Describe the bug

Salesforce is responding with error ERROR_HTTP_400: Bad Request when we send special characters that are received from CommCare.

Please expand the removeAccents function to not only remove accents, but all special characters. For example, the latest run is failing because of an "ordinal indicator" in the intervention name: Nhamaonha 6ª Classe 2023-10-09.

We expect the forms to come in the following languages: Portuguese, English & Spanish (TO-DO: update list once Maluba confirms languages). Is there a way to remove all special characters in these languages? It looks like the special characters are listed in this code page 860 (Portuguese) table: https://www.ascii-codes.com/cp860.html
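
For illustration only, here is a minimal sketch of one way to do this in plain JavaScript. It is an assumption, not the fix that was eventually adopted. It uses the built-in `String.prototype.normalize` to split accented letters into a base letter plus combining marks, strips the marks, and then maps or drops whatever non-ASCII characters remain. The name `stripSpecialChars` and the contents of `EXTRA_REPLACEMENTS` are hypothetical.

```javascript
// Hypothetical helper: the function name and the replacement map are
// illustrative assumptions, not part of the existing jobs or any adaptor.
const EXTRA_REPLACEMENTS = {
  'ª': 'a', // feminine ordinal indicator (U+00AA), not decomposed by NFD
  'º': 'o', // masculine ordinal indicator (U+00BA), not decomposed by NFD
};

function stripSpecialChars(input) {
  // Leave non-string values (numbers, null, etc.) untouched.
  if (typeof input !== 'string') return input;
  return input
    .normalize('NFD') // split e.g. "ã" into "a" + combining tilde
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    // Map any remaining non-ASCII character, or drop it if unmapped.
    .replace(/[^\x00-\x7F]/g, ch => EXTRA_REPLACEMENTS[ch] || '');
}

console.log(stripSpecialChars('Nhamaonha 6ª Classe 2023-10-09'));
// prints "Nhamaonha 6a Classe 2023-10-09"
```

Note that any character that is neither decomposable nor listed in the map is silently dropped, so a transformation like this changes the data that reaches Salesforce.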

This change should be implemented on 4 jobs:

Register Participant
Pre challenges
Post challenges
Attendance

To Reproduce

Here is a link to a failed run on OpenFn.org which is indicative of the bug: https://www.openfn.org/projects/grs-integrations/runs/0655dc0e-a584-7449-86dc-474f89d0d195

expression.js

Link to the job itself in Github:

Register participant: https://github.com/OpenFn/grassroot-soccer/blob/master/jobs/2.d.upsertRegisterParticipant.js
Pre challenges: https://github.com/OpenFn/grassroot-soccer/blob/master/jobs/upsertPreChallenges.js
Post challenges: https://github.com/OpenFn/grassroot-soccer/blob/master/jobs/upsertPostChallenges.js
Attendance: https://github.com/OpenFn/grassroot-soccer/blob/master/jobs/upsertAttendanceNonSkillz.js

state.json

Messages:

Register participant: https://www.openfn.org/projects/grs-integrations/messages/06538e92-cc44-7358-a30c-6141a52f9be4
Pre challenges: https://www.openfn.org/projects/grs-integrations/messages/06538f45-108c-7741-baeb-3a77bd993ddc
Post challenges: https://www.openfn.org/projects/grs-integrations/messages/06538f6a-5ea4-7b3c-9177-71bbf979c65a
Attendance: https://www.openfn.org/projects/grs-integrations/messages/06538ef8-b46f-7573-8237-f24c19bfd38e

{
  "configuration": ["SEE LAST PASS: 'GRS Prod July 2023'"],
  "data": "See message links above"
}

Expected behavior

The run should pass with data loaded to Salesforce

To test/resolve

After the desired output is working locally (from the CLI), please [push commits to master].
[Please test the change on OpenFn.org by re-running this run (link) and confirming success.]

josephjclark commented 9 months ago

Hi, a couple of inputs from me on this.

I can think of a few technical solutions:

Continue to add explicit mappings for every expected illegal character. This will be brittle and won't scale.
We could do this mapping in the Salesforce adaptor itself, which would benefit everyone, but it's still a brittle solution.
We MIGHT be able to automatically detect characters that Salesforce will reject (i.e., everything outside of the standard Unicode set) and map or remove them. This would happen in the Salesforce adaptor.

The problem here, apart from scaling, is that replacing or removing these characters could change the meaning of the data. It's a bit risky to automatically change the data!

It MIGHT be more appropriate to reject the entries with invalid characters (whether we let Salesforce do it or we intervene in the Salesforce adaptor), and have a human come in and correct the data in CommCare directly. That's obviously going to be a different workflow, but it might be the best way forward.
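
As a rough sketch of the "reject rather than rewrite" option: the helper below scans a flat record and reports which string fields contain non-ASCII characters, so the record can be held back and fixed at the source. The name `findNonAsciiFields` and the ASCII-only rule are assumptions for illustration. Nothing in this thread establishes which characters Salesforce actually accepts.

```javascript
// Hypothetical helper: reports string fields containing characters outside
// ASCII. The ASCII-only rule is an assumption, not Salesforce's actual rule.
function findNonAsciiFields(record) {
  const problems = [];
  for (const [field, value] of Object.entries(record)) {
    // Only string values can carry problematic characters.
    if (typeof value !== 'string') continue;
    const bad = value.match(/[^\x00-\x7F]/g);
    // Report each offending character once per field.
    if (bad) problems.push({ field, characters: [...new Set(bad)] });
  }
  return problems;
}

const problems = findNonAsciiFields({
  Name: 'Nhamaonha 6ª Classe 2023-10-09',
  Attendance: 12,
});

if (problems.length > 0) {
  // The data is left unchanged; a human decides how to correct it.
  console.log('Rejecting record; fix these fields at the source:', JSON.stringify(problems));
}
```

Unlike automatic replacement, this leaves the original values intact and only flags them for review.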

mtuchi commented 9 months ago

@daissatou2 see the work in progress in #50. This change has 400+ lines because of the characterMap list, which maps a large number of accented characters to their replacements. I think this is a lot to add to a job's code. Is there a way we can improve the workflow so that the characterMap and the replaceAccents function live in one job and all other jobs use that function from the upstream job? Otherwise, each job will gain 400+ additional lines of code.

cc @aleksa-krolls

stuartc commented 9 months ago

Does anyone know what character set is supported by Salesforce? Like what standard or sub-set of a standard?

daissatou2 commented 9 months ago

@mtuchi these are message filter jobs, not flow jobs, so we'll need to add the fix to all of them.

mtuchi commented 9 months ago

@daissatou2 I have cut a new version of the Salesforce adaptor, v4.3.0, and used its new toUTF8 function as a replacement for the replaceAccents function. You can proceed with testing on the platform, but make sure the Salesforce adaptor version is v4.3.0. The changes are in PR #50.

cc @aleksa-krolls

daissatou2 commented 9 months ago

@mtuchi this looks good. Ready for code review.

mtuchi commented 9 months ago

@josephjclark these changes relate to the new toUTF8 function that was added to the Salesforce adaptor. See the changes in #50.

mtuchi commented 7 months ago

@daissatou2 can we close this issue? Joe already approved the changes to the Salesforce adaptor in https://github.com/OpenFn/adaptors/pull/441, and I don't think he has access to this repo.

daissatou2 commented 7 months ago

Sure, feel free to close issues after code review, @mtuchi.

mtuchi commented 5 months ago

@daissatou2 Joe approved the changes in https://github.com/OpenFn/adaptors/pull/441. I don't have access to close this issue. If you do, please close it.
