force:data:bulk:insert fails for UTF-8 with BOM (possible fix in comments)

jhilyard commented 4 years ago

Summary

sfdx force:data:bulk:insert fails on a CSV file encoded as UTF-8 with BOM: it treats the byte order mark as part of the first field name.

Steps To Reproduce:

Repository to reproduce: sfdx_upsert_utf8bom

More repro step explanation is provided in the readme.md

NOTE: If your issue is not reproducible with dreamhouse-lwc, i.e. it requires specific metadata or files, we require a link to a simple Salesforce project repository with a script to set up a scratch org that reproduces your problem.

sfdx force:org:create -a MyScratchOrg -f config/project-scratch-def.json -s
sfdx force:source:push
sfdx force:data:bulk:upsert -s Account -f ./Account_UTF8_NO_BOM.csv -i Id -w 2
sfdx force:data:bulk:upsert -s Account -f ./Account_UTF8_BOM.csv -i Id -w 2
sfdx force:data:bulk:status -i <jobId> -b <batchId>

Expected result

Accounts with Name BOM and No_BOM are present.

Actual result

Only the Account with Name No_BOM is present. The upsert using the UTF-8 BOM file fails because the byte order mark is treated as part of the first column name in the CSV. The command returns to the prompt before the timeout, and checking the job batch status shows InvalidBatch : Field name not found : Name. Pasting the message into VS Code shows a non-printable character symbol at the beginning of the field name Name.
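The failure mode can be illustrated with a minimal Node.js sketch. It is not the CLI's actual code: firstHeader is a hypothetical helper, and the sample CSV content is an assumption, since the real files live in the repro repository. It shows that decoding a UTF-8 buffer with Buffer.toString keeps the byte order mark as the character U+FEFF, so a naive header split yields a first field name that looks like Name but does not equal it.

```javascript
// Illustration only: how a UTF-8 BOM ends up inside the first CSV header.
// `firstHeader` is a hypothetical helper, not part of the sfdx CLI.

// Return the first column name from the header row of a CSV buffer.
function firstHeader(csvBuffer) {
  const text = csvBuffer.toString('utf8'); // Buffer decoding keeps the BOM as U+FEFF
  const headerRow = text.split(/\r?\n/)[0];
  return headerRow.split(',')[0];
}

const BOM = Buffer.from([0xef, 0xbb, 0xbf]); // UTF-8 byte order mark
const body = Buffer.from('Name,Id\nBOM,\n', 'utf8'); // assumed sample content

const withoutBom = firstHeader(body);
const withBom = firstHeader(Buffer.concat([BOM, body]));

console.log(withoutBom === 'Name'); // true
console.log(withBom === 'Name'); // false: the header is U+FEFF followed by "Name"
console.log(withBom.charCodeAt(0).toString(16)); // feff
```

The invisible U+FEFF is the non-printable character that shows up when the error message is pasted into an editor.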

Additional information

The documentation refers to Preparing a CSV without providing a link, and that page only says files must be in UTF-8 format. A warning about UTF-8 BOM incompatibility on the documentation page would be a greatly appreciated workaround.

Please note: Tableau Prep Builder (at least the latest version, 2020.3.1) creates output CSV files with UTF-8 BOM encoding -- only! The encoding is not indicated anywhere; the only choice is between "hyper" format and "CSV". Tableau Prep Builder can pull data from Salesforce (to get lookup Ids), databases, and text files, and it is in the Salesforce ecosystem. It cannot output to Salesforce itself, but it is much more user-friendly than scripting Data Loader, so it seemed like a good plan to combine Tableau Prep Builder with sfdx force:data:bulk:insert for scripting incremental loads. That's what got me in this mess.
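Until the CLI handles the BOM itself, one possible stopgap is to strip the three BOM bytes from the exported file before passing it to the bulk command. The following is a minimal Node.js sketch under that assumption; stripBom and stripBomFromFile are hypothetical helper names, and the output file name in the usage comment is made up for illustration.

```javascript
const fs = require('fs');

// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Return the buffer without a leading UTF-8 BOM, or unchanged if there is none.
function stripBom(buf) {
  const hasBom = buf.length >= 3 && buf.subarray(0, 3).equals(UTF8_BOM);
  return hasBom ? buf.subarray(3) : buf;
}

// Copy src to dest, dropping a leading BOM if present (hypothetical helper).
function stripBomFromFile(src, dest) {
  fs.writeFileSync(dest, stripBom(fs.readFileSync(src)));
}

// Example usage (output file name is an assumption):
// stripBomFromFile('./Account_UTF8_BOM.csv', './Account_UTF8_BOM_stripped.csv');
```

Working on raw bytes rather than decoded text leaves the rest of the file untouched, so any non-ASCII characters in the data are preserved as written.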

SFDX CLI Version (to find the version of the CLI engine run sfdx --version):

sfdx-cli/7.78.1-5a65d9dd2f win32-x64 node-v12.18.3

SFDX plugin Version (to find the version of the CLI plugin run sfdx plugins --core):

@oclif/plugin-autocomplete 0.1.5 (core)
@oclif/plugin-commands 1.3.0 (core)
@oclif/plugin-help 3.2.0 (core)
@oclif/plugin-not-found 1.2.4 (core)
@oclif/plugin-plugins 1.9.1 (core)
@oclif/plugin-update 1.3.10 (core)
@oclif/plugin-warn-if-update-available 1.7.0 (core)
@oclif/plugin-which 1.0.3 (core)
@salesforce/sfdx-trust 3.4.3 (core)
alias 1.1.2 (core)
analytics 1.12.1 (core)
auth 1.3.0 (core)
config 1.1.10 (core)
generator 1.1.3 (core)
salesforcedx 50.3.1 (core)
├─ templates 50.1.0 (core)
├─ custom-metadata 1.0.10 (core)
├─ salesforce-alm 50.3.1 (core)
├─ @salesforce/sfdx-plugin-lwc-test 0.1.7 (core)
└─ apex 0.1.2 (core)
sfdx-cli 7.78.1 (core)

OS and version:

Windows 10 Version 10.0.18363 Build 18363

github-actions[bot] commented 4 years ago

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

jhilyard commented 4 years ago

Here's a possible one-line fix. To the existing parser options

  let parser = parse({
    columns: true,
    skip_empty_lines: true
  });

add the line bom: true:

  let parser = parse({
    bom: true,
    columns: true,
    skip_empty_lines: true
  });

as recommended by the csv-parse documentation.

WillieRuemmele commented 3 years ago

@jhilyard sorry for the delay in response, but thank you for digging into the code and suggesting a fix. We just open-sourced that plugin: https://github.com/salesforcecli/data/tree/main/packages/plugin-data I'll put up a PR there adding that option.

git2gus[bot] commented 3 years ago

This issue has been linked to a new work item: W-8836961

WillieRuemmele commented 3 years ago

https://github.com/salesforcecli/data/pull/31

WillieRuemmele commented 3 years ago

Hi @jhilyard check out the announcement for the new data plugin

mshanemc commented 3 years ago

I think this is fixed in the new data plugin released 4/1. If not, re-open or create a new case.

jhilyard commented 3 years ago

@WillieRuemmele @mshanemc and other contributors, please accept my belated thanks for the UTF-8 BOM CSV fix! My attention was elsewhere when the data plugin was released, but I've been using it with no problems and I appreciate your efforts.
