felixlindstrom / python-salesforce-api

Python Salesforce API wrapper made easy
MIT License

Bulk V2 API job data is not encoded to UTF-8 #17

Open jelm-vw opened 3 years ago

jelm-vw commented 3 years ago

Salesforce requires uploaded data to be encoded as (or at least compatible with) UTF-8 (https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/datafiles_prepare_csv.htm, fourth bullet point from the top). In practice, though, upload jobs containing characters outside ISO-8859-1 fail in Python before the ingest request is ever sent to Salesforce.

The bulk client does not encode the CSV data, so it remains a str until a lower-level package has to make an encoding decision. Python's low-level HTTP library sees a str object and tries to turn it into bytes by encoding it with the HTTP default, ISO-8859-1. When the data contains characters that this encoding cannot represent, as mine does, it raises a UnicodeEncodeError.
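The encoding failure described above can be reproduced without any Salesforce code. The following is a minimal sketch using only the standard library; the `csv_body` value and the `try_encode` helper are hypothetical names made up for illustration, not part of the bulk client.

```python
# Hypothetical CSV payload, similar to what the bulk client would pass down as a str.
csv_body = "FirstName,LastName\nΣόλων,Lawgiver\n"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text does not fit the encoding."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Greek letters are not part of ISO-8859-1, so this attempt fails.
latin1_result = try_encode(csv_body, "iso-8859-1")

# UTF-8 can represent these characters, so this attempt succeeds.
utf8_result = try_encode(csv_body, "utf-8")

print(latin1_result is None)
print(isinstance(utf8_result, bytes))
```

Passing bytes that are already UTF-8 encoded means the lower layers no longer have to choose an encoding themselves.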

Here is a contrived example of something that should work but doesn't:

salesforce.bulk.insert('Contact', [
    {'FirstName': 'Σόλων', 'LastName': 'Lawgiver', 'AccountID': '000000000000000'},
])

As a workaround, in the codebase I'm working in, I've monkey-patched salesforce_api.services.bulk.v2.Job._prepare_data so that it calls encode('utf-8') and returns bytes. I've not submitted a PR to change this function, because there's a stack of calling functions that all expect str, so encoding at that point may not be the desired long-term fix. But the patch works for now.

Stan3v commented 3 years ago

Same issue here.

octopyth commented 3 years ago

@jelm-vw can you paste your solution or make a fork?

jelm-vw commented 3 years ago

This is effectively the (temporary) monkey-patch I use:

# patch.py
from functools import wraps

def _encode_job_data(prepare_data):
    @wraps(prepare_data)
    def wrapper(*args, **kwargs):
        original: str = prepare_data(*args, **kwargs)
        encoded: bytes = original.encode('utf-8')
        return encoded

    return wrapper

def patch_salesforce_api(salesforce_api):
    salesforce_api.services.bulk.v2.Job._prepare_data = _encode_job_data(salesforce_api.services.bulk.v2.Job._prepare_data)

# some other module
import salesforce_api
import patch

patch.patch_salesforce_api(salesforce_api)

octopyth commented 3 years ago

@jelm-vw It works! Thanks a million!

felixlindstrom commented 3 years ago

Nice find! And nice workaround! I will create a PR for this over the weekend, and will make sure to attempt to detect the data encoding before encoding it!

abecquet77 commented 1 year ago

How can I use your monkey-patch in my code?

from salesforce_api import Salesforce
client = Salesforce(...)
...
client.bulk.upsert('Account', accounts)
...
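One plausible way to wire this up, going by the patch posted earlier in the thread, is to call patch.patch_salesforce_api(salesforce_api) once at startup, before the first client.bulk.upsert(...) call. The sketch below shows the mechanism with a stand-in: the `Job` class and `fake_salesforce_api` namespace here are hypothetical substitutes for the real library so that the example runs on its own, and the CSV-building logic is an assumption, not the library's actual implementation.

```python
from functools import wraps
from types import SimpleNamespace


class Job:
    """Hypothetical stand-in for salesforce_api.services.bulk.v2.Job."""

    def _prepare_data(self, rows):
        # Assumed behaviour for illustration: build a CSV string from dicts.
        header = ",".join(rows[0].keys())
        lines = [",".join(row.values()) for row in rows]
        return "\n".join([header, *lines]) + "\n"


# Mimics the attribute path used by the patch (illustration only).
fake_salesforce_api = SimpleNamespace(
    services=SimpleNamespace(bulk=SimpleNamespace(v2=SimpleNamespace(Job=Job)))
)


def _encode_job_data(prepare_data):
    @wraps(prepare_data)
    def wrapper(*args, **kwargs):
        # Encode the str returned by the original method to UTF-8 bytes.
        return prepare_data(*args, **kwargs).encode("utf-8")

    return wrapper


def patch_salesforce_api(salesforce_api):
    job_cls = salesforce_api.services.bulk.v2.Job
    job_cls._prepare_data = _encode_job_data(job_cls._prepare_data)


job = Job()  # an instance created before patching
patch_salesforce_api(fake_salesforce_api)  # run once, e.g. at program startup
payload = job._prepare_data([{"FirstName": "Σόλων", "LastName": "Lawgiver"}])
print(type(payload).__name__)
```

Because the patch replaces the method on the class rather than on an instance, it applies to every Job object, including ones created before the patch ran. In real code the stand-in would be dropped and the imported salesforce_api module passed in instead.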