alphagov / notifications-python-client

Python client for the GOV.UK Notify API
https://docs.notifications.service.gov.uk/python.html
MIT License
20 stars 22 forks source link

MS Excel opens CSV files with incorrect encoding #167

Open currycoder opened 4 years ago

currycoder commented 4 years ago

This issue is only tangentially related to the notify service, but I wanted to flag it for discussion.

When CSV files are opened by MS Excel, Excel assumes that the file is ASCII encoded - unless a Byte Order Mark is specified in the first 3 octets of the file to denote the encoding. Further details can be found here: https://stackoverflow.com/a/155176

The result of Excel assuming the incorrect encoding is that special characters are rendered incorrectly to the user.

Excel's behaviour here is quite different to other standard spreadsheet applications. Calc on Linux and Numbers on Mac do a pretty good job of automatically detecting the encoding of the file.

I believe that this issue is a good one to discuss now that the Notify service supports CSV file uploads/downloads fully with the addition of is_csv to prepare_upload.

A very simple idea that I have is to call this behaviour out in the client docs (there's a new section on CSV uploads) - in order to get ahead of this issue for developers that are looking to send CSVs from their apps.

Example Code: CSV which is badly rendered in Excel:

import io
from notifications_python_client import prepare_upload
from notifications_python_client.notifications import NotificationsAPIClient

notifications_client = NotificationsAPIClient("XXX")

csv_contents = 'Büyükdere Cad,foo,bar'
buf = io.BytesIO(csv_contents.encode('utf-8'))
file_content = prepare_upload(buf, is_csv=True)

notifications_client.send_email_notification(
    email_address='foobar@example.net',
    template_id='XXX',
    personalisation={
        'link_to_file': file_content,
    },
)

Example Code: CSV which is rendered correctly in Excel:

import codecs
import io
from notifications_python_client import prepare_upload
from notifications_python_client.notifications import NotificationsAPIClient

notifications_client = NotificationsAPIClient("XXX")

csv_contents = 'Büyükdere Cad,foo,bar'
buf = io.BytesIO(codecs.BOM_UTF8 + csv_contents.encode('utf-8'))
file_content = prepare_upload(buf, is_csv=True)

notifications_client.send_email_notification(
    email_address='foobar@example.net',
    template_id='XXX',
    personalisation={
        'link_to_file': file_content,
    },
)
NickFitz commented 2 years ago

Python's encode() method will add the UTF-8 BOM if the encoding is specified as utf-8-sig rather than utf-8.