cds-snc / platform-forms-client

NextJS application that serves the public-facing website for Forms
https://forms-staging.cdssandbox.xyz/
MIT License

Accents not showing up properly in Excel on Windows #3877

Closed jprince-cds closed 3 weeks ago

jprince-cds commented 3 months ago

Description

When opening a CSV file that contains French accents (in questions and/or responses), the accents don't show up properly on Windows: each accented character gets converted to 2 other symbols. The reason is that the file is UTF-8 encoded, while Excel on Windows seems to expect ANSI/ISO-8859-1 encoding.

Steps to reproduce

  1. On a Windows PC, double-click a CSV file containing responses with French accents.
  2. You will notice that accents are displayed incorrectly, for example é becomes Ã©
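
The two-symbol garbling is what you would expect when UTF-8 bytes are decoded one byte per character. A minimal sketch of that effect, assuming a single-byte Latin-1 style reader (for é, ISO-8859-1 and Windows-1252 agree); `misreadAsLatin1` is a hypothetical helper for illustration, not code from this repo:

```typescript
// Hypothetical helper: encode text as UTF-8, then decode each byte as its own
// character, the way a single-byte (ISO-8859-1 style) reader would.
function misreadAsLatin1(text: string): string {
  const utf8Bytes = new TextEncoder().encode(text);
  return Array.from(utf8Bytes, (byte) => String.fromCharCode(byte)).join("");
}

// "é" is the two bytes 0xC3 0xA9 in UTF-8, so it is read back as two characters.
console.log(misreadAsLatin1("é")); // "Ã©"
```

Plain ASCII text survives this round trip unchanged, which is why only the accented characters look wrong.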

Details

Expected behaviour

Accents should be displayed correctly.

Here is the workaround needed for the files to open up correctly in Excel on Windows.

Right-click the CSV file, Open With, choose Notepad. Select File, Save as... In the save dialog, change the encoding to ANSI. Save the file and double-click it to open in Excel. The accents then load properly.

Screenshots or videos

image.png
jprince-cds commented 3 months ago

testing-sign-up-form-2024-06-11-responses-reponses.csv.zip

Zipped version of a CSV file containing accents. (was not able to upload a CSV file directly to GitHub)

jprince-cds commented 3 months ago

We should inform the client in Ticket 17757 once this is resolved, as they are using this workaround now.

anikbrazeau commented 1 month ago

Related to #3878

timarney commented 1 month ago

How often are we seeing support requests for this type of issue come up?

Is this happening for all / most users or a few?


For now I found a couple of alternative ways to get Excel to pick up the UTF-8 encoded files properly at the file level, rather than making OS-level changes.

1) Data import

2) Rename the .csv extension to .txt

Open the .txt file in Excel

Excel will prompt with import step

timarney commented 1 month ago

RE: converting to a .txt file

Found this note under the Excel help docs

Note: When Excel opens a .csv file, it uses the current default data format settings to interpret how to import each column of data. If you want more flexibility in converting columns to different data formats, you can use the Import Text Wizard. For example, the format of a data column in the .csv file may be MDY, but Excel's default data format is YMD, or you want to convert a column of numbers that contains leading zeros to text so you can preserve the leading zeros. To force Excel to run the Import Text Wizard, you can change the file name extension from .csv to .txt before you open it, or you can import a text file by connecting to it (for more information, see the following section).

timarney commented 1 month ago

Found this Microsoft article

You can open a CSV file encoded with UTF-8 normally if it was saved with BOM (Byte Order Mark). Otherwise, you can open it through either of the following ways.

https://support.microsoft.com/en-us/office/opening-csv-utf-8-files-correctly-in-excel-8a935af5-3416-4edd-ba7e-3dfd2bc4a032

Testing with the file attached to this issue

There is no byte order mark (BOM).

https://validator.w3.org/i18n-checker/check#validate-by-upload+

Ref: https://github.com/cds-snc/platform-forms-client/pull/2659

timarney commented 1 month ago

https://github.com/cds-snc/platform-forms-client/pull/4237

Testing here https://o52ubyux2llkeflxz3qkqbbs4i0yzada.lambda-url.ca-central-1.on.aws

Created a form

Added response --- download CSV

Uploaded the file to the checker --- it shows a byte order mark (BOM).

Tested on a tablet --- the file opened with the accented characters showing.

timarney commented 1 month ago

@Abi-Nada @anikbrazeau

Are you able to generate a response file that doesn't have a byte order mark (BOM)?

I tested under a fresh PR and on staging (no updates). The files from both have the byte order mark and, per the note above, open fine on my tablet, whereas the file attached to this issue doesn't open correctly (and doesn't have the BOM).

You can check a file to see if it has a BOM by uploading a file here: https://validator.w3.org/i18n-checker/check#validate-by-upload+

i.e. it detects the BOM as UTF-8
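
The same check can be done locally by looking at the first three bytes of the file. A minimal sketch, assuming the file contents are already available as a `Uint8Array`; `hasUtf8Bom` is a hypothetical helper, not something from this codebase:

```typescript
// Hypothetical helper: true when the bytes start with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf
  );
}

// Sample CSV text for illustration only; "\uFEFF" is the BOM code point.
const sampleEncoder = new TextEncoder();
console.log(hasUtf8Bom(sampleEncoder.encode("\uFEFFquestion,réponse"))); // true
console.log(hasUtf8Bom(sampleEncoder.encode("question,réponse"))); // false
```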

jprince-cds commented 4 weeks ago

@timarney We just had this issue reported again this week, so I don't think it's fixed yet. See Freshdesk ticket 18826.

anikbrazeau commented 4 weeks ago

@timarney I wonder if this is an issue with files when they are not zipped.

Just tested these scenarios:

  1. downloading CSV zipped
  2. downloading CSV not zipped

1. CSV - zipped

Accents show up as expected, when opened in Excel on Windows:

zippppped

File does have a Byte order mark (BOM), when uploaded to Validator:

zip

CSV file that was zipped:

responses-reponses (1).csv

2. CSV - not zipped

Accents glitch as described in the issue, when opened in Excel on Windows:

not zipped

File does not seem to have a Byte order mark (BOM), when uploaded to Validator:

no bom

CSV file that was not zipped:

un-titre-avec-eeaoc-2024-08-30-responses-reponses (1).csv

timarney commented 3 weeks ago

@anikbrazeau can you try this preview environment for the "files when they are not zipped" scenario?

It should apply the BOM to non-zipped files.
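
For anyone following along, applying a BOM to a CSV string generally looks something like the sketch below. This is only an illustration of the general technique, not the code in the PR; `UTF8_BOM` and `withUtf8Bom` are hypothetical names:

```typescript
// The BOM code point U+FEFF encodes to the bytes EF BB BF in UTF-8.
const UTF8_BOM = "\uFEFF";

// Hypothetical helper: prefix CSV text with a BOM unless it already has one.
function withUtf8Bom(csv: string): string {
  return csv.startsWith(UTF8_BOM) ? csv : UTF8_BOM + csv;
}

// Sample CSV text for illustration only.
const csvBytes = new TextEncoder().encode(withUtf8Bom("question,réponse\n"));
console.log(Array.from(csvBytes.slice(0, 3))); // [239, 187, 191]
```

Checking for an existing BOM first keeps the helper safe to call twice on the same content.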

https://o52ubyux2llkeflxz3qkqbbs4i0yzada.lambda-url.ca-central-1.on.aws

anikbrazeau commented 3 weeks ago

Works! 😃

timarney commented 3 weeks ago

The fix in https://github.com/cds-snc/platform-forms-client/pull/4237 has been merged and was released in 3.21.0