Snowflake-Labs / Excelerator

This is an Excel Addin for Windows that reads and writes data to Snowflake
Apache License 2.0
78 stars 22 forks source link

Character encoding mismatch when uploading data #10

Closed mdavidn closed 3 years ago

mdavidn commented 3 years ago

When uploading data that contains non-ASCII characters, such as ä (C3 A4 in UTF-8, E4 in Windows-1252), Snowflake returns an error complaining of invalid UTF-8.

I have no experience with Excel VBA. My guess would be that either Workbook.SaveAs needs the FileFormat:=xlCSVUTF8 option or Snowflake's FILE_FORMAT needs the ENCODING = 'WINDOWS1252' option. I recommend something like the former if possible, in order to support characters not found in Windows-1252. I also recommend testing emoji and supplementary ideographs, as some Unicode implementations stumble over characters outside the basic multilingual plane.

https://github.com/Snowflake-Labs/Excelerator/blob/79d452f720ba0fe2ed74da87da5f2bf7c1a069d0/src/Load.bas#L398

ssegal100 commented 3 years ago

Thanks @mdavidn. Great input. I'll take a look.

ssegal100 commented 3 years ago

@mdavidn I added support for UTF8 characters as you mentioned. I used your first suggestion. Please let me know if it solves the issue you are experiencing. I'm at a loss for how to test emoji and supplementary ideographs. I'm not sure how to enter them in Excel. Any help you can provide would be great. If you can test it yourself and let me know the outcome, even better. Thanks!

KeithDevX commented 3 years ago

I'm attempting the upload (FYI @mdavidn and I work together) and now get a different error 'Error Saving File. Method 'SaveAs' of object '_Workbook' failed.

image

ssegal100 commented 3 years ago

@KeithDevX What version of the addin are you using? You can get if from the config button in the Ribbon. When it opens it will be the first line. Does it happen consistently with different files or just one? Would you be able to attach file that it fails on? Take out any confidential data and make it small please. Thanks.

KeithDevX commented 3 years ago

I created a sample table and was able to replicate:

create or replace table example_excel (
  c1 varchar, 
  c2 float, 
  description varchar);

insert into example_excel(c1, c2, description) 
  values ('ä', 2, 'example of a problematic record');

insert into example_excel(c1, c2, description) 
  values ('b', 2, 'example of an acceptable record');

I am able to successfully query this table using the excel tool, but am unable to upload, receiving the SaveAs Error.

Excel Example: Snowflake_Example.xlsx

Upload Parameters: Param

Version 1.1.4: Version

Error Screenshot 1: Issue_1

Error Screenshot 2: Issue_2

Thanks for all of your help on this!

ssegal100 commented 3 years ago

@KeithDevX Thanks for the great info, the only issue is that it works for me, even when I use your workbook. Does it work fine when there is no special character? Has @mdavidn tried it yet, and does it work? I think the best way to resolve it is to do a web conf and I can walk through the execution and see what's going on. Would you be able to do that? I'm available for another hour or tomorrow works as well. Just suggest a couple of times. Thanks. Steve

KeithDevX commented 3 years ago

Just FYI my finance partner had the same issue with the new build.

@ssegal100 I'll message you directly. Thanks for the stellar response on this :)

ssegal100 commented 3 years ago

Hi @KeithDevX. So it looks like its an Excel version issue. Versions prior to 2016 does not support sFileFormat = xlCSVUTF8. Would you check your Excel version to make sure that's the issue? I've updated the addin to manually export the data as UTF-8, but unfortunately that has caused other complex issues. So the other option is something that @KeithDevX mentioned and that's changing the stage file_format to windows1252 or whatever is needed. So I can have the user select the encoding value needed and it can be 1 setting for the entire workbook or I can make per worksheet. Let me know you thoughts on that design. Thanks. Steve

mdavidn commented 3 years ago

Does FileFormat:=xlCSV always return Windows-1252? How will the user know which encoding to use?

In older versions of Excel, a user exporting UTF-8 CSV manually would adjust the encoding under "Web Options" in the "Save As..." dialog. Can VBA do something similar with WebOptions?

ssegal100 commented 3 years ago

@KeithDevX I've uploaded the latest so you can make sure it works like it used to. Please let me know. Thanks.

KeithDevX commented 3 years ago

@ssegal100 Works great! Thanks for iterating with us on this.

ssegal100 commented 3 years ago

@KeithDevX Great, thanks for letting me know.

ssegal100 commented 3 years ago

@KeithDevX @mdavidn I'm going to close this issue, but if you find that you need anything else please let me know. Steve