Snowflake-Labs / schemachange

A Database Change Management tool for Snowflake
Apache License 2.0

Foreign character processing #200

Closed 2023ASharathSJ closed 8 months ago

2023ASharathSJ commented 9 months ago

Hi Team,

We have created an Azure CI/CD pipeline for our project's Snowflake deployments using schemachange. The pipeline works flawlessly for all DDL and DML execution scenarios.

We have encountered one issue when a foreign (French or German) character appears in the insert scripts. The pipeline fails to read the file and the following error is received:

2023-09-15T12:44:44.0639006Z syntax error line 1 at position 0 unexpected '/'.

This error occurs even when the foreign character is inside a comment or on the 1000th line; we receive the same error either way. I am unable to attach a screenshot for reference because the upload fails with "Something went really wrong, and we can't process that file. Try again."

Sample Insert script: INSERT INTO DB.SCHEMA.TABLE VALUES ('TEST ü foreign character', 'TEST ä foreign character ');

Eagerly waiting for a response.

Thank you in advance, Sharath SJ

sfc-gh-twhite commented 9 months ago

We might need to look into the encoding used in the following function.

https://github.com/Snowflake-Labs/schemachange/blob/d638a5c509d549385f801b0f5c61958f567f96a4/schemachange/cli.py#L447-L451

Can you adjust the encoding in that function and see if it helps with the issue? iso-8859-1 might do it.
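The kind of change suggested above can be sketched as follows. This is a minimal illustration of reading a script with an explicit encoding; the function and parameter names are hypothetical, not schemachange's actual internals.

```python
# Hypothetical sketch: read a SQL script with a configurable encoding.
# "utf-8" is a sensible default; "utf-8-sig" additionally strips a leading
# byte-order mark; "iso-8859-1" accepts any byte sequence but can mis-map
# multi-byte characters.
def load_script(script_path, encoding="utf-8"):
    with open(script_path, "r", encoding=encoding) as f:
        return f.read()
```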

We will need to think through a more dynamic way to handle this.

2023ASharathSJ commented 8 months ago

Hi. Thank you for the suggestion.

As per our current design with customisation, this setting gets overridden, so we are unable to try the option. The main reason is that schemachange is reinstalled on every run to ensure the latest version is used. We are checking how to overcome this.

Can you suggest any other approach that would let us apply this change without manual configuration after schemachange is downloaded and installed?

Thank you very much!!! Sharath

2023ASharathSJ commented 8 months ago

Hi Team,

We have identified that this issue is not due to encoding. It occurs only when the script contains variable replacements. If there are no variable replacements, the foreign characters are processed correctly and inserted into the table.

We have defined variables for the database so that replacements can be handled for different environments such as DEV, TEST, PREPROD and PROD.

Please guide us on how this can be resolved, as variable replacements are needed to ensure the same script is used for all environments.
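For context, variable substitution by itself is encoding-agnostic and leaves non-ASCII characters intact. Schemachange performs Jinja-style templating for --vars; the sketch below uses the stdlib string.Template purely to illustrate the principle, not schemachange's actual rendering code.

```python
# Illustrative only: substitution replaces placeholders and passes
# non-ASCII characters (ü, ä) through unchanged.
from string import Template

sql = "INSERT INTO ${database}.SCHEMA.TABLE VALUES ('TEST ü', 'TEST ä');"
rendered = Template(sql).substitute(database="DEV_DB")
print(rendered)
# INSERT INTO DEV_DB.SCHEMA.TABLE VALUES ('TEST ü', 'TEST ä');
```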

Warm Regards, Sharath

sfc-gh-tmathew commented 8 months ago

@2023ASharathSJ Can you post the SQL file excerpt where you reference the variable, and the schemachange call showing how you replace it?

Alternatively, you could try using

schemachange render --vars '{ "var1":"value"}' ./path/to/sql/file/*.sql and inspect what is being rendered to troubleshoot your issue.

2023ASharathSJ commented 8 months ago

Thank you @sfc-gh-tmathew for the response. We have identified the root cause: it was due to enabling the following property in the replace-token task: Write unicode BOM.

After this property was disabled, the pipeline execution was successful and the foreign characters were successfully inserted into the tables.

Thank you all for the support. Sharath S J