Open juliodias20 opened 1 month ago
Thanks for reaching out @juliodias20 !
Special characters worked for me when I tried with dbt-duckdb, so this might be specific to the dbt-athena-community adapter rather than dbt-core. So I'm going to transfer this issue to that repository instead.
See below for my output when using dbt-duckdb.
Create this file:
seeds/my_seed.csv
id,some_text
1,ABC
2,Ã Á Í Ç
Run these commands:
dbt seed
dbt show --inline 'select * from {{ ref("my_seed") }}'
See this output:
| id | some_text |
| -- | --------- |
| 1 | ABC |
| 2 | Ã Á Í Ç |
Thanks for the collaboration @dbeatty10 !
Now that you said this, I tested the same case with the Databricks adapter and it works correctly! It really sounds like a problem with the Athena adapter.
Hello,
It works on my side with dbt-athena on Windows. Could you please try adding this parameter to .vscode/settings.json and then opening a new terminal in VS Code?
{
    "terminal.integrated.env.windows": {
        "PYTHONUTF8": "1"
    }
}
Hello @e-quili, thank you so much for the collaboration! I tested this solution and it works!!
I will use this in my local development environment, but I still think there is a bug, since other adapters can detect the encoding of the .csv file. What do you think?
I have a similar problem: I cannot even upload the csv file to s3 that contains these words:
Sedlišćo pódzajtšo, Dolna Łužyca, etc.
This will not work with the dbt-athena adapter even though the letters are UTF-8. I checked all possibilities of misconfigured encodings with:
os.environ["PYTHONIOENCODING"]
sys.getfilesystemencoding()
sys.getdefaultencoding()
And they are all set to utf8.
The issue is within agate's csv_py3.py: the function writerow() calls self.writer.writerow(row), and if I wrap that call in a try / except I get all the rows that cannot be processed.
So here I am actually stuck - these are the rows that cannot be processed, because somehow it always encodes with cp1252, which cannot represent all of these characters.
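The cp1252 observation can be checked without dbt or agate. The sketch below is only an illustration: the helper name `unencodable` is hypothetical, and the sample strings are taken from this thread. It lists the characters in each string that the cp1252 codec cannot encode.

```python
# Sample strings from this thread: the accented Portuguese letters from the
# original report and two of the Sorbian place names.
SAMPLES = ["Ã Á Í Ç", "Sedlišćo pódzajtšo", "Dolna Łužyca"]


def unencodable(text, encoding="cp1252"):
    """Return the characters of `text` that `encoding` cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


for sample in SAMPLES:
    print(sample, "->", unencodable(sample))
```

The Portuguese letters all exist in cp1252, while the Sorbian names contain letters such as ć and Ł that do not. That would be consistent with some seeds loading with garbled text and others failing while being written.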
Here is some test data:
"Sedlitz Ost (Sedlišćo pódzajtšo)",
"Senftenberg (Zły Komorow)",
"Cottbus Hbf Calau (NL) Chóśebuz gł.dw",
"Gollmitz (NL) Chańc (Dolna Łužyca)",
"Calau (NL) Kalawa (Dolna Łužyca",
"Kolkwitz Süd Gołkojce pódpołdnjo",
"Cottbus-Sandow Chóśebuz-Žandow",
"Cottbus-Merzdorf Chóśebuz-Žylowk",
"Cottbus-Willmersdorf Nord Chóśebuz-Rogoznow pódpołnoc",
One additional note:
If I just read my csv and write it again (as dbt does) it just works:
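import csv
import agate

my_seed = r"path/my_csv.csv"
# Plain read with an explicit encoding succeeds
with open(my_seed, encoding="utf-8") as f:
    data = list(csv.reader(f))
# Read and rewrite through agate
table = agate.Table.from_csv(my_seed)
table.to_csv("asd.csv")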
Maybe someone can lead me in the right direction where local csv file is actually read in dbt - I cannot find the creation of the agate table.
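For comparison, here is a minimal standard-library sketch of a CSV round trip that does not depend on the locale, because every `open()` call passes `encoding="utf-8"` explicitly. It does not show how dbt or agate read seeds. The function name `round_trip` and the two-column layout are assumptions made for the example; the station names come from the test data above.

```python
import csv
import os
import tempfile

# Two rows from the test data in this thread, plus a made-up header.
ROWS = [
    ["id", "station"],
    ["1", "Sedlitz Ost (Sedlišćo pódzajtšo)"],
    ["2", "Senftenberg (Zły Komorow)"],
]


def round_trip(rows):
    """Write rows to a temporary CSV and read them back, always as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline="") as f:
            csv.writer(f).writerows(rows)
        with open(path, encoding="utf-8", newline="") as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


print(round_trip(ROWS) == ROWS)
```

Without the explicit `encoding` argument, `open()` falls back to the locale's preferred encoding, which on many Windows setups is cp1252.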
UPDATE:
Okay, it is actually PowerShell causing the issue. If I use Git Bash it just works fine. Windows...
Okay, for everyone having issues with seeds and dbt-athena on Windows, set this: $Env:PYTHONUTF8=1
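To see whether the PYTHONUTF8 setting has taken effect in a given terminal, a small diagnostic like the one below can help. It is a sketch and the function name `encoding_report` is hypothetical. Besides the two `sys` values checked earlier in the thread, it reports `sys.flags.utf8_mode` and `locale.getpreferredencoding(False)`. The latter is the encoding `open()` uses when no `encoding` argument is passed, so it is the value most likely to differ between shells.

```python
import locale
import sys


def encoding_report():
    """Collect the settings that influence Python's default text encoding."""
    return {
        # True when Python runs in UTF-8 mode (PYTHONUTF8=1 or -X utf8)
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Default for open() when no encoding argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "default_encoding": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running it once in PowerShell and once in Git Bash, with and without PYTHONUTF8 set, should show which value changes.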
Is this a new bug in dbt-core?
Current Behavior
I would like to start by apologizing if there is already a bug report on this subject, but I couldn't find one.
I have a .csv file (sds_teste.csv) that I am using as a seed.
So, I execute the command dbt seed -s sds_teste and dbt runs successfully.
But when I execute a select to see the table created by the dbt seed command, I can see that the table cannot read the special characters (accented letters).
I already tried some things that I found around the internet, like passing encoding: utf-8, but I did not find anything that works.
My profiles.yml
Expected Behavior
The expected behavior is that dbt seed can read a .csv file encoded in UTF-8.
Should be:
A text with special characters, like Ã, Á, Í, or Ç
Instead of:
A text with special characters, like �, �, �, or �
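One plausible mechanism for this symptom, offered as an assumption rather than something confirmed in this thread: the text is written out with a Windows code page such as cp1252 and later read back as UTF-8. The sketch below reproduces that mismatch in plain Python. It does not call dbt, agate, or Athena.

```python
text = "Ã Á Í Ç"

# Assumed writer side: bytes produced with the Windows code page.
written = text.encode("cp1252")

# Assumed reader side: the same bytes interpreted as UTF-8, with undecodable
# bytes replaced by U+FFFD (the replacement character).
read_back = written.decode("utf-8", errors="replace")

print(read_back)
```

Each accented letter becomes a single cp1252 byte that is not valid UTF-8 on its own, so each one is shown as a replacement character, which matches the output quoted above.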
Steps To Reproduce
1 - Install Python 3.11.9 on a Windows computer
2 - Create a Python environment with python venv
3 - Install dbt-core==1.8.7 and dbt-athena-community==1.8.4 with pip install
4 - Create a dbt project
5 - Create a .csv file in the folder seeds/ and write some examples with special characters
6 - Configure profiles.yml to connect to AWS Athena (storage: AWS S3)
7 - Run the dbt seed command
Relevant log output
Environment
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
No response