Could you post the output of hexdump location.csv? Just want to rule that out up front.
Alternatively, I see here that the Update driver isn't setting the character set the way the Export does, so this might be an issue with different platforms defaulting to different character sets (since unicode works on my end). There is also an option for "Unicode Passthrough" in the TPTAPI programming guide, but I haven't found information about the default behavior yet.
Ran this on my desktop (macOS) using Terminal.
> head -n 5 location.csv
location
Düsseldorf
Marché Saint-Germain
İstanbul
> hexdump location.csv
0000000 6c 6f 63 61 74 69 6f 6e 0a 44 c3 bc 73 73 65 6c
0000010 64 6f 72 66 0a 4d 61 72 63 68 c3 a9 20 53 61 69
0000020 6e 74 2d 47 65 72 6d 61 69 6e 0a c4 b0 73 74 61
0000030 6e 62 75 6c 0a
0000035
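(For what it's worth, the multi-byte sequences in that dump are valid UTF-8, so the file itself looks fine. A quick Python check, just spelling out the byte pairs shown above:)

# The multi-byte sequences from the hexdump above are well-formed UTF-8:
#   c3 bc -> "ü", c3 a9 -> "é", c4 b0 -> "İ"
for hexpair, expected in [("c3bc", "ü"), ("c3a9", "é"), ("c4b0", "İ")]:
    decoded = bytes.fromhex(hexpair).decode("utf-8")
    assert decoded == expected
    print(hexpair, "->", decoded)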
Yeah, that's interesting compared to the Export (which has no problem) here, where the character set is explicitly specified.
Check out the code in the ext-cleanup branch. In particular this change, which explicitly sets the character set for the Update driver — which, like you said, wasn't being done before. I have tested it locally: without that line I can reproduce the issue you are experiencing, and with it the data loads correctly. I just want to confirm that this is the case for you as well.
Unfortunately, this didn't do the trick. I didn't build the full ext-cleanup branch; I just added that one line after the Mload_Init and recompiled. I'll try the whole branch and see if that works.
Hmmm, I was really hoping that would do it, because it worked for me locally. When I commented that line out, it produced the exact same messed-up output you were getting (which is the UTF-8 bytes being incorrectly interpreted as ASCII, so that makes sense), and when I recompiled with that line in place it loaded and queried the correct information. Depending on how you are compiling, it's possible it isn't loading the right version; it might be worth doing a make clean and making sure you are using the newly built shared objects.
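(To illustrate what that looks like — a rough Python sketch, not the actual giraffez code path, with Latin-1 standing in for whatever single-byte session character set ends up being used:)

# Mojibake sketch: UTF-8 bytes interpreted under a single-byte character set
# (Latin-1 here as a stand-in) produce the garbled values described above.
original = "Düsseldorf"
utf8_bytes = original.encode("utf-8")     # b'D\xc3\xbcsseldorf'
garbled = utf8_bytes.decode("latin-1")    # 'DÃ¼sseldorf'
print(garbled)
# With the session character set explicitly set to UTF8, both sides agree:
assert utf8_bytes.decode("utf-8") == original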
This is now working. I clean installed it:
pip install git+https://github.com/capitalone/giraffez.git@ext-cleanup
Would you suggest using the ext-cleanup version in general? Any plans to merge back to master anytime soon? Thanks.
Hi, is there a plan to merge ext-cleanup? I am running into the same issue, and the ext-cleanup branch fixed it for me on Ubuntu.
I believe this is the same or a similar issue as #47, as I'm having the same problem.
giraffez 2.0.24
macOS 10.14.2 / Ubuntu 16
Python 3.5 (on macOS) / Python 3.6 (on Ubuntu box)
Teradata ODBC / TTU 16.10
Teradata Server 15.10
SQL client: DBeaver / Aqua Data Studio (tried both)
JDBC for SQL client: TeraJDBC_16.20.00.02
I get the same result if I use the dataframe / to_list and iterate over the rows, as the OP had done (from #47). The character set of my connection to TD when I do the select * is UTF-8, and I'm using a GUI client. I am able to see these specific unicode and non-Latin characters in other tables on the same database where the column's character set is UNICODE in the create table statement. So I believe this is isolated and not related to either my SQL client or the TD server itself.
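(For reference, a minimal way to re-check the values from Python — assuming the giraffez.Cmd usage shown in the project README, connection details coming from your giraffez config, and a hypothetical mydb.locations table; my actual test went through the dataframe/to_list path mentioned above:)

import giraffez

# Read the column back and print each row (table name is hypothetical).
with giraffez.Cmd() as cmd:
    for row in cmd.execute("select location from mydb.locations"):
        print(row)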
Originally posted by @ausiddiqui in https://github.com/capitalone/giraffez/issues/47#issuecomment-451302424