TiesdeKok / ipystata

Enables the use of Stata together with Python via Jupyter (IPython) notebooks.

UnicodeEncodeError #41

Closed Lanrzip closed 4 years ago

Lanrzip commented 4 years ago

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

Could you please tell me what that means? Doesn't ipystata support the Chinese language?

TiesdeKok commented 4 years ago

@ddjhpxs apologies for not getting back to you immediately. I saw your issue post and subsequently forgot about it. :grimacing:

I suspect this is due to a long-standing issue with Pandas and its Stata writer, see: https://github.com/TiesdeKok/ipystata/issues/28

However, it has been a while, so I am curious to see whether it got solved in the meantime. Let me look into it!
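
For context, the error message itself can be reproduced in plain Python without Stata or Pandas. The snippet below is only a minimal sketch of the underlying limitation, not the code path ipystata actually takes; the sample string is a hypothetical stand-in for a value from the reporter's data.

```python
# Latin-1 only covers code points 0-255, so Chinese characters cannot be encoded with it.
text = "中文"  # hypothetical two-character sample value, not taken from the issue

try:
    text.encode("latin-1")
    message = ""
except UnicodeEncodeError as err:
    # Same wording as the error reported in this issue.
    message = str(err)
    print(message)

# UTF-8 can represent the same string without error.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))  # each of these two characters takes 3 bytes in UTF-8
```

So the problem is the Latin-1 encoding step rather than the Chinese text itself: any writer that forces Latin-1 will fail on such values.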

TiesdeKok commented 4 years ago

Looks like we are in luck: about a month ago, with Pandas 1.0, they added support for writing Unicode in the Stata writer (https://github.com/pandas-dev/pandas/blob/4a74463d0244acea98f4fd49182dcf5ea6709f19/doc/source/whatsnew/v1.0.0.rst)

It would be very helpful if you could provide me with a sample file that raises the encoding error. That way I can test whether it gets resolved when Stata 15+ and Pandas 1.0 are used, or whether modifications to ipystata are required.

Lanrzip commented 4 years ago

instance.zip: here is the sample code and CSV file. I ran it in Jupyter Lab. I hope that will be helpful.

TiesdeKok commented 4 years ago

Thanks, I was able to replicate the problem and solve it using Pandas 1.0!

It required a couple of minor modifications to make everything compatible with UTF-8 encoding. I've uploaded the new iPyStata version (0.4.0) to GitHub.

Could you try to see whether this also resolves the problem on your end? You can follow these steps to do so:

  1. Make sure you have Pandas 1.0 or higher installed. You can verify this by running pd.__version__ in Jupyter Lab.
  2. Update iPyStata to 0.4.0 by running the following in the command prompt:
    pip uninstall ipystata
    pip install git+https://github.com/TiesdeKok/ipystata
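
If you prefer a programmatic check for step 1, something like the following could work. It is a rough sketch: `meets_minimum` is a hypothetical helper written for this comment, not part of Pandas or iPyStata, and it only compares the leading numeric parts of the version string.

```python
import re


def meets_minimum(version, minimum=(1, 0)):
    """Return True if a version string such as '1.0.3' is at least `minimum`."""
    # Drop any local build suffix ('+...'), then collect the numeric components.
    numbers = re.findall(r"\d+", version.split("+")[0])
    parts = tuple(int(n) for n in numbers[: len(minimum)])
    # Pad short versions such as '1' with zeros so the tuple comparison is fair.
    parts = parts + (0,) * (len(minimum) - len(parts))
    return parts >= minimum


print(meets_minimum("0.25.3"))  # a pre-1.0 release
print(meets_minimum("1.0.0"))   # the first release with the Unicode Stata writer
```

In a notebook you would call it as `meets_minimum(pd.__version__)` after `import pandas as pd`.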

One thing that I am uncertain about is the version of Stata that is necessary for this to work. The encoding requires DTA files version 118, which I believe requires Stata 14 or higher.
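
To see which DTA version a given file actually uses, the release number can be read from the start of the file. The sketch below rests on my understanding of the published DTA layout, which is an assumption and not something verified in this thread: formats 117 and newer open with the tag `<stata_dta><header><release>` followed by a three-digit release, while older formats store the release in the first byte. `dta_release` is a hypothetical helper, and the demo file is a fake header, not a valid dataset.

```python
import os
import tempfile

# Assumed opening tag of DTA formats 117 and newer.
XML_PREFIX = b"<stata_dta><header><release>"


def dta_release(path):
    """Return the DTA release number stored at the start of a .dta file."""
    with open(path, "rb") as fh:
        head = fh.read(len(XML_PREFIX) + 3)
    if head.startswith(XML_PREFIX):
        # Newer formats: three ASCII digits follow the opening tag.
        digits = head[len(XML_PREFIX):len(XML_PREFIX) + 3]
        return int(digits.decode("ascii"))
    if head:
        # Older formats (assumed): the first byte is the release number itself.
        return head[0]
    raise ValueError("empty file")


# Demo on a fake header only; this is not a readable Stata dataset.
with tempfile.NamedTemporaryFile(suffix=".dta", delete=False) as tmp:
    tmp.write(XML_PREFIX + b"118</release>")
    fake_path = tmp.name

release = dta_release(fake_path)
os.remove(fake_path)
print(release)
```

Running this on a file exported by the new version should report 118 if the UTF-8 writer was used.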