blackrock / ingen

InGen is a command-line tool written on top of pandas and great_expectations to perform small-scale data transformations and validations without writing code.
Apache License 2.0

Can't read UTF-16 XML files #44

Open shpiyu opened 4 months ago

shpiyu commented 4 months ago

Describe the bug I tried reading an XML file that is encoded in UTF-16, but InGen could not read it. It crashed with a UnicodeDecodeError while trying to read the file using my machine's default encoding (UTF-8).

To Reproduce Steps to reproduce the behavior:

  1. Create an XML file with UTF-16 encoding
  2. Try to read it via InGen by adding it as a data source
  3. See the error (UnicodeDecodeError, shown in a screenshot attached to the issue)
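
The underlying failure can probably be reproduced without InGen. The sketch below uses only the Python standard library and assumes the crash comes from decoding the file as UTF-8. The file name, the sample XML, and the helper names (`write_utf16_xml`, `try_read`) are made up for illustration and are not part of InGen.

```python
from pathlib import Path

# Illustrative sample document; any UTF-16 encoded XML would do.
XML_TEXT = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    "<rows><row><id>1</id><name>alpha</name></row></rows>"
)


def write_utf16_xml(directory: str) -> Path:
    """Write the sample XML to disk as UTF-16 (Python adds a byte order mark)."""
    path = Path(directory) / "sample_utf16.xml"
    path.write_text(XML_TEXT, encoding="utf-16")
    return path


def try_read(path: Path, encoding: str) -> str:
    """Decode the file with the given encoding and report whether it worked."""
    try:
        path.read_text(encoding=encoding)
        return "ok"
    except UnicodeDecodeError as exc:
        # The UTF-16 byte order mark is not a valid UTF-8 start byte.
        return f"UnicodeDecodeError: {exc.reason}"
```

Calling `try_read(path, "utf-8")` on the generated file should hit the error branch, while `try_read(path, "utf-16")` should succeed.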

Expected behavior InGen should read UTF-16 files without crashing


Additional context We should add an option in the FileSource for users to declare the file encoding.
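
As a rough idea of what such an option could look like, here is a minimal sketch. `XmlFileSource`, its `encoding` field, the `utf-8` default, and `read_rows` are all hypothetical; this is not InGen's actual FileSource and not necessarily how the fix was implemented. It uses the standard library's `xml.etree.ElementTree` rather than pandas to stay self-contained.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path


@dataclass
class XmlFileSource:
    """Hypothetical stand-in for a file source, not InGen's real class."""

    path: str
    encoding: str = "utf-8"  # assumed default; the user can override it


def read_rows(source: XmlFileSource) -> list:
    """Decode the file with the declared encoding and return one dict per row."""
    text = Path(source.path).read_text(encoding=source.encoding)
    root = ET.fromstring(text)
    # Each child of the root is treated as a row; its children become columns.
    return [{field.tag: field.text for field in row} for row in root]
```

With this shape, a user would declare `encoding="utf-16"` on the source for the file from this report and leave the default in place for everything else.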

ChillarAnand commented 3 months ago

This is fixed in https://github.com/blackrock/ingen/pull/46