elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

`to_csv` function occasionally generates CSV files with binary encoding #968

Closed: lostbean closed this issue 2 months ago

lostbean commented 2 months ago

Description: I have encountered an issue where the `to_csv` function in Explorer sometimes generates CSV files that are detected as binary instead of the expected text encoding. I observed this while working with the AdventureWorks2014 dataset.

Steps to Reproduce:

  1. Use the to_csv function in Elixir Explorer to export a dataframe to a CSV file.
  2. Check the file encoding using the file -i command in the terminal.
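To run step 2 over every exported file at once, a loop along these lines may help. It is only a sketch: `report_csv_mime` is a hypothetical helper name, and it uses the long option `file --mime`, since the short `-i` flag has a different meaning in some `file` builds (for example on macOS).

```shell
#!/bin/sh
# Hypothetical helper: print the detected MIME type and charset
# for each .csv file directly inside the given directory.
report_csv_mime() {
  for f in "$1"/*.csv; do
    # Skip the unexpanded pattern when the directory has no CSV files.
    [ -e "$f" ] || continue
    file --mime "$f"
  done
}
```

Usage, assuming the exports live in `data/AdventureWorks2014`: `report_csv_mime data/AdventureWorks2014`. Any line reporting `charset=binary` points at a file worth inspecting.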

Observed Behavior: For the ProductPhoto.csv, the output is:

> file -i data/AdventureWorks2014/ProductPhoto.csv
data/AdventureWorks2014/ProductPhoto.csv: application/octet-stream; charset=binary

This indicates that the file contains bytes that `file` does not recognize as text, so it is reported as binary.

However, for the WorkOrder.csv, the output is:

> file -i data/AdventureWorks2014/WorkOrder.csv
data/AdventureWorks2014/WorkOrder.csv: text/csv; charset=us-ascii

This indicates that the file was saved with the correct text encoding.

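One can work around this by extracting the text with `strings` (note that this keeps only runs of printable characters and discards everything else, rather than converting the encoding):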

> strings data/AdventureWorks2014/ProductPhoto.csv > data/AdventureWorks2014/ProductPhoto_converted.csv
> file -i data/AdventureWorks2014/ProductPhoto_converted.csv 
data/AdventureWorks2014/ProductPhoto_converted.csv: text/csv; charset=us-ascii

Expected Behavior: The to_csv function should consistently save all CSV files with the correct text encoding (text/csv), rather than occasionally defaulting to binary encoding.

Environment:

Additional Context: This issue might be related to how certain data types or characters are being handled during the export process. It would be helpful to investigate the differences in data between the two files that might be causing this inconsistency.
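One way to start comparing the two files is to count how many bytes in each fall outside plain ASCII text. The sketch below uses only `tr` and `wc`; `count_nontext_bytes` is a hypothetical helper name, and the definition of "text" here (printable ASCII plus tab, LF, and CR) is an assumption, so valid non-ASCII UTF-8 text would also be counted.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes in a file that are NOT
# printable ASCII (octal 40-176), tab (11), LF (12), or CR (15).
# Assumption: anything else is treated as "non-text", which also
# includes the bytes of non-ASCII UTF-8 characters.
count_nontext_bytes() {
  # Delete every "text" byte, then count whatever is left.
  LC_ALL=C tr -d '\11\12\15\40-\176' < "$1" | wc -c | tr -d ' '
}
```

Usage: `count_nontext_bytes data/AdventureWorks2014/ProductPhoto.csv`. A result of 0 means the file is plain ASCII by this definition; a nonzero result suggests the file holds data that `file` may classify as binary.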


lostbean commented 2 months ago

nvm! It was my mistake. I figured out that it wasn't the to_csv function writing to that file. Closing it.