DeDolphins / DataHorse

Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.
https://datahorse.ai
MIT License

I'm facing a `UnicodeDecodeError` when I'm trying to read my local CSV file. #1

Closed poneoneo closed 2 weeks ago

poneoneo commented 3 weeks ago

What I'm trying to do:

import datahorse

df = datahorse.read('dh.csv')
print(df)

Below is the full traceback of the error I got.

Traceback (most recent call last):
  File "D:\Documents\Freelance_Project\aba-cli-scrapper\datahorse_test.py", line 5, in <module>
    df = pd.read_csv('dh.csv',delimiter=',')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\oneal\AppData\Local\pypoetry\Cache\virtualenvs\aba-cli-scrapper-MFoFlQ40-py3.11\Lib\site-packages\pandas\io\parsers\readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\oneal\AppData\Local\pypoetry\Cache\virtualenvs\aba-cli-scrapper-MFoFlQ40-py3.11\Lib\site-packages\pandas\io\parsers\readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\oneal\AppData\Local\pypoetry\Cache\virtualenvs\aba-cli-scrapper-MFoFlQ40-py3.11\Lib\site-packages\pandas\io\parsers\readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\oneal\AppData\Local\pypoetry\Cache\virtualenvs\aba-cli-scrapper-MFoFlQ40-py3.11\Lib\site-packages\pandas\io\parsers\readers.py", line 1898, in _make_engine
    return mapping[engine](f, **self.options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\oneal\AppData\Local\pypoetry\Cache\virtualenvs\aba-cli-scrapper-MFoFlQ40-py3.11\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 574, in pandas._libs.parsers.TextReader.__cinit__
  File "parsers.pyx", line 663, in pandas._libs.parsers.TextReader._get_header
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2053, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 6071: invalid continuation byte
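
The last line of the traceback says that a byte in the file is not valid UTF-8 at that point. A byte such as 0xef followed by something that is not a continuation byte often means the file was saved in a single-byte encoding such as cp1252 or Latin-1, although nothing in this thread confirms the file's real encoding. A minimal sketch for locating the offending byte is shown here; `find_first_bad_byte` is a hypothetical helper, not part of datahorse or pandas.

```python
def find_first_bad_byte(path, encoding="utf-8", context=20):
    """Return (offset, surrounding_bytes) for the first byte that fails
    to decode with `encoding`, or None if the whole file decodes cleanly.

    Hypothetical helper for diagnosis only.
    """
    # Read raw bytes so that no decoding happens implicitly.
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the absolute offset of the first undecodable byte.
        start = max(exc.start - context, 0)
        return exc.start, data[start:exc.end + context]
    return None
```

Calling `find_first_bad_byte('dh.csv')` and printing the returned bytes shows the text around the problem, which usually makes it clear which accented character (and therefore which encoding) is involved.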

Why can't my local file be read? Was your package made only for online CSV files? Does your package require a special format to get things done? Below is a chunk of my CSV file. What is wrong with it?

id,name,alibaba_guranteed,minimum_to_order,supplier_id,alibaba_guranteed,certifications,ordered_or_sold,product_score,review_count,review_score,shipping_time_score,is_full_promotion,is_customizable,is_instant_order,trade_product,min_price,max_price,name,verification_mode,sopi_level,country_name,years_as_gold_supplier,supplier_service_score

1,mesh knitting weaving machine produce sunscreen net agricultural shade net anti net,1,1.0,1,1,,0,5.0,1.0,5.0,5.0,1,1,1,1,9997.0,18979.0,qingdao shanzhong imp and exp ltd.,unverified,0,chine,9,5.0

2,chinese small farm rotary tiller 12hp 15hp 20hp two wheel mini hand tractor walk behind tractors,1,1.0,2,1,,0,0.0,0.0,0.0,0.0,1,1,1,1,455.0,455.0,"shandong guoyoule agricultural machinery co., ltd.",unverified,0,chine,1,0.0

3,small multifunctional flexible 130l orchard remote control garden crawler agriculture robot sprayer,1,1.0,3,1,,0,0.0,0.0,0.0,0.0,1,1,1,1,2350.0,4620.0,"shandong my agricultural facilities co., ltd.",unverified,0,chine,1,0.0

Is there a chance that some empty values could raise this error?

1085,sprayers agricultural machinery the spraying machine is used for agricultural orchard management,1,1.0,536,1,,0,0.0,0.0,0.0,0.0,1,1,1,1,386.3,415.3,,,,,,
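
Empty fields on their own should not produce a `UnicodeDecodeError`: pandas reads them as missing values, while the decoding error comes from the bytes of the file. A small sketch with a cut-down version of the rows above (only a few of the columns are kept, and the text is supplied as an in-memory string, so no file encoding is involved):

```python
import io

import pandas as pd

# Reduced sample: a subset of the columns from the rows in this issue.
csv_text = (
    "id,name,min_price,max_price,country_name\n"
    "1,mesh knitting machine,9997.0,18979.0,chine\n"
    "1085,sprayers agricultural machinery,386.3,415.3,\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# The trailing empty field is parsed as a missing value (NaN), not an error.
print(df["country_name"].isna().tolist())  # [False, True]
```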

SsebowaDisan commented 3 weeks ago

Hi O'neal MBOULA,

Thank you for reporting the issue and providing the detailed traceback. I’m pleased to inform you that the latest version of the datahorse package has been updated to handle encoding issues automatically.

What’s New
The latest version of datahorse includes improved error handling that addresses encoding problems directly. This means that if your CSV file encounters encoding issues, the package will attempt to handle them automatically without requiring you to specify the encoding manually.

How to Update
To get these improvements, please update your datahorse package to the latest version by running:

pip install --upgrade datahorse
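
The reply does not say how datahorse handles encodings internally. One common approach, sketched here as an assumption and not as the package's actual implementation, is to try a short list of encodings with `pandas.read_csv` and keep the first one that decodes. The function name `read_csv_with_fallback` and the list of candidate encodings are illustrative choices.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "utf-8-sig", "cp1252", "latin-1")):
    """Try each encoding in order and return (DataFrame, encoding_used).

    Illustrative sketch only; not taken from datahorse.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    # Every candidate failed: re-raise the most recent decoding error.
    raise last_error
```

For example, `df, used = read_csv_with_fallback('dh.csv')` returns the frame together with the encoding that worked. Note that `latin-1` maps every possible byte to a character, so it never fails; placing it last means it only acts as a final resort, and the text it produces may be wrong if the file actually uses some other encoding.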

poneoneo commented 3 weeks ago

thank you for your response... let me try this.

SsebowaDisan commented 2 weeks ago

Is it working fine now?

poneoneo commented 2 weeks ago

Yes. I decided to build my CSV file with the pandas library instead. By the way, sometimes the code generated by datahorse (grok/llama3) doesn't return a DataFrame. Also, I would like to have an API to see the code generated by llama3.
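
The comment does not show how the file was rebuilt. As a rough sketch of the idea, assuming the data is already in a DataFrame, writing it with an explicit encoding produces a file that reads back as UTF-8. The file name `dh_utf8.csv` and the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample rows containing non-ASCII characters.
df = pd.DataFrame(
    {"id": [1, 2], "name": ["café supplier", "naïve trading co."]}
)

# Write the file as UTF-8 explicitly, without the index column.
df.to_csv("dh_utf8.csv", index=False, encoding="utf-8")

# Reading it back with the same encoding recovers the original text.
round_trip = pd.read_csv("dh_utf8.csv", encoding="utf-8")
```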

poneoneo commented 2 weeks ago

But anyway, that is a separate issue, so this one can be closed.