aphonsoar / Receita_Federal_do_Brasil_-_Dados_Publicos_CNPJ

Public CNPJ data published by the Receita Federal do Brasil (Brazil's Federal Revenue Service)
MIT License

Decoding error - UTF-8 codec #17

Closed: ALEXBSAVMS closed this issue 1 year ago

ALEXBSAVMS commented 2 years ago

I need some help solving this part.

Trabalhando no arquivo: K3241.K03200Y0.D11211.EMPRECSV [...]
Traceback (most recent call last):
  File "C:\Users\alex.batista\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 100, in
    empresa = pd.read_csv(filepath_or_buffer=extracted_file_path,
  File "C:\Users\alex.batista\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\alex.batista\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\alex.batista\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\parsers\readers.py", line 488, in _read
    return parser.read(nrows)
  File "C:\Users\alex.batista\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\parsers\readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "C:\Users\alex.batista\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 223, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas\_libs\parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas\_libs\parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 137214: invalid continuation byte
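A minimal sketch (not from the original thread) of how this kind of error arises, assuming the file is Latin-1 (ISO-8859-1) encoded. In Latin-1, "Ã" is the single byte 0xC3; in UTF-8, 0xC3 is the first byte of a two-byte sequence, so if the next byte is not a continuation byte (0x80-0xBF), the UTF-8 decoder fails. The sample string below is hypothetical.

```python
# Hypothetical sample text; "Ã" is common in Brazilian company names.
text = "RAZÃO SOCIAL"

# Encode as Latin-1 (ISO-8859-1): "Ã" becomes the single byte 0xC3.
raw = text.encode("iso-8859-1")
print(raw)  # b'RAZ\xc3O SOCIAL'

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # 0xC3 starts a two-byte UTF-8 sequence, but the next byte (0x4F, "O")
    # is not a continuation byte, so decoding stops here.
    error_message = str(exc)

print(error_message)
# 'utf-8' codec can't decode byte 0xc3 in position 3: invalid continuation byte

# Decoding with the encoding the bytes were actually written in works.
print(raw.decode("iso-8859-1"))  # RAZÃO SOCIAL
```

Note that Latin-1 maps every possible byte to a character, so decoding with it never raises; if a file were actually in some other encoding, the result would be garbled text rather than an error.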

braganetx commented 2 years ago

How did you manage to solve this problem? I'm getting the same error.

anacarolmoraes commented 2 years ago

@braganetx the pd.read_csv call was missing an argument: encoding="ISO-8859-1". I added it and it worked. Here is how the code looks after the change:
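empresa = pd.read_csv(filepath_or_buffer=extracted_file_path,
                      encoding="ISO-8859-1",  # the file is not valid UTF-8; decode it as Latin-1
                      sep=';',
                      nrows=100,  # reads only the first 100 rows; remove to load the whole file
                      skiprows=0,
                      header=None,
                      dtype=empresa_dtypes)
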
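A self-contained sketch of the same fix, assuming an in-memory, Latin-1 encoded, semicolon-separated sample with no header row instead of the real Receita Federal file (the rows and columns below are hypothetical, and dtype=str stands in for the script's empresa_dtypes). It shows that the default UTF-8 decoding fails while encoding="ISO-8859-1" reads the accented text correctly.

```python
import io

import pandas as pd

# Hypothetical two-row sample: semicolon-separated, no header row, Latin-1 encoded.
sample = (
    '"00000000";"RAZÃO SOCIAL EXEMPLO LTDA";"2062"\n'
    '"11111111";"AÇÃO E CONSTRUÇÃO S.A.";"2054"\n'
).encode("iso-8859-1")

# Without encoding=..., pandas decodes as UTF-8 and raises UnicodeDecodeError.
try:
    pd.read_csv(io.BytesIO(sample), sep=";", header=None, dtype=str)
    failed_with_utf8 = False
except UnicodeDecodeError:
    failed_with_utf8 = True

# With the correct encoding, the accented characters are decoded properly.
df = pd.read_csv(
    io.BytesIO(sample),
    sep=";",
    header=None,
    dtype=str,  # keep codes as strings so leading zeros are preserved
    encoding="ISO-8859-1",
)
print(failed_with_utf8)
print(df)
```

In Python, "ISO-8859-1", "latin-1" and "latin1" are aliases for the same codec, so any of them works as the encoding argument.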
RWaiti commented 2 years ago

https://github.com/aphonsoar/Receita_Federal_do_Brasil_-_Dados_Publicos_CNPJ/issues/4#issue-925170476

insinfo commented 2 years ago

@anacarolmoraes thanks, that worked for me.