LABipp / alea-gov

All government data in one place.

RAIS/CAGED - microdata #1

Open odanoburu opened 7 years ago

odanoburu commented 7 years ago

data

gris commented 7 years ago

Their data seems to have an encoding problem. Here is the error message when I try to read it in pandas: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 27: invalid continuation byte`

JoaoCarabetta commented 7 years ago

Try reading it with the 'latin1' encoding. That usually solves this problem. Also, use Python 3, which handles encodings better.
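A minimal sketch of that suggestion, using only the standard library so it runs anywhere. The sample bytes, column names, and `;` delimiter are made up for illustration and are not taken from the RAIS/CAGED files; `read_rows` is a hypothetical helper. In pandas, the equivalent would most likely be passing `encoding="latin1"` to `pd.read_csv`.

```python
import csv
import io

# Hypothetical sample: a header line and one row, encoded as Latin-1.
# The real files may use different columns and a different delimiter.
raw = "município;competência\nSão Paulo;2017\n".encode("latin1")


def read_rows(data: bytes, encoding: str = "latin1", delimiter: str = ";"):
    """Decode raw bytes with the given encoding and parse them as delimited text."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(text, delimiter=delimiter))


# Decoding the same bytes as UTF-8 raises UnicodeDecodeError,
# because accented Latin-1 bytes are not valid UTF-8 sequences.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading with Latin-1 succeeds and keeps the accented characters intact.
rows = read_rows(raw)
```

Note that Latin-1 maps every byte to some character, so it never raises. If the file is actually in another encoding (for example cp1252), the read will succeed but some characters may come out wrong, so it is worth checking a few accented values by eye.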

About the issues, it may be a good idea to split them.

About the PDF, is it readable? If it is, there are good Python libraries and tools for parsing table data from PDFs.

