Closed by sagar-m 6 years ago
cc @makmanalp
It seems dask.bag and dask.dataframe do not work with Stata files.
See https://twitter.com/makmanalp/status/969002735512244224
On Mon, Mar 5, 2018 at 9:47 AM, sagar-m <notifications@github.com> wrote:
It seems Dask.bag and Dask.dataframe do not work with stata files.
Thank you!!!
@mrocklin oops sorry I missed this, thanks for intervening! @sagar-m be warned that this is currently a hack and uses some internal variables from the guts of pandas, so expect that it might break if pandas changes their StataReader class.
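For anyone who would rather not depend on pandas internals, here is a minimal sketch that uses only the public `pd.read_stata(..., chunksize=...)` iterator. It streams the file once and writes each chunk to its own CSV part, so only one chunk is in memory at a time. The function name `stata_to_csv_parts`, the output directory, and the `part-00000.csv` naming are hypothetical choices for illustration, not something from the linked hack.

```python
import os

import pandas as pd


def stata_to_csv_parts(stata_path, out_dir, chunksize=100_000):
    """Stream a Stata file into numbered CSV parts, one chunk at a time.

    Hypothetical helper: the name and the part-file naming are assumptions.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # With chunksize set, read_stata returns an iterator of DataFrames
    # instead of loading the whole file.
    with pd.read_stata(stata_path, chunksize=chunksize) as reader:
        for i, chunk in enumerate(reader):
            part = os.path.join(out_dir, f"part-{i:05d}.csv")
            chunk.to_csv(part, index=False)
            paths.append(part)
    return paths
```

Once the parts exist, something like `dask.dataframe.read_csv("parts/part-*.csv")` should be able to pick them up lazily. The conversion itself is still a single sequential pass, so this trades the parallel read of the hack for stability across pandas versions.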
Hi, I am trying to read a 3 GB Stata file to analyze in Python. I just completed the Dask tutorials on DataCamp.
This code works:
import pandas as pd
data = pd.read_stata('/Users/sherrymukim/Documents/nfhs/IAHR71DT/IAHR71FL.DTA', chunksize=100000)  # returns a chunk iterator, not a DataFrame
But the following takes forever:
My MacBook has just 2 GB of RAM, and I will be switching to a laptop with more RAM in a month.
How do I even preview the file to know what columns are there?
Please help!!! Thank you.
I have been stuck on this for a week. :-(
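One way to preview the columns without loading all 3 GB is to open the file as an iterator and read only the first few rows. This is a minimal sketch, assuming a reasonably recent pandas where the reader returned by `pd.read_stata(..., iterator=True)` can be used in a `with` block. The helper name `preview_stata` is hypothetical.

```python
import pandas as pd


def preview_stata(path, nrows=5):
    """Return the first rows and the variable labels of a Stata file.

    Hypothetical helper: only `nrows` rows are read, not the whole file.
    """
    with pd.read_stata(path, iterator=True) as reader:
        head = reader.read(nrows)          # DataFrame with the first `nrows` rows
        labels = reader.variable_labels()  # dict: column name -> description
    return head, labels
```

Calling `head, labels = preview_stata(path)` and then looking at `head.columns` gives the column names, and `labels` gives whatever descriptions the file carries for them. Once you know which columns you need, passing `columns=[...]` to `pd.read_stata` should cut memory use further.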