Open benjello opened 9 years ago
There was some work for sas using the sas7bdat library here
As always we would love contributions here. If you know a nice way to interact with sas and stata files through Python we would love to have you add those to odo.
One difficulty is that sas7bdat is a closed format, so we rely on the to_data_frame capability in the sas7bdat package.
@mrocklin @talumbau: for my needs I use to_data_frame from the sas7bdat package and read_stata/to_stata from pandas. I would definitely use a more flexible tool that can deal with very large tables (bigger than available core memory).
The sas7bdat library referred to above does provide limited support for bigger-than-memory access through a Python iterator, which might be a bit slower than pulling out dataframes explicitly. This is already in odo, which should use this library if you have sas7bdat installed.
Note that sas7bdat is new and incomplete. Odo performance is limited by sas7bdat's coverage.
Thank you @mrocklin. I will give it a try ASAP.
@benjello The most recent release of pandas supports reading stata files using an iterator, so very large files can be sequentially imported.
This would be a nice thing to add to odo, and not incredibly difficult if anyone is interested in contributing. AFAIK, there isn't a strong motivation from our side to implement this, so it would need an interested individual to take it on. I'm happy to help guide anyone through the process of adding this.
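For anyone who wants to try the iterator-based stata reading mentioned above, here is a minimal sketch using pandas directly (not odo). The helper names iter_stata_chunks and count_rows and the default chunk size are made up for illustration; the only pandas feature relied on is the chunksize argument of read_stata.

```python
import pandas as pd


def iter_stata_chunks(path, chunksize=100_000):
    """Yield DataFrames of at most `chunksize` rows from a Stata .dta file.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    # With chunksize set, read_stata returns an iterator-like reader
    # instead of loading the whole file into a single DataFrame.
    with pd.read_stata(path, chunksize=chunksize) as reader:
        for chunk in reader:
            yield chunk


def count_rows(path, chunksize=100_000):
    """Example consumer: count rows without loading the full table."""
    return sum(len(chunk) for chunk in iter_stata_chunks(path, chunksize))
```

Any per-chunk processing (filtering, aggregating, writing to another store) can take the place of the row count.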
FYI, pandas reads stata with a patched version of pyDTA: http://pandas.pydata.org/pandas-docs/stable/io.html#stata-format
https://github.com/pydata/pandas/blob/master/pandas/io/stata.py
Is there any plan to extend the odo data migrator to the sas and stata data formats? For the time being, one has to go through pandas.DataFrame to simply convert these files to HDF5.
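As a rough illustration of the pandas route described above, the following sketch converts a stata file to HDF5 chunk by chunk, so the whole table never has to fit in memory. It is not odo code: the function name stata_to_hdf5, the key "data" and the chunk size are assumptions, and it needs the optional PyTables dependency that pandas uses for HDF5.

```python
import pandas as pd


def stata_to_hdf5(dta_path, h5_path, key="data", chunksize=100_000):
    """Append a Stata file to an HDF5 table one chunk at a time.

    Hypothetical helper; returns the number of rows written.
    """
    rows = 0
    # mode="w" starts a fresh HDF5 file; HDFStore requires PyTables.
    with pd.HDFStore(h5_path, mode="w") as store:
        with pd.read_stata(dta_path, chunksize=chunksize) as reader:
            for chunk in reader:
                # append() writes in the appendable "table" format.
                store.append(key, chunk)
                rows += len(chunk)
    return rows
```

String columns may need extra care in practice, since the width of a string column in an HDF5 table is fixed by the first chunk written unless min_itemsize is passed to append.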