Open AlexS12 opened 6 years ago
Regarding the filtering: the more functions you provide, the more you approach to reimplement SQL... SQLAlchemy won't be of much help here, since it's intended to serve as an abstraction layer to several databases, and if I understand correctly, our data will always be in SQLite. For arbitrary queries, I think users will have to learn SQL anyway.
If pandas dtype inference is not working well, some manual work will be needed... This is unavoidable and someone has to do it. A series of transformations should be written to prepare the data and present it with a clean schema. Welcome to the "data wrangling" world :cowboy_hat_face:
This is related to https://github.com/AeroPython/flight-safety-analysis/issues/2, but given that we have to transform the database from access, would it be a good approach to transform it a to an hdf5 file with the suitable dtypes? That would remove the dtype inference from filtering functions and we wouldn't use SQL anymore.
I don't have experience reading multi-table HDF5 files and not everyone loves the format, so I cannot comment here.
Thanks for the link! Moving this kind of data from a SQL database to an hdf5 does not seem a sensible option.
Currently some functions under
flight_safety/queries
are provided to access to the data base filter following a criteria and return a DataFrame with the suitable column types.Issues:
Would something like sqlalchemy (https://www.sqlalchemy.org/) or pypika (https://github.com/kayak/pypika) help here?