GSS-Cogs / databaker

Command line tool to convert spreadsheets to databases, made for the UK's Office for National Statistics.

simple ods loader #25

Closed mikeAdamss closed 3 years ago

mikeAdamss commented 3 years ago

We're currently handling ods in gssutils by converting ods files to xls files and then passing them in. This is what gives us the simple "dataframe" of cells without cell properties. See: https://github.com/GSS-Cogs/gss-utils/blob/6608fd45c03c5438d93b0311be3b9d5b20f3e99b/gssutils/transform/download.py#L66-L72

This also ties gssutils into dependencies we don't want it to have to worry about (i.e. the old version of xlrd as used by pyexcel).

This is an MVP to do that "convert to Excel" step for databaker, in databaker. Loading the cells without properties is fine (loading cell properties will be a separate, much bigger piece of work in the future). So in the event we're passed an ods file, just convert it to xls or xlsx and pass it into the relevant existing table loader.

Don't use any ods-compliant library that relies on old versions of xlrd; that'd just recreate the original dependency problems. (I would personally explore pandas, pyexcel and the more up-to-date version of xlrd we're already using in databaker rather than importing more dependencies.)
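
For illustration only, here is a rough sketch of what that convert step could look like if it went down the pandas route. It is not the code that was merged. The function names (`is_ods`, `xlsx_target`, `ods_to_xlsx`) are made up for this example, and it assumes pandas is installed together with odfpy (for reading ods via `engine="odf"`) and openpyxl (for writing xlsx), none of which need xlrd. Only cell values are carried over, which matches the "cells without properties" scope above.

```python
from pathlib import Path


def is_ods(path):
    """True when the file extension marks an OpenDocument spreadsheet."""
    return Path(path).suffix.lower() == ".ods"


def xlsx_target(path):
    """Path of the converted workbook: same location, .xlsx suffix."""
    return Path(path).with_suffix(".xlsx")


def ods_to_xlsx(path):
    """Copy cell values (no cell properties) from an ods file into a new xlsx file.

    Hypothetical helper, assuming pandas + odfpy + openpyxl are available.
    """
    # Imported lazily so these packages are only needed when an ods file turns up.
    import pandas as pd

    # sheet_name=None loads every sheet into a dict of DataFrames;
    # header=None keeps the first row as ordinary data rather than column names.
    sheets = pd.read_excel(path, engine="odf", sheet_name=None, header=None)

    target = xlsx_target(path)
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        for name, frame in sheets.items():
            # Write raw values only: no index column, no header row.
            frame.to_excel(writer, sheet_name=name, index=False, header=False)
    return target
```

A caller would check `is_ods(path)` first and, if true, hand the path returned by `ods_to_xlsx(path)` to whichever existing xlsx table loader applies. Note that pandas infers cell types on read, so values may not round-trip exactly as they were stored in the ods file.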

canwaf commented 3 years ago

Doesn't messytables require xlrd==1.2 for its Excel support anyway? As far as I can see, replacing the ods to xlsx conversion wouldn't solve the underlying dependency problem.

mikeAdamss commented 3 years ago

Just to document our chat on this: yes, but we'd build off this messytables branch: https://github.com/GSS-Cogs/messytables/commit/ed9f3ed1ab36c86f533fbb1616c974077777857e. That is now viable, as the databaker table loaders (the things that return "tabs") are no longer dependent on the quirks of that old xlrd release.

mikeAdamss commented 3 years ago

merged.