Closed akoumjian closed 1 year ago
@moeyensj I've rebased off your latest schema changes. The tests are failing, I believe for reasons you have already solved. Am I missing something in the build (a reference to a version of pyoorb?) that would fix the automated tests?
Moving to CSV as our base serialization format will have several benefits over the current h5.
Pros
Cons
Losing store.select for on-the-fly SQL-like analysis. This really only affects individual files larger than our machine's RAM; otherwise we just read the file into an in-memory dataframe. We don't have many of those since we break up the files by year-month (the largest NSC file is 2.2 GB).
Existing .h5 files that we like to reference / use. I've preserved ingesting from h5 in this PR, so we don't really lose that.
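
For the rare CSV file that does not fit in RAM, a filtered read can still be done by streaming rows instead of loading everything at once. The following is only a minimal sketch of that idea using the standard library, not code from this PR: the `select_rows` helper, the sample data, and the column names (`obs_id`, `mjd`, `mag`) are all hypothetical.

```python
import csv
import io
from typing import Callable, Iterator


def select_rows(handle, where: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the rows matching `where`, reading one row at a time.

    Hypothetical helper: memory use stays bounded by a single row,
    so the file size does not matter.
    """
    for row in csv.DictReader(handle):
        if where(row):
            yield row


# Hypothetical sample data; a real call would pass an open file handle.
SAMPLE = "obs_id,mjd,mag\na,59000.1,21.5\nb,59000.2,19.8\nc,59000.3,20.1\n"

# csv yields every field as a string, so the predicate casts before comparing.
bright = list(select_rows(io.StringIO(SAMPLE), lambda r: float(r["mag"]) < 20.5))
print([r["obs_id"] for r in bright])
```

With pandas, the analogous approach would be `pandas.read_csv` with the `chunksize` argument, filtering each chunk before concatenating the results.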