it's pretty clear that the full set of CSV files is going to be too big to load into memory.
I think we can build and distribute a SQLite database as one dissemination method, and then provide Python/Jupyter and R/RMarkdown demonstration code for querying.
and then, of course, we can also build a datasette site for exploring it, cf. https://github.com/ctb/2021-sourmash-datasette.
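
for what it's worth, here is a rough sketch of how the SQLite build step could work without ever holding a whole CSV in memory. It uses only the Python standard library; the function name `load_csv_into_sqlite`, the batch size, and the choice to create untyped columns from the CSV header are all assumptions for illustration, not anything we've decided on.

```python
import csv
import itertools
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table, batch_size=10_000):
    """Stream one CSV file into a SQLite table in fixed-size batches.

    Hypothetical helper: column names come from the CSV header row and
    are created without declared types, so values are stored as text.
    Returns the number of data rows inserted.
    """
    conn = sqlite3.connect(db_path)
    total = 0
    try:
        with open(csv_path, newline="") as fp:
            reader = csv.reader(fp)
            header = next(reader)  # first row is assumed to be the header
            cols = ", ".join(f'"{name}"' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
            while True:
                # Pull at most batch_size rows at a time from the reader,
                # so memory use is bounded by the batch, not the file.
                batch = list(itertools.islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(insert, batch)
                total += len(batch)
        conn.commit()
    finally:
        conn.close()
    return total
```

calling this once per CSV file (with the same `db_path`) would give a single database file we can ship, and the demo notebooks would then just be `sqlite3.connect(...)` plus a few `SELECT` statements. A real build would probably want declared column types and indexes on whatever columns people query most.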