dib-lab / 2022-sra-gather

Classify all the metagenomes. ALL THE METAGENOMES. (Eventually.)

Providing querying, etc., via a SQL database #6

Open ctb opened 2 years ago

ctb commented 2 years ago

It's pretty clear that the full set of CSV files is going to be too big to load into memory.
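A minimal sketch of how the CSVs could be streamed into SQLite one row at a time, so the full set never has to sit in memory. It uses only the standard-library `csv` and `sqlite3` modules. It assumes that every CSV shares the header of the first file and that the file stem (e.g. an SRA accession) is worth keeping as a `source_file` column. The function name `load_csvs`, the table name `gather`, and the `source_file` column are all hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def _rows(paths, columns):
    """Yield one tuple per CSV row, lazily, across all files (hypothetical helper)."""
    for path in paths:
        with open(path, newline="") as fp:
            for row in csv.DictReader(fp):
                # Prefix each row with the file stem so results can be traced back
                # to their source CSV; missing columns become NULL.
                yield (Path(path).stem, *(row.get(c) for c in columns))


def load_csvs(paths, db_path, table="gather"):
    """Stream many CSVs into one SQLite table (hypothetical sketch).

    Assumes every CSV has the same header as the first one. Columns are created
    without declared types, so CSV values are stored as text.
    """
    paths = list(paths)
    with open(paths[0], newline="") as fp:
        columns = next(csv.reader(fp))

    conn = sqlite3.connect(db_path)
    col_defs = ", ".join(f'"{c}"' for c in ["source_file", *columns])
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')

    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    # executemany consumes the generator incrementally; `with conn` commits at the end.
    with conn:
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', _rows(paths, columns)
        )
    return conn
```

Typical use would be something like `load_csvs(Path("results").glob("*.csv"), "gather.db")`, where the directory layout is an assumption. The resulting `.db` file is a single artifact that could be distributed directly.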

I think we can build and distribute a SQLite database as one dissemination method, and then provide Python/Jupyter and R/RMarkdown demonstration code for querying.
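The Python side of the demonstration code might look roughly like the sketch below: a parameterized query for the top matches of one metagenome, plus an index on the lookup column. The table `gather` and its columns `query_name`, `name`, and `f_unique_weighted` are assumptions for illustration only. The toy rows in `demo()` are made-up placeholders, not real results.

```python
import sqlite3


def top_matches(conn, query_name, min_fraction=0.0, limit=10):
    """Return (match name, fraction) pairs for one metagenome, best first.

    Hypothetical schema: gather(query_name, name, f_unique_weighted).
    CAST makes the comparison numeric even if values were stored as text.
    """
    sql = """
        SELECT name, CAST(f_unique_weighted AS REAL) AS f
        FROM gather
        WHERE query_name = ? AND CAST(f_unique_weighted AS REAL) >= ?
        ORDER BY f DESC
        LIMIT ?
    """
    return conn.execute(sql, (query_name, min_fraction, limit)).fetchall()


def demo():
    """Build a tiny in-memory database with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gather (query_name TEXT, name TEXT, f_unique_weighted REAL)"
    )
    conn.executemany(
        "INSERT INTO gather VALUES (?, ?, ?)",
        [
            ("SRR_X", "genome_A", 0.5),
            ("SRR_X", "genome_B", 0.3),
            ("SRR_X", "genome_C", 0.1),
            ("SRR_Y", "genome_A", 0.9),
        ],
    )
    # An index on the lookup column keeps per-metagenome queries fast on a large table.
    conn.execute("CREATE INDEX idx_gather_query ON gather (query_name)")
    return conn


conn = demo()
print(top_matches(conn, "SRR_X", min_fraction=0.2))
```

In a Jupyter notebook, the same SQL string and parameters could be passed to `pandas.read_sql_query` to get a DataFrame instead of a list of tuples.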

And then, of course, we can build a Datasette site for exploring it (see also https://github.com/ctb/2021-sourmash-datasette).