Add documentation - Githubissues

I think this is a great addition as it currently stands, but I have a high-level comment about the "formats" table.

Basically, there are two levels of "format" that are conflated:

level 1, "container" formats:
- describe interactions with persistence layers (disk, databases, etc.)
- no domain-specific info
- examples:
  - HDF5
  - zarr
  - n5
  - parquet
- comes with additional parameters:
  - dense vs sparse (where sparse can actually be one of several layouts, e.g. coo, csc, csr)
  - chunk sizes/dimensions
  - compression parameters

level 2, "domain" formats:
- describe how to lay out data semantically, given a persistence layer:
  - rows as genes or cells?
  - row-metadata all in one "array", or one array per row-metadata key?
  - etc.
- examples:
  - anndata (can be backed by HDF5 or numpy arrays)
  - loom (HDF5)
  - 10x's HDF5 format
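
To make the two levels concrete, here is a rough sketch of how they could be kept as separate records instead of being conflated. All class and field names (`ContainerSpec`, `DomainSpec`, `DatasetFormat`, `rows_are`, etc.) are hypothetical, and the default values are placeholders, not claims about any particular format:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerSpec:
    """Level 1: how bytes are persisted; carries no domain knowledge."""
    name: str                          # e.g. "hdf5", "zarr", "n5", "parquet"
    layout: str = "dense"              # "dense", or a sparse layout: "coo", "csc", "csr"
    chunk_mb: Optional[int] = None     # target chunk size in MB, if chunked
    compression: Optional[str] = None  # free-form compression parameters


@dataclass(frozen=True)
class DomainSpec:
    """Level 2: semantic layout on top of some container."""
    name: str                          # e.g. "anndata", "loom", "10x"
    rows_are: str = "cells"            # placeholder default: "cells" or "genes"
    row_metadata_per_key: bool = True  # one array per metadata key vs. one combined array


@dataclass(frozen=True)
class DatasetFormat:
    """A full description is one domain spec paired with one container spec."""
    domain: DomainSpec
    container: ContainerSpec

    def describe(self) -> str:
        parts = [f"{self.domain.name} layout in {self.container.name}"]
        if self.container.chunk_mb is not None:
            parts.append(f"{self.container.chunk_mb}MB chunks")
        if self.container.layout != "dense":
            parts.append(f"{self.container.layout} sparse")
        return ", ".join(parts)


fmt = DatasetFormat(DomainSpec("10x"), ContainerSpec("zarr", chunk_mb=16))
print(fmt.describe())
```

Keeping the records frozen makes them hashable, so they could serve as dictionary keys or database rows later if we go the provenance route.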

I'm already starting to chafe against the combinatorial blowup, which so far I've handled by "stuffing database columns into filenames", e.g.:

| Filename | Description |
| --- | --- |
| ica_bone_marrow_h5.h5 | 10X HDF5 |
| ica_bone_marrow.10x.16m.zarr | 10x HDF5 layout, converted to zarr in 16MB chunks |
| ica_bone_marrow.10x.32m.zarr | 10x HDF5 layout, converted to zarr in 32MB chunks |
| ica_bone_marrow.10x.64m.zarr | 10x HDF5 layout, converted to zarr in 64MB chunks |
| ica_bone_marrow.h5ad | Converted to AnnData's HDF5 format |
| ica_bone_marrow.ad.16m.zarr | AnnData's HDF5 format in zarr w/ 16MB chunks |
| ica_bone_marrow.ad.32m.zarr | AnnData's HDF5 format in zarr w/ 32MB chunks |
| ica_bone_marrow.ad.64m.zarr | AnnData's HDF5 format in zarr w/ 64MB chunks |

These all describe the same data, and can be converted losslessly from one to another. (Sorry, I really need to get this stuff into a public bucket and stable code pointer; will do ASAP, cf. #9)
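
For what it's worth, the zarr filenames in the table follow a pattern that looks like `<dataset>.<domain tag>.<chunk size>m.<container>`. A minimal sketch of reading that metadata back out of a filename follows; the pattern is only inferred from the zarr rows above, and `ParsedName`, `parse_name`, and `build_name` are hypothetical helpers, not existing code in this repo. Note that the two HDF5 filenames do not fit the pattern, which is part of the problem:

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    dataset: str    # e.g. "ica_bone_marrow"
    domain: str     # domain-format tag, e.g. "10x" or "ad"
    chunk_mb: int   # chunk size in MB
    container: str  # container extension, e.g. "zarr"


# Assumed pattern: <dataset>.<domain>.<N>m.<container>, with no dots inside fields.
_PATTERN = re.compile(
    r"^(?P<dataset>[^.]+)\.(?P<domain>[^.]+)\.(?P<chunk>\d+)m\.(?P<container>[^.]+)$"
)


def parse_name(filename: str) -> Optional[ParsedName]:
    """Return the metadata encoded in a filename, or None if it does not fit."""
    m = _PATTERN.match(filename)
    if m is None:
        return None
    return ParsedName(m["dataset"], m["domain"], int(m["chunk"]), m["container"])


def build_name(p: ParsedName) -> str:
    """Inverse of parse_name for names that fit the pattern."""
    return f"{p.dataset}.{p.domain}.{p.chunk_mb}m.{p.container}"


print(parse_name("ica_bone_marrow.10x.16m.zarr"))
print(parse_name("ica_bone_marrow.h5ad"))  # does not fit the pattern
```

Every new parameter (compression, sparse layout, ...) would need another dotted field and a pattern change, which is why this only seems workable as a stopgap.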

So we'll have to decide how we want to deal with this combo-explosion.

My guess is that we can hackily stuff this info into filenames like the above for now, because our goal is mostly to settle on one or a few vetted format+param combinations that will be used more broadly, and in those broader settings ppl won't have to worry about the combinatorial parameter space.

OTOH, if there's appetite to take a principled approach to provenance and put all this metadata in [a database that we route accesses to the data through] (presumably this is a focus in larger HCA-land), I'm interested in helping with that too!

Either way, in my mind that table looks roughly like {loom, anndata, 10x} x {zarr, hdf5, n5, parquet} x {a few best guesses at additional parameters for each one, like you have here}.
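
As a rough illustration of how quickly that cross-product grows, here is a sketch that enumerates it. The chunk sizes are just the three values from the zarr rows above and are assumed to apply to every container, which may not hold; `enumerate_formats` is a hypothetical helper, and not every combination is necessarily meaningful (e.g. loom is listed above as HDF5-only):

```python
from itertools import product

DOMAINS = ["loom", "anndata", "10x"]
CONTAINERS = ["zarr", "hdf5", "n5", "parquet"]
CHUNK_MB = [16, 32, 64]  # assumption: one extra parameter with three guesses


def enumerate_formats(domains=DOMAINS, containers=CONTAINERS, chunk_mb=CHUNK_MB):
    """List every (domain, container, chunk size) combination as a table row."""
    return [
        {"domain": d, "container": c, "chunk_mb": m}
        for d, c, m in product(domains, containers, chunk_mb)
    ]


rows = enumerate_formats()
print(len(rows))  # 3 domains * 4 containers * 3 chunk sizes = 36
```

Adding one more parameter with k candidate values multiplies the row count by k, so pruning to a few vetted combinations early seems worthwhile.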

HumanCellAtlas / table-testing

Add documentation #8