AllenInstitute / transcriptomic_clustering

improved clustering pipeline for transcriptomic data

39/chunk normalization #43

Closed sgratiy closed 3 years ago

sgratiy commented 3 years ago

Making a PR to this branch to facilitate the demo. Normalization now supports chunking for file-backed data.

Basic API:

import scanpy as sc
import transcriptomic_clustering as tc
# open the input file in backed (on-disk) read-only mode
adata_backed = sc.read_h5ad(input_file_name, backed='r')
adata_normalized = tc.normalize(adata_backed, copy_to=output_file_name)
# or, with a custom chunk_size:
adata_normalized_backed = tc.normalize(adata_backed, copy_to=output_file_name, chunk_size=300)

You should see output like this:

Processing in 7 chunks with chunk_size: 300
.......done!

Note: At this point, chunking works only for backed data in CSR format. I want to extend it to be more general.
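
For readers unfamiliar with row-wise chunking, here is a minimal sketch of the general idea in plain Python. It is not the code in this PR: `iter_chunks`, `normalize_rows`, and `normalize_in_chunks` are hypothetical helper names, the matrix is a plain list of lists rather than a backed CSR matrix, and the transform (scale each row to a fixed total, then `log1p`) is an assumption chosen only for illustration.

```python
import math


def iter_chunks(n_rows, chunk_size):
    """Yield (start, end) row bounds that cover range(n_rows) in order."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def normalize_rows(rows, scale=1e6):
    """Scale each row to sum to `scale`, then apply log1p.

    The transform is an assumption for illustration; all-zero rows stay zero.
    """
    out = []
    for row in rows:
        total = sum(row)
        factor = scale / total if total else 0.0
        out.append([math.log1p(value * factor) for value in row])
    return out


def normalize_in_chunks(matrix, chunk_size):
    """Normalize `matrix` one block of rows at a time.

    Only one chunk is transformed per iteration. A file-backed version
    would read each chunk from disk and write the result to the output
    file instead of collecting it in a list.
    """
    n_chunks = math.ceil(len(matrix) / chunk_size)
    print(f"Processing in {n_chunks} chunks with chunk_size: {chunk_size}")
    result = []
    for start, end in iter_chunks(len(matrix), chunk_size):
        result.extend(normalize_rows(matrix[start:end]))
        print(".", end="")
    print("done!")
    return result
```

Because each row is normalized independently, the chunked result matches the all-at-once result for any `chunk_size`. The number of chunks is `ceil(n_rows / chunk_size)`, so a `chunk_size` of 300 yields 7 chunks for any row count from 1801 to 2100.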