Teichlab / cellphonedb

MIT License

Allow .h5ad and .h5 as input, compatibility with pandas >= 1.2, python=3.8 #264

Closed: zktuong closed this 3 years ago

zktuong commented 3 years ago

Hi,

As other pull requests have done, I've added a function that reads expression matrices in .h5ad format. This dramatically reduces load time in the _load_meta_counts step. The expression matrix is read from the .X slot, and gene symbols are taken from the index of the .var slot.
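A minimal sketch of what such a reader could look like, not the code in this PR. The function names are hypothetical, and it assumes the counts table CellPhoneDB works with is genes x cells, so the cells x genes .X matrix is transposed. anndata is imported lazily so the matrix-handling helper runs without it.

```python
import numpy as np
import pandas as pd


def counts_from_matrix(X, cell_ids, gene_symbols) -> pd.DataFrame:
    """Build a genes x cells counts table from a cells x genes matrix (AnnData .X layout)."""
    # scipy.sparse matrices expose .toarray(); dense arrays are used as-is.
    dense = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
    # Transpose so genes become rows and cells become columns (assumed layout).
    return pd.DataFrame(dense.T, index=list(gene_symbols), columns=list(cell_ids))


def read_h5ad_counts(path: str) -> pd.DataFrame:
    """Hypothetical reader: load an .h5ad file and return a genes x cells counts table."""
    import anndata  # lazy import: only needed when actually reading a file

    adata = anndata.read_h5ad(path)
    # Gene symbols come from the index of .var, cell barcodes from the index of .obs.
    return counts_from_matrix(adata.X, adata.obs.index, adata.var.index)
```

Usage would be along the lines of counts = read_h5ad_counts("counts.h5ad"), where the path is illustrative.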

I also added a wrapper to read .h5 files created by pandas, for users who simply saved an expression matrix with pandas' .to_hdf. It expects the .h5 file to contain only a single pandas object. Loading is not quite as fast as from .h5ad, but still faster than from .txt/.csv.
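One way such a wrapper could enforce the single-object requirement, sketched with hypothetical function names. pd.HDFStore needs the optional PyTables dependency ("tables"), so the key check is factored out into a pure function.

```python
import pandas as pd


def single_key(keys) -> str:
    """Return the only key in an HDF5 store; raise if there is not exactly one."""
    keys = list(keys)
    if len(keys) != 1:
        raise ValueError(
            f"expected exactly one pandas object in the .h5 file, found {len(keys)}: {keys}"
        )
    return keys[0]


def read_h5_counts(path: str) -> pd.DataFrame:
    """Hypothetical wrapper for a file written with DataFrame.to_hdf."""
    # Open read-only; requires PyTables to be installed.
    with pd.HDFStore(path, mode="r") as store:
        return store[single_key(store.keys())]
```

A matching input file could be written with counts.to_hdf("counts.h5", key="counts"). pd.read_hdf("counts.h5") also works without a key when the file holds exactly one object; the explicit check just gives a clearer error otherwise.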

I also incorporated support for reading folders containing .mtx files from #162.
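The folder layout expected by #162 is not described here. The sketch below assumes 10x Genomics-style file names (matrix.mtx stored genes x cells, genes.tsv with the symbol in the second column if present, barcodes.tsv with one cell per line). It uses scipy.io.mmread, and the function name is hypothetical.

```python
import os

import numpy as np
import pandas as pd
from scipy.io import mmread


def read_mtx_folder(folder: str) -> pd.DataFrame:
    """Hypothetical reader for a folder with matrix.mtx, genes.tsv and barcodes.tsv."""
    matrix = mmread(os.path.join(folder, "matrix.mtx"))
    # Coordinate-format files load as sparse; densify so pandas can wrap them.
    dense = matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)
    genes = pd.read_csv(os.path.join(folder, "genes.tsv"), sep="\t", header=None)
    barcodes = pd.read_csv(os.path.join(folder, "barcodes.tsv"), sep="\t", header=None)
    # Prefer the symbol column (second) when genes.tsv also carries gene IDs.
    symbols = genes.iloc[:, 1] if genes.shape[1] > 1 else genes.iloc[:, 0]
    return pd.DataFrame(dense, index=symbols.tolist(), columns=barcodes.iloc[:, 0].tolist())
```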

I also made corrections to catch or prevent errors and warnings, mostly arising from differences between pandas versions: warnings about editing slices/views of DataFrames, dtype errors, indexing errors and concat errors. NumPy is now imported directly rather than through the deprecated pd.np module. It should now work with pandas 1.2.
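The PR does not list the exact code paths it changed. The hypothetical helper below only illustrates the general pattern behind these fixes. It takes an explicit .copy() before assigning to a filtered frame, which avoids SettingWithCopyWarning. It also casts dtypes explicitly and imports NumPy directly instead of using pd.np.

```python
import numpy as np  # import NumPy directly; pd.np has been deprecated since pandas 1.0
import pandas as pd


def select_genes(counts: pd.DataFrame, genes) -> pd.DataFrame:
    """Hypothetical helper: keep the requested genes and add a per-gene total."""
    # .copy() makes `subset` an independent frame, so the assignment below is not
    # an edit on a possible view of `counts` (the source of SettingWithCopyWarning).
    subset = counts.loc[counts.index.isin(genes)].copy()
    # Cast explicitly so the resulting dtype does not depend on the pandas version.
    subset = subset.astype(np.float64)
    subset["total"] = subset.to_numpy().sum(axis=1)
    return subset
```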