Open soerenab opened 8 months ago
Hello,
The datasets (which come from the public source https://www.nature.com/articles/s41586-021-03705-x) were processed, but the counts are not in the log domain. Specifically, the counts in the spatial data were normalized by cell size and some batch correction was performed.
We recommend going through all the standard steps of single-cell analysis (library size normalization, doublet detection, etc.) for each dataset, and then passing the processed (but un-logged) data on to ENVI.
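For illustration, here is a minimal sketch of library size normalization without a log transform, written in plain NumPy. It is not ENVI's own preprocessing; the function name `normalize_library_size`, the `target_sum` parameter, and the choice of the median library size as the default target are assumptions made for this example. scanpy's `sc.pp.normalize_total` performs a similar per-cell scaling on an AnnData object.

```python
import numpy as np

def normalize_library_size(counts, target_sum=None):
    """Scale each cell (row) so its total count equals target_sum.

    No log transform is applied, so the output stays in the count domain.
    Hypothetical helper for illustration; not part of ENVI.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        # Assumption: use the median library size of non-empty cells as the target.
        target_sum = np.median(totals[totals > 0])
    # Avoid division by zero for cells with no counts at all.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return counts / safe_totals * target_sum

# Toy matrix: 3 cells x 3 genes (made-up numbers).
raw = np.array([[1, 3, 0],
                [10, 30, 0],
                [2, 2, 4]])
normalized = normalize_library_size(raw)
```

After this step every non-empty cell has the same total, while the values remain un-logged.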
Thanks for the reply. Just to double check: in your comment above you recommend doing library size normalization for each dataset. Yet the dissociated dataset in the tutorial seems to contain raw counts, i.e. the data has not been normalized. So should I normalize only the spatial dataset and not the dissociated one, or does it not matter whether the dissociated dataset has been normalized?
Both the spatial and the single-cell data should be processed (filtered for low-quality data, etc.), but raw counts should be used as input to the VAE. @DoronHav please correct me if I'm wrong, but I assume this is the case.
Hi,
I have a question regarding the input to ENVI. I read in one of your comments in another issue that "Also, make sure the data is not logged (in the .X), since ENVI expected unlogged counts."
However, following your tutorial and inspecting the data, I noticed that
st_data.X.max() = 247.12617
sc_data.X.max() = 4360.0
i.e., the sc data seems "raw" while the spatial data seems to have been processed in some way. Now I am wondering: how should the sc and spatial data be processed before handing them to ENVI?
Thanks a lot!
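A quick way to run the kind of inspection described above is to test whether a matrix is non-negative and integer-valued. The sketch below is only a heuristic, and `looks_like_raw_counts` is a hypothetical helper name. It assumes a dense array; a SciPy sparse `.X` would need to be converted first, for example with `.toarray()`. The toy arrays are made up and reuse only the two maxima quoted in this thread.

```python
import numpy as np

def looks_like_raw_counts(x):
    """Heuristic: True if all values are non-negative and integer-valued.

    Integer values suggest raw counts but do not prove it; non-integer
    values show that some processing (normalization, log, etc.) was applied.
    """
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.all(x == np.round(x)))

# Made-up toy rows that only reuse the maxima reported in this thread.
sc_like = np.array([[0.0, 12.0, 4360.0]])
st_like = np.array([[0.0, 3.5, 247.12617]])

sc_is_raw = looks_like_raw_counts(sc_like)  # integer-valued
st_is_raw = looks_like_raw_counts(st_like)  # contains non-integer values
```

The check cannot tell which processing was applied. As a rough guide, though, a maximum of about 247 is far larger than what a log1p transform of counts in this range would give (log1p of 4360 is about 8.4), which is consistent with the spatial data being normalized but not logged.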