KevinMenden / scaden

Deep Learning based cell composition analysis with Scaden.
https://scaden.readthedocs.io
MIT License

Scaden simulate, data input #121

Closed: gemmabb closed this issue 2 years ago

gemmabb commented 2 years ago

Hi!

First of all, thanks for sharing your work :) According to the documentation, when simulating the data from our single-cell dataset, we should normalize these counts by the library size. However, the example counts file I see is not strictly normalized, so I was wondering if a strict normalization (i.e., dividing the raw counts by the sum for that sample) would work, since these data would only contain values from 0 to 1.

Thank you so much in advance!
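For readers following along, the "strict" normalization described in the question can be sketched as below. This is only an illustration using plain Python lists; the function name `normalize_to_unit_sum` and the toy matrix are hypothetical and are not part of Scaden.

```python
def normalize_to_unit_sum(counts):
    """Divide each cell's raw counts by that cell's library size (row sum)."""
    normalized = []
    for cell in counts:
        library_size = sum(cell)
        if library_size == 0:
            # Assumption: a cell with no counts is kept as all zeros.
            normalized.append([0.0 for _ in cell])
        else:
            normalized.append([value / library_size for value in cell])
    return normalized


# Hypothetical toy matrix: rows are cells, columns are genes.
raw_counts = [
    [10, 0, 30],
    [2, 2, 4],
]
unit_sum = normalize_to_unit_sum(raw_counts)
```

After this step every non-empty cell sums to 1, so all values lie between 0 and 1, which is the situation the question asks about.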

KevinMenden commented 2 years ago

Hi!

Sorry for the late reply! The counts are normalized to exactly this range inside Scaden's processing scheme, using a normalization method chosen for this purpose. So while you could normalize your counts to 0-1 and it would probably still work, I wouldn't advise doing that.

Hope that answers your question?

Cheers, Kevin

gemmabb commented 2 years ago

Yes! It does answer it!

Thank you so much,

Gemma

gemmabb commented 2 years ago

Sorry to bother you again... Our single-cell data should be library-size normalized before using scaden simulate anyway, shouldn't it? I think you were using scanpy to get library-normalized counts for your single-cell data, but instead of summing to 1, each cell summed to the median library size computed before normalization. Am I right?
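The alternative described in this follow-up, scaling every cell to the median library size instead of to 1, can be sketched as follows. This is a minimal illustration, not Scaden's or scanpy's code; to my understanding it resembles the documented default of scanpy's `normalize_total` when no target sum is given, but the thread does not confirm which call the authors used. The function name, the toy matrix, and the choice to ignore empty cells when taking the median are assumptions.

```python
from statistics import median


def normalize_to_median_library_size(counts):
    """Scale each cell so its total equals the median pre-normalization library size."""
    library_sizes = [sum(cell) for cell in counts]
    # Assumption: empty cells are left out of the median and kept as zeros.
    target = median(size for size in library_sizes if size > 0)
    normalized = []
    for cell, size in zip(counts, library_sizes):
        if size == 0:
            normalized.append([0.0 for _ in cell])
        else:
            normalized.append([value * target / size for value in cell])
    return normalized


# Hypothetical toy matrix: rows are cells, columns are genes.
raw_counts = [
    [10, 0, 30],  # library size 40
    [2, 2, 4],    # library size 8
    [5, 5, 10],   # library size 20, the median
]
median_scaled = normalize_to_median_library_size(raw_counts)
```

Compared with dividing by the library size alone, the relative proportions within each cell are identical; only the common scale differs, so the values stay on a count-like scale rather than between 0 and 1.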