KevinMenden / scaden

Deep Learning based cell composition analysis with Scaden.
https://scaden.readthedocs.io
MIT License
71 stars 26 forks

Data Scaden required #63

Zhaohui-Ruan closed 3 years ago

Zhaohui-Ruan commented 3 years ago

Hi Kevin!

I noticed that the example data you provided consists of non-zero integers. Does Scaden require raw counts as input? Did you apply any preprocessing step to remove the 0s?

Ruan

KevinMenden commented 3 years ago

Hi Ruan,

No, the example data is just randomly generated integers; I did not try to model the count distribution of real data. It is merely there to show how your data should be formatted and how the different Scaden steps work.

For your actual data, it is best to use library-size-normalized counts for your single-cell training data. They should not be in log scale, however, because Scaden applies the log transformation itself. Your bulk data can be raw, but I recommend a normalization that adjusts for gene length, such as TPM. The difference is not too big, though, so raw counts also work.
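The two normalizations mentioned above can be sketched roughly as follows. This is only an illustration with NumPy, not part of Scaden: the function names, the `target_sum` of 10,000, the toy matrices, and the row orientation (rows are cells or samples, columns are genes) are all assumptions made for the example. Neither function applies a log transformation.

```python
import numpy as np

def library_size_normalize(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum. No log."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # all-zero cells stay all-zero
    return counts / totals * target_sum

def tpm(counts, gene_lengths_bp):
    """Convert raw bulk counts (samples x genes) to TPM on a linear scale."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    rate = counts / lengths_kb  # reads per kilobase of gene length
    per_sample = rate.sum(axis=1, keepdims=True)
    per_sample[per_sample == 0] = 1.0
    return rate / per_sample * 1e6

# Toy data (hypothetical): 2 cells x 3 genes, and 1 bulk sample x 3 genes
sc_counts = np.array([[10, 0, 30], [5, 5, 0]])
sc_norm = library_size_normalize(sc_counts)

bulk_counts = np.array([[100, 200, 300]])
bulk_tpm = tpm(bulk_counts, gene_lengths_bp=[1000, 2000, 3000])
```

After this, every row of `sc_norm` sums to `target_sum` and every row of `bulk_tpm` sums to one million, both still on a linear scale.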

Hope that helps!

Cheers, Kevin

Zhaohui-Ruan commented 3 years ago

Hi Kevin! Thanks for your reply! Happy New Year! ^_^ Ruan

KevinMenden commented 3 years ago

Thanks - Happy New Year to you too! :)

Zhaohui-Ruan commented 3 years ago

Hi Kevin! Sorry, one more question. The bulk data should not be in log scale either, right? Ruan

KevinMenden commented 3 years ago

Hi Ruan,

exactly - they can be normalized but shouldn't be in log scale :)
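If it is unclear whether a matrix has already been log-transformed, a rough check like the one below may help. It is only a heuristic sketch and not a Scaden feature: the function names and the threshold of 50 are assumptions, and the inverse transform is only valid if the data was produced with log2(x + 1).

```python
import numpy as np

def looks_log_scaled(values, max_threshold=50.0):
    """Heuristic: linear-scale expression data usually contains values far
    above the threshold, while log-scaled data rarely exceeds it."""
    values = np.asarray(values, dtype=float)
    return bool(np.nanmax(values) < max_threshold)

def undo_log2p1(values):
    """Invert a log2(x + 1) transform. Only valid if that exact transform was used."""
    return np.exp2(np.asarray(values, dtype=float)) - 1.0

# Toy data (hypothetical): the same matrix in linear and in log2(x + 1) scale
linear = np.array([[0.0, 120.5, 3500.0], [12.0, 0.0, 870.2]])
logged = np.log2(linear + 1)

linear_flag = looks_log_scaled(linear)
logged_flag = looks_log_scaled(logged)
recovered = undo_log2p1(logged)
```

A low maximum can also occur in very sparse or shallow linear-scale data, so the result is a hint rather than proof.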

Zhaohui-Ruan commented 3 years ago

Thanks!

yyoshiaki commented 2 years ago

Hi Kevin,

Can I confirm one thing about the conversation above? When 10x data is used as the reference, is scaledTPM from tximport the best way to normalize the bulk RNA-seq data?

Yoshi

KevinMenden commented 2 years ago

Hi Yoshi,

I can't say for certain which way is best, as I haven't made a thorough comparison of the different normalization techniques. But scaledTPM works well - that's the one I used.
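For readers without an R setup, the idea behind scaledTPM can be sketched in Python. As I understand tximport's `countsFromAbundance = "scaledTPM"` option, each sample's TPM values are rescaled so that they sum to that sample's original estimated count total. The function name, the genes x samples orientation, and the toy numbers below are assumptions for illustration; for real data, tximport itself is the reference.

```python
import numpy as np

def scaled_tpm(abundance_tpm, est_counts):
    """Rescale each sample's TPM column to sum to that sample's count total.

    Both inputs are genes x samples matrices; the output is on a linear scale.
    """
    abundance = np.asarray(abundance_tpm, dtype=float)
    counts = np.asarray(est_counts, dtype=float)
    lib_size = counts.sum(axis=0)    # estimated count total per sample
    tpm_sum = abundance.sum(axis=0)  # about 1e6 per sample for TPM input
    return abundance * (lib_size / tpm_sum)

# Toy data (hypothetical): 3 genes x 2 samples
abundance = np.array([[500000.0, 250000.0],
                      [300000.0, 250000.0],
                      [200000.0, 500000.0]])
counts = np.array([[50.0, 10.0],
                   [30.0, 40.0],
                   [20.0, 50.0]])
result = scaled_tpm(abundance, counts)
```

The result keeps the gene-length adjustment of TPM while having count-like library sizes.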

Best, Kevin

yyoshiaki commented 2 years ago

Thank you!