Zhaohui-Ruan closed this issue 3 years ago
Hi Ruan,
No, the example data is just randomly generated integers; I did not try to model the count distribution of real data. It is merely there to show how your data should be formatted and how the different Scaden steps work.
For your actual data, it is best to use library-size-normalized counts for your single-cell training data. However, they should not be in log scale, because Scaden applies the log transformation itself. Your bulk data can be raw, but I recommend a normalization that adjusts for gene length, such as TPM. The difference is not too big, though, so raw counts also work.
Hope that helps!
Cheers, Kevin
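To make the advice above concrete, here is a minimal sketch of the two normalizations mentioned: library-size scaling for single-cell counts and TPM for bulk counts, both left in linear (non-log) scale. It is not part of Scaden. The function names, the `target_sum` of 10,000, and the plain-list data layout are assumptions chosen for illustration only.

```python
def library_size_normalize(counts, target_sum=10_000):
    """Scale each cell so its counts sum to target_sum (no log transform).

    counts: list of cells, each a list of raw gene counts.
    target_sum is an illustrative choice, not a value required by Scaden.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Leave empty cells as zeros to avoid dividing by zero.
            normalized.append([0.0] * len(cell))
        else:
            factor = target_sum / total
            normalized.append([c * factor for c in cell])
    return normalized


def counts_to_tpm(counts, gene_lengths_bp):
    """Convert raw counts of one bulk sample to TPM (no log transform).

    counts: raw counts per gene; gene_lengths_bp: gene lengths in base pairs.
    """
    # Reads per kilobase of gene length.
    rates = [c / (length / 1000) for c, length in zip(counts, gene_lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so the sample sums to one million.
    return [r / total * 1_000_000 for r in rates]


cells = [[10, 30, 60], [0, 5, 5]]
print(library_size_normalize(cells))
print(counts_to_tpm([100, 200], [1000, 2000]))
```

In practice the same arithmetic would usually be done on a whole matrix with a numerical library; plain lists are used here only to keep the sketch dependency-free.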
Hi Kevin! Thanks for your reply! Happy New year! ^_^ Ruan
Thanks - Happy New year to you too! :)
Hi Kevin! Sorry, one more question. The bulk data should not be in log scale either, right? Ruan
Hi Ruan,
exactly - they can be normalized but shouldn't be in log scale :)
Thanks!
Hi Kevin,
Can I confirm one point from the conversation above? When you use 10x data as the reference, is scaledTPM from tximport the best way to normalize the bulk RNA-seq data?
Yoshi
Hi Yoshi,
I cannot say for certain what the best way is, as I have not made a thorough comparison of the different normalization techniques. But scaledTPM works well, and that is the one I used.
Best, Kevin
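For readers unfamiliar with scaledTPM, the sketch below shows the idea as I understand it from tximport: TPM values are scaled back up so that each sample sums to its original library size. This is a plain-Python illustration of that arithmetic, not tximport's code; the function name `scaled_tpm` and the list-based inputs are hypothetical.

```python
def scaled_tpm(tpm, raw_counts):
    """Scale one sample's TPM values up to that sample's library size.

    tpm: gene-level TPM values for one sample.
    raw_counts: estimated counts for the same genes in the same sample.
    """
    tpm_total = sum(tpm)
    if tpm_total == 0:
        return [0.0] * len(tpm)
    # After scaling, the values sum to the original count total.
    factor = sum(raw_counts) / tpm_total
    return [t * factor for t in tpm]


print(scaled_tpm([500_000.0, 500_000.0], [100, 200]))
```

The result keeps the gene-length adjustment of TPM while staying on a count-like, non-log scale.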
Thank you!
Hi Kevin! I found that the example data you provided are all non-zero integers. Does Scaden require raw counts as input? Did you apply any preprocessing step to remove zeros? Ruan