Use Salmon for gene expression measurement estimation instead of STAR

ChengHsiangLu commented 2 years ago

Hello, ALLSorts is a great tool and I am excited to use it for our B-ALL cases. I have a question about usage. I would like to use Salmon for gene expression estimation instead of the STAR workflow you show on your GitHub. Would this be OK, or do you suggest using STAR only?

Thanks, Sam

breons commented 2 years ago

Hi, thanks for giving ALLSorts a go!

In truth, I don't know. The suggested pipeline will ensure that your counts are processed in a similar way to the training data, hopefully limiting any batch effects. That said, if you can ensure that the same gene annotations are used (and thus have all features/genes available in your final counts), you might find it works well enough.
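As a rough way to check the "all features/genes available" condition, the sketch below compares the gene IDs from a STAR-based count matrix with those from a Salmon-derived, gene-level one. The function names and the toy IDs are hypothetical, and it assumes Ensembl-style IDs where a trailing `.N` is a version suffix; it does not call anything from ALLSorts itself.

```python
def strip_version(gene_id):
    """Drop a trailing version suffix, e.g. 'ENSG_TOY0001.3' -> 'ENSG_TOY0001'.

    Assumption: IDs are Ensembl-style, so everything after the first '.' is a version.
    """
    return gene_id.split(".", 1)[0]


def compare_gene_sets(reference_genes, sample_genes):
    """Report genes present in only one of the two count matrices."""
    reference = {strip_version(g) for g in reference_genes}
    sample = {strip_version(g) for g in sample_genes}
    return {
        "missing": sorted(reference - sample),  # expected genes absent from the new counts
        "extra": sorted(sample - reference),    # genes in the new counts that were not expected
    }


# Toy IDs for illustration only, not real annotations.
star_genes = ["ENSG_TOY0001.1", "ENSG_TOY0002.4", "ENSG_TOY0003.2"]
salmon_genes = ["ENSG_TOY0001.2", "ENSG_TOY0003.2", "ENSG_TOY0004.1"]

report = compare_gene_sets(star_genes, salmon_genes)
print(report)
```

If `missing` is non-empty, the Salmon-derived counts probably came from a different annotation than the STAR ones, or were not aggregated to gene level in the same way.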

If any of the subtypes are to be impacted, it's likely the ploidy ones. However, if you're finding multiple subtypes being called per sample (let's say TCF3-PBX1, MEF2D, ZNF384 all in the same sample), it's likely the counts are introducing some effects.

Perhaps for the first few samples, try using both methods and compare.
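One way to do that comparison is sketched below. It assumes you have already collected the predicted subtypes per sample from each run into plain dictionaries; that layout, the function name and the sample names are hypothetical, not the ALLSorts output format. It flags samples where the two pipelines disagree, and samples where the Salmon-based run calls more than one subtype.

```python
def compare_calls(star_calls, salmon_calls):
    """Compare per-sample subtype calls from two quantification pipelines.

    Both arguments map sample name -> list of predicted subtypes (hypothetical layout).
    Returns (discordant, multi_call), considering only samples present in both runs.
    """
    discordant = {}
    multi_call = []
    for sample in sorted(set(star_calls) & set(salmon_calls)):
        star = set(star_calls[sample])
        salmon = set(salmon_calls[sample])
        if star != salmon:
            discordant[sample] = (sorted(star), sorted(salmon))
        if len(salmon) > 1:
            # Several subtypes in one sample may point to count-related effects.
            multi_call.append(sample)
    return discordant, multi_call


# Made-up calls for illustration only.
star_calls = {"S1": ["ETV6-RUNX1"], "S2": ["TCF3-PBX1"], "S3": ["ZNF384"]}
salmon_calls = {
    "S1": ["ETV6-RUNX1"],
    "S2": ["TCF3-PBX1", "MEF2D", "ZNF384"],
    "S3": ["ZNF384"],
}

discordant, multi_call = compare_calls(star_calls, salmon_calls)
print("Discordant samples:", discordant)
print("Multiple subtypes with Salmon counts:", multi_call)
```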

Let us know how you go :). Breon.

ChengHsiangLu commented 2 years ago

Hi Breon,

Thanks for your reply! I'll try a few samples first and then compare both methods.

Best,

Sam

Oshlack / ALLSorts

Use Salmon for gene expression measurement estimation instead of STAR #9