MPUSP / nf-core-crispriscreen

Process next generation sequencing data obtained from CRISPRi repression library screenings
MIT License

Add fitness report for `Mageck` output #20

Open m-jahn opened 1 year ago

m-jahn commented 1 year ago

Description of feature

m-jahn commented 7 months ago

@ute-hoffmann are you using the Mageck output? Would it make sense for you to add fitness report for Mageck?

ute-hoffmann commented 7 months ago

I don't usually use Mageck at the moment, so it's not really important to me.

m-jahn commented 7 months ago

OK good to know. Because it is too slow, or not informative?

ute-hoffmann commented 7 months ago

Half of the projects I am working on are enzyme engineering projects where we do not have several sgRNAs targeting the same gene. I just realized that this might change for a larger library we'll work on, with several barcodes associated with one gene, so it might be of interest. And then, I think it crashed too frequently and I still wanted to read a bit more about the output it gives. And I always forgot to do so. But I guess Mageck is probably giving a better analysis than the usual tests.

m-jahn commented 7 months ago

> And then, I think it crashed too frequently and I still wanted to read a bit more about the output it gives. And I always forgot to do so. But I guess Mageck is probably giving a better analysis than the usual tests.

I'm not sure about this. People use it, but it seems it's no longer developed further, apart from bug fixes maybe. And the performance is lousy. The DESeq and edgeR packages, on the other hand, are not made for this purpose but seem to do the task well.

ute-hoffmann commented 7 months ago

Hmm yeah, I was thinking more about the adjusted p-values assigned to different genes. With the Wilcoxon test, I always get pretty lousy adjusted p-values, and people claim one should only use it with more than (10?) replicates or so. Mageck might help with that.

m-jahn commented 7 months ago

Yes, that's an issue. The rank sum test is not ideal for significance analysis; it produces too discrete (non-continuous) p-value distributions. If you stumble over an alternative, I'd be happy to hear about it.
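
To make the discreteness point concrete, here is a minimal sketch of the smallest two-sided p-value that an exact rank sum test can return for two groups of a given size. It assumes there are no ties and does not reflect how the pipeline actually groups its values. The function name `min_two_sided_p` is made up for illustration.

```python
from math import comb


def min_two_sided_p(n1: int, n2: int) -> float:
    """Smallest attainable two-sided p-value of the exact rank sum test (no ties).

    Under the null hypothesis, every assignment of the n1 + n2 ranks to the
    two groups is equally likely, so there are comb(n1 + n2, n1) outcomes.
    The most extreme outcome (all values of one group ranked above the other)
    occurs exactly once per tail, hence the factor 2 for a two-sided test.
    """
    return min(1.0, 2 / comb(n1 + n2, n1))


# Illustrative group sizes only, not taken from any real screen
for n in (3, 4, 5, 10):
    print(f"{n} vs {n} values: minimum p = {min_two_sided_p(n, n):.2e}")
```

With 3 vs 3 values, the floor is 2/20 = 0.1, so no comparison can reach p < 0.05 however large the effect. Since Benjamini-Hochberg adjusted p-values are never smaller than the raw ones, the adjusted values inherit the same floor, which would explain the poor adjusted p-values at low replicate numbers.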