Determining Bin Statistics

de-Boer-Lab / MAUDE

Mean Alterations Using Discrete Expression

MIT License


Determining Bin Statistics #2

Closed ericmalekos closed 2 years ago

ericmalekos commented 2 years ago

Hello,

I have reporter screen data from a CRISPRi experiment with three replicates, each sorted into a lower 20% bin and an upper 20% bin. The samples have been processed and compared using MAGeCK, and I also want to use MAUDE to see how the hits compare. For MAGeCK, the comparison was made between the upper and lower bins with no reference to unsorted cells (the lower bin was used as "control", the upper as "experiment"). Would such an analysis be sensible in MAUDE, i.e. leaving out the unsorted cells, or does MAUDE's analysis depend on them being included?

Carldeboer commented 2 years ago

You can use MAUDE in a hacky way with no unsorted cell sample, but in most cases it is better to include it. If you are trying to compare MAGeCK and MAUDE on exactly the same data, then use the same data (I don't think MAGeCK can accommodate the unsorted cell sample), but if you want the best possible performance from MAUDE, you should probably use the unsorted sample. The only exception is if your unsorted cell sample has coverage too low to be useful for estimating the abundance of each element. In that case, it may be better to either (1) assume that everything has the same abundance or (2) estimate the abundance from the sorted data. If you do not have the unsorted cell sample (you should), you can use either of these two options as well. In our experience, MAUDE still works very well with these hacks.

ericmalekos commented 2 years ago

Thank you! I think I will run it both without the unsorted sample (for direct comparison to MAGeCK) and with it.

In the former case, does it make sense to rethink the bin boundaries? For example, currently I have:

Bin  binStartQ  binEndQ  fraction  binStartZ  binEndZ   expt
hi   0.8        0.999    0.199     0.8416212  3.090232  1
hi   0.8        0.999    0.199     0.8416212  3.090232  2
hi   0.8        0.999    0.199     0.8416212  3.090232  3

However, if I'm reducing the whole comparison to Hi vs Lo, then a bin size of 20% becomes 100%, right? Does it make sense to change binStartQ and binEndQ to 0.001 and 0.999 for each replicate?
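For reference, the numbers in the table above are consistent with binStartZ and binEndZ being the standard-normal quantiles of binStartQ and binEndQ, and fraction being binEndQ minus binStartQ. The sketch below reproduces that arithmetic in plain Python (MAUDE itself is an R package). The helper `bin_row` is hypothetical and only illustrates the calculation, and the low-bin bounds of 0.001 to 0.2 are an assumption that mirrors the high bin; they are not taken from the thread.

```python
from statistics import NormalDist


def bin_row(name, start_q, end_q, expt):
    """Build one bin-table row from quantile bounds (hypothetical helper)."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    return {
        "Bin": name,
        "binStartQ": start_q,
        "binEndQ": end_q,
        # Fraction of the sorted population captured by this bin.
        "fraction": end_q - start_q,
        # Z-scores are the standard-normal quantiles of the bounds.
        "binStartZ": std_normal.inv_cdf(start_q),
        "binEndZ": std_normal.inv_cdf(end_q),
        "expt": expt,
    }


# High bin as posted in the table (top 20%, capped at 0.999).
hi = bin_row("hi", 0.8, 0.999, expt=1)

# Assumed mirror-image low bin (bottom 20%, floored at 0.001).
lo = bin_row("lo", 0.001, 0.2, expt=1)

for row in (hi, lo):
    print(row)
```

Under these assumptions each bin keeps its own 20% span, so the Hi vs Lo comparison does not require widening either bin to cover the whole distribution.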

Carldeboer commented 2 years ago

You still need low bins (i.e. do not use low as "unsorted"). Your high bins look correct for the top 20% of cells. To run without the unsorted sample, you need to make a pseudo-unsorted sample and give this to MAUDE. Two strategies for doing this were given above. Which one to use depends on the complexity and uniformity of your library and the coverage of your bins (e.g. if your bins cover 100% of the distribution, most sequences will end up in a bin, so it is easier to approximate their abundance). If approximating the abundance from the high and low bins, I would just take the average CPM across both bins (the mean of the high and low CPMs for each "guide" or entity) and call this new vector the pseudo-unsorted sample.
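The averaging step described above can be sketched as follows, in plain Python rather than R. This is a minimal illustration, not MAUDE's own code: the function names `cpm` and `pseudo_unsorted`, the guide IDs, and the counts are all hypothetical. It also assumes that a guide missing from one bin contributes a CPM of 0 there, and that the calculation is done once per replicate. Whether MAUDE should be given this vector as CPM or rescaled to counts is not stated in the thread.

```python
def cpm(counts):
    """Convert raw read counts per guide to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {guide: n * 1e6 / total for guide, n in counts.items()}


def pseudo_unsorted(high_counts, low_counts):
    """Mean of high-bin and low-bin CPM for every guide (hypothetical helper)."""
    high_cpm = cpm(high_counts)
    low_cpm = cpm(low_counts)
    guides = sorted(set(high_counts) | set(low_counts))
    # Assumption: a guide absent from one bin has CPM 0 in that bin.
    return {
        g: (high_cpm.get(g, 0.0) + low_cpm.get(g, 0.0)) / 2
        for g in guides
    }


# Made-up counts for one replicate, for illustration only.
high = {"g1": 300, "g2": 100, "g3": 600}
low = {"g1": 100, "g2": 300, "g3": 100}

pseudo = pseudo_unsorted(high, low)
print(pseudo)
```

Normalizing each bin to CPM before averaging keeps a deeply sequenced bin from dominating the estimate, which plain averaging of raw counts would not do.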

ericmalekos commented 2 years ago

Got it, thanks for the help.