Introducing artificial dropout issue

radio1988 commented 7 years ago

For Fig. 8 of the MAGIC paper, it's stated that random values sampled from an exponential distribution were subtracted from the expression values, such that 0%, 60%, 80%, and 90% of the values are 0 after down-sampling. For Fig. 1, it's stated that 'artificial dropout' was introduced by randomly setting 80% of the values to 0. Which method is more appropriate for mimicking the 'dropout' in real RNA-seq data? How is the method in Fig. 8 implemented? Would it be appropriate to choose a threshold and set any expression value below it to 0? Thanks!

dvdijk commented 7 years ago

Everywhere in the paper where we simulate dropout (including Fig. 1 and Fig. 8), we don't just set a fraction of the values to zero. Instead, we randomly subsample molecules by subtracting values randomly sampled from an exponential distribution. All entries in the matrix are thus reduced, and as a result some become zero. This is the right way to simulate dropout, because dropout does not just set some values to zero; it affects every entry in the matrix.
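For readers who want to try this, here is a minimal sketch of the subtract-and-clip idea described above. It is not the code used for the paper. The function name `exponential_dropout`, the bisection search for the exponential scale, and the choice to leave the result as floats are all assumptions made for illustration; the thread does not say how the scale was tuned to reach a given fraction of zeros.

```python
import numpy as np

def exponential_dropout(counts, target_zero_frac, seed=0, n_iter=50):
    """Subtract exponential noise from every entry and clip at zero.

    The exponential scale is found by bisection so that roughly
    `target_zero_frac` of the entries end up as zero. The bisection is an
    assumption of this sketch, not something stated in the thread.
    """
    if not 0.0 <= target_zero_frac < 1.0:
        raise ValueError("target_zero_frac must be in [0, 1)")
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw unit-scale noise once; rescaling the same draw keeps the
    # zero fraction a monotone function of the scale.
    unit_noise = rng.exponential(scale=1.0, size=counts.shape)

    def zero_frac(scale):
        return np.mean(counts - scale * unit_noise <= 0)

    lo, hi = 0.0, 1.0
    while zero_frac(hi) < target_zero_frac:  # grow the bracket
        hi *= 2.0
    for _ in range(n_iter):  # shrink the bracket around the target
        mid = 0.5 * (lo + hi)
        if zero_frac(mid) < target_zero_frac:
            lo = mid
        else:
            hi = mid
    # Every entry is reduced; entries pushed below zero become zero.
    return np.clip(counts - hi * unit_noise, 0.0, None)
```

Calling `exponential_dropout(X, 0.8)` on a cells-by-genes matrix `X` returns a matrix of the same shape in which every entry is no larger than the original and about 80% of the entries are zero.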


audreyqyfu commented 7 years ago

Thanks, David. I'm also interested in the simulation setup. Do I understand correctly that this exponential distribution is independent of gene expression? In other words, a gene may be highly expressed, but the number to subtract can be small, whereas a lowly expressed gene may be hit with a large value from this exponential distribution?

Thanks, Audrey

radio1988 commented 7 years ago

Thanks, David. Audrey's comment is interesting, too. If the values sampled from the exponential distribution are independent of the expression values they are subtracted from, would it be more 'realistic' to re-sample reads from each cell? For example, say cell i has 10M reads as ground truth (very few zeros) and we re-sample 0.5M reads from it; as a result, lots of zeros are introduced. By 're-sample' I mean something similar to a bootstrap, except that the sampling with replacement draws only 0.5M reads rather than 10M.
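To make the re-sampling idea concrete, here is a rough sketch. Drawing a fixed number of reads with replacement from a cell is equivalent to a multinomial draw over that cell's observed gene proportions. The function name `resample_reads` and the parameter `depth` are hypothetical names chosen for this example.

```python
import numpy as np

def resample_reads(counts, depth, seed=0):
    """Draw `depth` reads per cell with replacement (bootstrap-style).

    `counts` is a cells x genes matrix of read counts. Each cell is
    redrawn from a multinomial whose probabilities are that cell's
    observed gene proportions.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.zeros(counts.shape, dtype=np.int64)
    for i, row in enumerate(counts):
        total = row.sum()
        if total > 0:  # cells with no reads stay all-zero
            out[i] = rng.multinomial(depth, row / total)
    return out
```

With `depth` much smaller than the original number of reads per cell, lowly expressed genes often receive no reads at all, which is where the extra zeros come from.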

dvdijk commented 7 years ago

What you can do, and this is something we tried (it gave very similar results), is actually subsample the molecules. I don't think you want to subsample reads, though, unless you want to test how well it can recover low-read samples.
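One common way to subsample molecules is binomial thinning, where each observed molecule is kept independently with a fixed probability. The sketch below assumes that approach; the thread does not say which subsampling scheme was actually used, and `subsample_molecules` and `keep_prob` are hypothetical names.

```python
import numpy as np

def subsample_molecules(counts, keep_prob, seed=0):
    """Keep each observed molecule independently with probability `keep_prob`.

    `counts` must hold non-negative integer molecule (UMI) counts.
    """
    counts = np.asarray(counts)
    if not np.issubdtype(counts.dtype, np.integer):
        raise TypeError("counts must be integer molecule (UMI) counts")
    rng = np.random.default_rng(seed)
    # Binomial thinning: entry x becomes Binomial(x, keep_prob).
    return rng.binomial(counts, keep_prob)
```

For example, `subsample_molecules(X, 0.1)` keeps about 10% of the molecules overall. Unlike the exponential subtraction, the expected loss here scales with the original count, so zeros appear mostly in lowly expressed genes.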


dvdijk commented 7 years ago

That is correct, Audrey: the exponential distribution is independent of gene expression. However, making it dependent on the mean gave very similar results. You could also subsample the molecules to be even more realistic.
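As an illustration of what "dependent on the mean" could look like, the sketch below scales the exponential noise for each gene by that gene's mean expression. This is only one possible reading; the exact dependence used is not described in this thread, and `mean_scaled_exponential_dropout` and `factor` are hypothetical names.

```python
import numpy as np

def mean_scaled_exponential_dropout(counts, factor, seed=0):
    """Subtract exponential noise whose scale is `factor` times each gene's mean.

    `counts` is a cells x genes matrix, so gene means are column means.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    gene_means = counts.mean(axis=0)  # one mean per gene (column)
    # Unit-scale draws, rescaled per gene; a gene with mean 0 gets no noise.
    noise = rng.exponential(scale=1.0, size=counts.shape) * (factor * gene_means)
    return np.clip(counts - noise, 0.0, None)
```

Under this variant, highly expressed genes are hit with larger subtractions on average, whereas the independent version draws every subtraction from the same distribution regardless of expression level.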


audreyqyfu commented 7 years ago

Glad to hear that the exponential distribution gives similar results whether or not it depends on gene expression. I guess it means that MAGIC is robust to the potentially highly random process of dropout.

dvdijk commented 7 years ago

In fact, having the dropout be greater on highly expressed genes than on lowly expressed genes should make things easier for MAGIC, since it will retain more genes (i.e., it will not remove as many lowly expressed genes).

KrishnaswamyLab / MAGIC

Introducing artificial dropout issue #48