SUwonglab / scABC


running getClusterSpecificPvalue on binary matrix #3

Closed aditiq closed 6 years ago

aditiq commented 6 years ago

Hi,

I came across your package through your bioRxiv paper.

Can the function getClusterSpecificPvalue be used on binary matrices? I am working with bulk ATAC-seq data and generating a binary matrix (n*p) in which each cell is 1 if peak n exists in sample p, and 0 otherwise. I can't use counts because of other issues. I am trying to use getClusterSpecificPvalue on this matrix to identify peaks specific to my pre-defined clusters of samples.

  1. Are any distributional assumptions violated if I use a binary matrix with this function?
  2. I also set the background medians to all be 1, so that none of the samples get removed and all of them get equal weight. Do you see any issues with that?

Thanks for your help !

linzx06 commented 6 years ago

Thank you for the interest in our package.

  1. We assume a Poisson model. You can run the function on the binary matrix, but there may be a loss of power in identifying cluster-specific peaks.
  2. Yes, that is OK.
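To illustrate the loss of power mentioned in point 1, here is a minimal sketch in plain Python (not scABC's R code, and not how getClusterSpecificPvalue is implemented). It assumes counts for a peak are Poisson with a per-cluster rate, so a binarized entry equals 1 with probability 1 - exp(-rate). The helper names `binarize` and `detection_prob` and the example rates are hypothetical, chosen only for illustration.

```python
import math


def binarize(counts):
    """Turn a peaks-by-samples count matrix (list of rows) into 0/1 presence calls."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def detection_prob(rate):
    """P(X > 0) for X ~ Poisson(rate): the chance that a binarized entry equals 1."""
    return 1.0 - math.exp(-rate)


# Hypothetical per-sample rates for one peak in two clusters.
# Both pairs differ 4-fold on the count scale.
rate_pairs = [
    (0.2, 0.8),   # sparse regime
    (5.0, 20.0),  # well-covered regime
]

for a, b in rate_pairs:
    pa, pb = detection_prob(a), detection_prob(b)
    print(f"rates {a} vs {b}: P(entry = 1) = {pa:.3f} vs {pb:.3f}")
```

In the sparse regime the two clusters remain clearly separated after binarization, but in the well-covered regime both probabilities are close to 1, so the 4-fold difference in rates is nearly invisible in a 0/1 matrix. Under these assumptions, binarization costs the most power for peaks that are present in most samples.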

aditiq commented 6 years ago

Excellent - Thanks !

timydaley commented 6 years ago

To follow up: for binary entries, the Poisson assumption will give higher power than a method such as DESeq2, or other RNA-seq and scRNA-seq differential expression methods that use a negative binomial model.
How best to determine cluster-specific peaks from a binary matrix is an interesting question, and likely one that should be explored in more depth.