calculate % of features unique to a sample

MoseleyBioinformaticsLab / visualizationQualityControl

Visualization methods for omics dataset quality control

Other


calculate % of features unique to a sample #9

Open rmflight opened 8 years ago

rmflight commented 8 years ago

It seems that another indicator of problems would be the percentage of all of the features that are sample specific.

For example, if a sample has a large number of features that appear only in that sample and in no other, that suggests there could be a problem with the sample.
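A minimal sketch of the proposed metric, written in Python purely for illustration. The function name `percent_unique_features` and the input format (a dict mapping each sample name to the set of feature IDs detected in it) are assumptions, not part of this package. It also assumes the denominator is the sample's own feature count; dividing by the total number of features in the data set is an equally plausible reading.

```python
from collections import Counter


def percent_unique_features(sample_features):
    """Map each sample to the percentage of its features seen in no other sample.

    sample_features: dict of sample name -> set of detected feature IDs
    (hypothetical input format chosen for this sketch).
    """
    # Count how many samples each feature was detected in.
    n_samples_with = Counter(
        feature
        for features in sample_features.values()
        for feature in set(features)
    )
    result = {}
    for sample, features in sample_features.items():
        features = set(features)
        # A feature is sample-specific if exactly one sample contains it.
        unique = [f for f in features if n_samples_with[f] == 1]
        # Denominator is the sample's own feature count (an assumption);
        # an empty sample is reported as 0.0 rather than dividing by zero.
        result[sample] = 100.0 * len(unique) / len(features) if features else 0.0
    return result


# Made-up example data, not taken from any real analysis.
example = {
    "s1": {"a", "b", "c"},
    "s2": {"a", "b", "d"},
    "s3": {"a", "e", "f", "g"},
}
print(percent_unique_features(example))
```

In this toy example, `s3` has three of its four features found nowhere else, so it stands out against `s1` and `s2`, which each have one sample-specific feature out of three.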

rmflight commented 8 years ago

@jesudk2, it looks like we had this issue with a sample in one of our analyses that we could test this on, correct?

jesudk2 commented 8 years ago

Yes, there is one sample that has sample-specific features. I can check that sample and see what the percentage is.

rmflight commented 8 years ago

The idea is that I'll write a function in this pkg to do the calculations. I just wanted to confirm that we had an example in a data set to test this with. If you want to do an initial calculation to see if this metric might even be useful, that would be good.

jesudk2 commented 8 years ago

I will work on that today. I think it will be extremely useful, though, considering the blinded validation studies being performed. The more information we have for QA/QC and about potential confounding issues with samples, the better, don't you think?

On Wed, Apr 13, 2016 at 3:46 PM, Robert M Flight wrote:

> The idea is that I'll write a function in this pkg to do the calculations. I just wanted to confirm that we had an example in a data set to test this with. If you want to do an initial calculation to see if this metric might even be useful, that would be good.


rmflight commented 8 years ago

Yes, I think so. I'm also interested to see how this sample truly compares to the others, and how useful this metric will actually be. We have an indication of it given how many features disappear when we remove that sample from consideration at the beginning, but this calculation actually gives us a firm number / fraction, and we can see if it really is an outlier with respect to this value.
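One way to check whether a sample is an outlier with respect to this value is sketched below in Python. The thread does not specify an outlier rule, so the modified z-score (median and median absolute deviation) and the 3.5 cutoff are assumptions chosen for illustration, as are the function name `flag_outlier_samples` and the example percentages.

```python
from statistics import median


def flag_outlier_samples(percent_unique, threshold=3.5):
    """Return sample names whose value is far from the median of all samples.

    percent_unique: dict of sample name -> percentage of sample-specific
    features (hypothetical input format). Uses the modified z-score,
    0.6745 * (x - median) / MAD, which is one common robust choice.
    """
    values = list(percent_unique.values())
    center = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No usable spread estimate; as a fallback (an assumption),
        # flag any sample that differs from the median at all.
        return sorted(s for s, v in percent_unique.items() if v != center)
    return sorted(
        s
        for s, v in percent_unique.items()
        if abs(0.6745 * (v - center) / mad) > threshold
    )


# Made-up percentages, not results from any real data set.
example_percentages = {"s1": 2.0, "s2": 3.0, "s3": 2.5, "s4": 3.5, "s5": 40.0}
print(flag_outlier_samples(example_percentages))
```

A median-based rule is used here because a single extreme sample would inflate a mean and standard deviation and could mask itself.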