gjmzajac / vices

VICES (Verify Intensity Contamination from Estimated Sources) is a program that jointly estimates contamination and its sources in genotyping arrays.

Axiom arrays? #1

Open drneavin opened 3 years ago

drneavin commented 3 years ago

Hello @gjmzajac,

I'm wondering if vices can be used with Axiom array outputs or if it only works on Illumina array outputs?

Thanks!

gjmzajac commented 3 years ago

Hi @drneavin, The current VICES implementation only supports Illumina arrays. Can you tell me more about your project and the kind of data you have? It may be possible to code up another solution.

Thanks, Gregory Zajac University of Michigan

drneavin commented 3 years ago

Hi @gjmzajac,

We have multiple cell lines that have been expanded for biobanking and we want to make sure they are pure populations and have not had any cross-contamination from other lines. Would VICES work if I could generate similar files from the Axiom analysis software? For example, if I could pull the B allele frequency and the allele calls for each individual and then format similar to the example data provided on the wiki site?

I'm also wondering if the input should be QC filtered data or if it should be the pre-QC filtered data?

Thanks so much for your help! -Drew

gjmzajac commented 3 years ago

Hi Drew, Sounds like a cool project. If you can get the data into the right format, then VICES will be able to analyze it and I would be very curious to know if it produces reasonable estimates. Pre-QC should be fine. Do you have any other signs of contamination, like low call rate, excess heterozygosity, or XXY intensities? Do you have any samples that you know for sure are contaminated as positive controls?

Be aware, however, that "B Allele Frequency" is NOT the frequency of the B allele in the population/sample--it is a normalized measure of the probe intensity based on distance to the centers of the genotype clusters. See https://www.illumina.com/Documents/products/technotes/technote_cytoanalysis.pdf for more details.
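To make the distinction concrete, here is a minimal sketch of how a B Allele Frequency value is usually derived under the Illumina-style definition: the probe's angle theta is interpolated between the centers of the AA, AB, and BB genotype clusters. The function names and the cluster centers below are illustrative assumptions, not values from a real array or from VICES.

```python
import math

def theta_from_intensities(x, y):
    """Polar angle of the (A, B) probe intensities, scaled to [0, 1]."""
    return (2.0 / math.pi) * math.atan2(y, x)

def b_allele_freq(theta, theta_aa, theta_ab, theta_bb):
    """Interpolate theta between cluster centers (AA -> 0, AB -> 0.5, BB -> 1)."""
    if theta <= theta_aa:
        return 0.0
    if theta >= theta_bb:
        return 1.0
    if theta < theta_ab:
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)

# Hypothetical cluster centers for one SNP (made-up numbers).
centers = dict(theta_aa=0.1, theta_ab=0.5, theta_bb=0.9)
for x, y in [(1.0, 0.05), (1.0, 1.0), (0.6, 1.0)]:
    theta = theta_from_intensities(x, y)
    print(f"theta={theta:.3f}  BAF={b_allele_freq(theta, **centers):.3f}")
```

Under this definition a cleanly genotyped sample sits near 0, 0.5, or 1 at each SNP no matter how common the B allele is in the population, which is why the value should not be read as a population frequency.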

Thanks, Greg

drneavin commented 3 years ago

Hi Greg,

I wanted to follow up on this topic and let you know how the results look with Axiom data. I was able to create a file from the Axiom data that has the same structure as the example files. We don't have any other signs of contamination but we are interested in seeing if there are low levels of contamination that may have been missed by the Axiom suite software. Unfortunately we don't have any positive controls with this dataset but we may include one in our next array submission to double check that the data we are getting from Axiom works well in vices.

With the current data, vices doesn't detect any cross-contamination. However, I receive this error when I run vices on our data: error: matrix singular when regressing sample 67 intensities on AF

I'm wondering what this implies and if there is something that I should check to alleviate this error?

In addition, I wanted to note that two of the software dependencies used by vices either cause errors or are no longer available:

  • armadillo-8.400.0.tar.xz is no longer available from the SourceForge address. I ended up using 8.600.1 instead.
  • libStatGen v1.0.14 caused issues on my system that were reported on their issue page. I used v1.0.15 instead.

Thanks for your help!

-Drew

gjmzajac commented 3 years ago

Hi Drew, Sorry for the late reply. Glad you were able to get it to run! The singular matrix error comes from attempting to invert (X^T X) as a step in fitting the regression. This can happen if the AFs and the intensities are equal or all the genotypes are equal, but I have most often gotten it when I have passed all-missing or erroneous data for an individual. Are you able to share the first 10 lines of the file for this sample?

Using the newer versions of the libraries should not be an issue. Did you edit the build files to resolve this issue?

Thanks, Greg

drneavin commented 3 years ago

Hi Greg,

Yes, I just edited the library versions in the build file which worked well.

For the singularity error, I get the error for all samples. I've attached the top 20 lines of one of the files. Let me know if you can see why the samples may all be returning the singularity error. vice_file_head.txt https://github.com/gjmzajac/vices/files/6133671/vice_file_head.txt

Thanks again for your help!

Cheers, Drew

gjmzajac commented 3 years ago

Hi Drew, Great, can you submit a pull request on GitHub so I can update the build files to the current versions of those libraries?

The file you shared with me looks good as far as I can tell without actually running any tests. Did you get any valid output when you did this? Are you running this sample together with all 86 others? Were they all genotyped on the same array, with the same variant list and names? Can you copy all the standard output you get from the terminal?

Are all the samples from distinct individuals? Relatives should be fine but I do not think VICES handles duplicates well.

Thanks, Greg

gjmzajac commented 3 years ago

Hi Drew, To speed up your own debugging (which can be slow over email): this error is thrown at https://github.com/gjmzajac/vices/blob/master/src/regression.cpp#L27 (the only place it can be thrown). There you can print out the values of each of the variables that go into the determinant calculation:

nAA nAB nBB sum_AF_AA sum_AF_AB sum_AF_BB sum_AF_sq

And see more of what is going on.
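As a rough illustration of how those sums can yield a zero determinant, the sketch below assumes a design matrix with one indicator column per genotype (AA, AB, BB) plus one AF column. That layout is only what the variable names suggest and has not been checked against regression.cpp; the function names and example data are hypothetical. Under that assumption X^T X is diag(nAA, nAB, nBB) bordered by the three per-genotype AF sums, with sum_AF_sq in the corner.

```python
GENOTYPES = ("AA", "AB", "BB")

def accumulate_sums(genotypes, afs):
    """Accumulate the entries of X^T X for the assumed design matrix."""
    n = {g: 0 for g in GENOTYPES}          # nAA, nAB, nBB
    sum_af = {g: 0.0 for g in GENOTYPES}   # sum_AF_AA, sum_AF_AB, sum_AF_BB
    sum_af_sq = 0.0                        # sum_AF_sq
    for genotype, af in zip(genotypes, afs):
        if genotype not in n:              # missing or unrecognized call is skipped
            continue
        n[genotype] += 1
        sum_af[genotype] += af
        sum_af_sq += af * af
    return n, sum_af, sum_af_sq

def determinant(n, sum_af, sum_af_sq):
    """Determinant of diag(nAA, nAB, nBB) bordered by the AF sums."""
    n_aa, n_ab, n_bb = (n[g] for g in GENOTYPES)
    s_aa, s_ab, s_bb = (sum_af[g] for g in GENOTYPES)
    return (sum_af_sq * n_aa * n_ab * n_bb
            - s_aa ** 2 * n_ab * n_bb
            - s_ab ** 2 * n_aa * n_bb
            - s_bb ** 2 * n_aa * n_ab)

afs = [0.1, 0.5, 0.9, 0.2, 0.4, 0.8]       # made-up allele frequencies
cases = {
    "mixed genotypes": ["AA", "AB", "BB", "AA", "AB", "BB"],
    "all genotypes equal": ["AA"] * 6,
    "nothing parsed": ["--"] * 6,          # every call missing or unreadable
}
for label, calls in cases.items():
    print(label, determinant(*accumulate_sums(calls, afs)))
```

In this toy model the determinant is exactly zero whenever any genotype class is empty, so an input in which no calls are recognized would fail for every sample at once.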

Thanks, Greg

drneavin commented 3 years ago

Hi Greg,

Great questions! Yes, I can put through a pull request to update the libraries.

I'm glad to hear that the file looks good from first glance. I've attached the output from the analysis - all the individuals have nan as the output values: contam_estimates.txt. This is the standard output I get from running vices:

VICES v1.0: Verify Intensity Contamination from Estimated Sources
(c) 2019 - Gregory Zajac, Goncalo Abecasis
Reading 87 report files...
Done
Read 370450 Markers in the report files
Using 208909 markers with MAF >= 0.1
error: matrix singular when regressing sample 0 intensities on AF
error: matrix singular when regressing sample 1 intensities on AF
error: matrix singular when regressing sample 2 intensities on AF
error: matrix singular when regressing sample 3 intensities on AF
error: matrix singular when regressing sample 4 intensities on AF
error: matrix singular when regressing sample 5 intensities on AF
error: matrix singular when regressing sample 6 intensities on AF
error: matrix singular when regressing sample 7 intensities on AF
error: matrix singular when regressing sample 8 intensities on AF
error: matrix singular when regressing sample 9 intensities on AF
error: matrix singular when regressing sample 10 intensities on AF
error: matrix singular when regressing sample 11 intensities on AF
error: matrix singular when regressing sample 12 intensities on AF
error: matrix singular when regressing sample 13 intensities on AF
error: matrix singular when regressing sample 14 intensities on AF
error: matrix singular when regressing sample 15 intensities on AF
error: matrix singular when regressing sample 16 intensities on AF
error: matrix singular when regressing sample 17 intensities on AF
error: matrix singular when regressing sample 18 intensities on AF
error: matrix singular when regressing sample 19 intensities on AF
error: matrix singular when regressing sample 20 intensities on AF
error: matrix singular when regressing sample 21 intensities on AF
error: matrix singular when regressing sample 22 intensities on AF
error: matrix singular when regressing sample 23 intensities on AF
error: matrix singular when regressing sample 24 intensities on AF
error: matrix singular when regressing sample 25 intensities on AF
error: matrix singular when regressing sample 26 intensities on AF
error: matrix singular when regressing sample 27 intensities on AF
error: matrix singular when regressing sample 28 intensities on AF
error: matrix singular when regressing sample 29 intensities on AF
error: matrix singular when regressing sample 30 intensities on AF
error: matrix singular when regressing sample 31 intensities on AF
error: matrix singular when regressing sample 32 intensities on AF
error: matrix singular when regressing sample 33 intensities on AF
error: matrix singular when regressing sample 34 intensities on AF
error: matrix singular when regressing sample 35 intensities on AF
error: matrix singular when regressing sample 36 intensities on AF
error: matrix singular when regressing sample 37 intensities on AF
error: matrix singular when regressing sample 38 intensities on AF
error: matrix singular when regressing sample 39 intensities on AF
error: matrix singular when regressing sample 40 intensities on AF
error: matrix singular when regressing sample 41 intensities on AF
error: matrix singular when regressing sample 42 intensities on AF
error: matrix singular when regressing sample 43 intensities on AF
error: matrix singular when regressing sample 44 intensities on AF
error: matrix singular when regressing sample 45 intensities on AF
error: matrix singular when regressing sample 46 intensities on AF
error: matrix singular when regressing sample 47 intensities on AF
error: matrix singular when regressing sample 48 intensities on AF
error: matrix singular when regressing sample 49 intensities on AF
error: matrix singular when regressing sample 50 intensities on AF
error: matrix singular when regressing sample 51 intensities on AF
error: matrix singular when regressing sample 52 intensities on AF
error: matrix singular when regressing sample 53 intensities on AF
error: matrix singular when regressing sample 54 intensities on AF
error: matrix singular when regressing sample 55 intensities on AF
error: matrix singular when regressing sample 56 intensities on AF
error: matrix singular when regressing sample 57 intensities on AF
error: matrix singular when regressing sample 58 intensities on AF
error: matrix singular when regressing sample 59 intensities on AF
error: matrix singular when regressing sample 60 intensities on AF
error: matrix singular when regressing sample 61 intensities on AF
error: matrix singular when regressing sample 62 intensities on AF
error: matrix singular when regressing sample 63 intensities on AF
error: matrix singular when regressing sample 64 intensities on AF
error: matrix singular when regressing sample 65 intensities on AF
error: matrix singular when regressing sample 66 intensities on AF
error: matrix singular when regressing sample 67 intensities on AF
error: matrix singular when regressing sample 68 intensities on AF
error: matrix singular when regressing sample 69 intensities on AF
error: matrix singular when regressing sample 70 intensities on AF
error: matrix singular when regressing sample 71 intensities on AF
error: matrix singular when regressing sample 72 intensities on AF
error: matrix singular when regressing sample 73 intensities on AF
error: matrix singular when regressing sample 74 intensities on AF
error: matrix singular when regressing sample 75 intensities on AF
error: matrix singular when regressing sample 76 intensities on AF
error: matrix singular when regressing sample 77 intensities on AF
error: matrix singular when regressing sample 78 intensities on AF
error: matrix singular when regressing sample 79 intensities on AF
error: matrix singular when regressing sample 80 intensities on AF
error: matrix singular when regressing sample 81 intensities on AF
error: matrix singular when regressing sample 82 intensities on AF
error: matrix singular when regressing sample 83 intensities on AF
error: matrix singular when regressing sample 84 intensities on AF
error: matrix singular when regressing sample 85 intensities on AF
error: matrix singular when regressing sample 86 intensities on AF
Initial estimates based on AFs complete
Contamination above 0.005 detected in 0 samples
Starting donor search...
Done
Pruning estimated donors and calculating final estimates...
Done
Results written to /directflow/SCCGGroupShare/projects/DrewNeavin/SNP_genotyping_contamination/2020_02/output/vices/contam_estimates.txt

Yes, all the individuals were genotyped on the same array on the same plate so the variant names are the same.

However, you bring up a good point - some of the individuals have been included twice in this genotype array submission. I just ran vices with unique individuals and got the same error so I don't think that is the reason for this issue.

I'll print out all the variables and get back to you about that soon.

Thanks! -Drew

drneavin commented 3 years ago

Hi Greg,

I figured out why my files were all returning the singular matrix error. The files must be dos files to be effectively parsed by vices. If I run unix2dos on the files that I generated (which were unix files) before running vices, the program works well and returns the expected results without error.
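For anyone without unix2dos at hand, the conversion can be approximated as below. This is a minimal sketch that works on raw bytes; the function names are made up for illustration and nothing here is part of vices.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF (DOS) and bare LF (Unix) line endings in a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}

def to_crlf(data: bytes) -> bytes:
    """Convert every line ending to CRLF without doubling existing ones."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

# Made-up two-line report fragment with Unix line endings.
unix_bytes = b"rs123\tAB\t0.48\nrs456\tAA\t0.02\n"
dos_bytes = to_crlf(unix_bytes)
print(count_line_endings(unix_bytes), count_line_endings(dos_bytes))
```

To convert a file, read it with open(path, "rb"), pass the bytes through to_crlf, and write the result with open(path, "wb") so that Python does not translate the line endings itself.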

I'll do the pull request today to update the software packages.

Thanks for all your help on this issue! And good to know now that vices works on Axiom arrays too.

-Drew

gjmzajac commented 3 years ago

Hi Drew, I am really glad to hear you got it to work! The report files I had from Illumina all had \r\n line endings, so I will try to fix that bug so it can run with just \n.
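One common way to make a parser indifferent to the line-ending style is to strip any trailing \r and \n before splitting a line into fields. The sketch below shows the idea in Python, assuming tab-delimited report lines; it is not the VICES parser (which is C++), and parse_report_line and the example columns are hypothetical.

```python
def parse_report_line(line, delimiter="\t"):
    """Split one report line into fields, ignoring CRLF vs. LF endings."""
    return line.rstrip("\r\n").split(delimiter)

# Made-up report lines that differ only in their line endings.
unix_line = "rs123\tAB\t0.48\n"
dos_line = "rs123\tAB\t0.48\r\n"
assert parse_report_line(unix_line) == parse_report_line(dos_line)
print(parse_report_line(dos_line))
```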

Thanks, Greg
