10XGenomics / vartrix

Single-Cell Genotyping Tool
MIT License

About the [ERROR]: The resulting matrix has a sum of 0. #37

Closed WeiZhu1998 closed 4 years ago

WeiZhu1998 commented 4 years ago

Dear developers, thanks for developing this useful tool for processing single-cell variant information. When I ran vartrix in coverage mode with --umi added, I got this error: [ERROR] The resulting matrix has a sum of 0. Did you use the --umi flag on data without UMIs?

I went back to check my BAM file, which comes from 10x scRNA, and found that it does have a UB tag representing the UMI. I am a little confused and hope you can give me a suggestion. I look forward to your reply. Thanks!

Here is a screenshot of my BAM file: [image]

Best, Wei

pmarks commented 4 years ago

Hi @WeiZhu1998, can you share the full command line you used? Did you get any other log messages? You will get 0 counts if the variants you selected have no confidently mapped reads covering them from the barcodes you selected in the barcodes file. Can you inspect the records mapping to some of your variants to make sure that at least some of them are mapped reads with a CB (cell barcode) tag that matches one of the entries in the barcodes file?
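
The check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of vartrix: it assumes the reads overlapping a variant have been dumped as SAM text (for example with samtools view on an indexed BAM), and the function names and toy barcodes are hypothetical.

```python
from collections import Counter


def cell_barcode(sam_line):
    """Return the value of the CB:Z: tag of one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at column 12 (index 11) of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("CB:Z:"):
            return tag[5:]
    return None


def summarize_barcodes(sam_lines, barcodes):
    """Count reads by whether their CB tag is in the selected barcode set."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        cb = cell_barcode(line)
        if cb is None:
            counts["no_cb_tag"] += 1
        elif cb in barcodes:
            counts["cb_in_barcodes"] += 1
        else:
            counts["cb_not_in_barcodes"] += 1
    return counts


if __name__ == "__main__":
    # Toy data with made-up barcodes; real input would be samtools output
    # and the lines of barcodes.tsv.
    barcodes = {"AAACCTGAGAAACCAT-1"}
    core = "\t0\tchr1\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF"
    records = [
        "r1" + core + "\tCB:Z:AAACCTGAGAAACCAT-1\tUB:Z:AACCGGTTAA",
        "r2" + core + "\tCB:Z:TTTGTCATCTTTACGT-1\tUB:Z:GGTTAACCGG",
        "r3" + core,  # read without a cell barcode tag
    ]
    print(dict(summarize_barcodes(records, barcodes)))
```

If nearly every read falls under no_cb_tag or cb_not_in_barcodes, an all-zero matrix is the expected outcome according to the explanation above.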

scheloni commented 4 years ago

Hi @WeiZhu1998 and @pmarks, I am encountering the same "issue". Were you able to solve this?

Here I share the command line I am using:

vartrix -v sampleA_noSNP.vcf -b sampleA_Aligned.sortedByCoord.out.bam -f reference_genomes/UCSC/hg19/WholeGenomeFasta/hg19.fa -c sampleA/filtered_gene_bc_matrix/barcodes.tsv -o SampleA_out

I will manually inspect the BAM file at the positions of some variants in a few hours.

Best, Stefano

pmarks commented 4 years ago

@scheloni One cause for this is having multi-allelic VCF records. vartrix only handles bi-allelic variants and will return counts of 0 for tri-allelic VCF entries. The other common issue is simply not having coverage on the selected SNPs with the selected barcodes.
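
A quick way to see whether a VCF contains multi-allelic records is to look for a comma in the ALT column. The following is a minimal sketch that assumes an uncompressed, tab-separated VCF; the helper names and the toy records are hypothetical. Tools such as bcftools norm (with its -m option) can split such records into bi-allelic ones.

```python
def alt_alleles(fields):
    """Return the list of ALT alleles from the split columns of one VCF record."""
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    return fields[4].split(",")


def find_multiallelic(vcf_lines):
    """Return (chrom, pos, alt) for every record with more than one ALT allele."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:  # no ALT column at all; not handled in this sketch
            continue
        if len(alt_alleles(fields)) > 1:
            hits.append((fields[0], int(fields[1]), fields[4]))
    return hits


if __name__ == "__main__":
    # Made-up records for illustration only.
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\t.\tA\tG\t.\t.\t.",
        "chr1\t2000\t.\tC\tT,G\t.\t.\t.",
    ]
    for chrom, pos, alt in find_multiallelic(lines):
        print(f"multi-allelic record at {chrom}:{pos} (ALT={alt})")
```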

scheloni commented 4 years ago

@pmarks Thanks for your quick reply. I took a quick look at the single-cell BAM files in IGV at positions where I expected to find the variant. The positions are covered by reads, but there are no reads supporting the alternative allele. Is this consistent with the error? Should the resulting matrix be updated only if the alternative allele is found, or also if there is a read with the "normal" allele? PS: I still have to check whether those reads belong to real CBs or to filtered-out CBs. Thanks again.

scheloni commented 4 years ago

To answer my second question, I also tried adding --scoring-method coverage --ref-matrix SampleA_out_ref, which should create a matrix with ref alleles, but I get the same message:

[ERROR] The resulting matrix has a sum of 0. Did you use the --umi flag on data without UMIs?

Any hints? Thanks

pmarks commented 4 years ago

Can you post some example variants from your VCF that appear to have good coverage? Another thing to double-check is that your barcodes.tsv file is from the same dataset as the BAM file. If they don't match, you'll have very few or 0 overlapping barcodes.

scheloni commented 4 years ago

@pmarks here it is!

Screenshot 2020-02-05 at 17 13 01

Let me explain. As you can see, there are 3 tracks. The first two correspond to the same sample sequenced twice by scRNA under different experimental conditions, while the third is scDNA (aka scCNV by 10x). As an example, I show position chr10:11367781: it is very well covered, and there are also reads carrying the alternative allele (C, in blue), which apparently are not scored by vartrix. How would you explain this?

PS: that position (chr10:11367781) is of course a variant in my VCF from WES. PPS: I have also checked the CB of some reads containing the alternative allele: they are in the barcodes.tsv file, so they should be real cells.

Thank you again Stefano

pmarks commented 4 years ago

Can you post the line from the VCF file for that variant? Thanks!

scheloni commented 4 years ago

Sure! I had to create the VCF using bedr (an R package), as I did not have the VCF from the mutation caller, only the BED format.

CHROM  POS       ID  REF  QUAL  FILTER  INFO
...
chr10  11367781  NA  T    NA    NA      NA
...

Thanks!

pmarks commented 4 years ago

Hmm, I don't see an ALT field. It should come after the REF field. You definitely will need both a REF and ALT field filled out with a DNA base or sequence (and not an 'NA' or '*') in order for vartrix to work correctly.
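
The requirement above can be checked before running vartrix. Here is a minimal sketch, assuming tab-separated VCF data lines; the function name and the set of placeholder values are assumptions made for illustration. Note that the literal string "NA" is rejected explicitly, because N and A are both valid base symbols and a plain pattern match would accept it.

```python
import re

DNA = re.compile(r"^[ACGTN]+$", re.IGNORECASE)
# Values treated as "not filled in" for this sketch (an assumption).
PLACEHOLDERS = {"NA", ".", "*", ""}


def check_ref_alt(vcf_line):
    """Return a list of problems found in the REF/ALT columns of one record."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return ["fewer than 5 columns, so REF or ALT is missing"]
    problems = []
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    for name, value in (("REF", fields[3]), ("ALT", fields[4])):
        if value.upper() in PLACEHOLDERS or not DNA.match(value):
            problems.append(f"{name} is {value!r}, expected a DNA base or sequence")
    return problems


if __name__ == "__main__":
    # The record from the thread has no ALT column, so QUAL ("NA") sits in its place.
    without_alt = "chr10\t11367781\tNA\tT\tNA\tNA\tNA"
    # Same site with ALT filled in; C is taken from the screenshot description.
    with_alt = "chr10\t11367781\t.\tT\tC\t.\t.\t."
    print(check_ref_alt(without_alt))
    print(check_ref_alt(with_alt))
```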

scheloni commented 4 years ago

@pmarks That was the issue as it seems to work with the ALT column, thanks a lot!

Now that I was able to run it, in my log I see:

[INFO] Number of alignments evaluated: 3375

...

[INFO] Number of alignments skipped due to not intersecting variant: 2459

What is "Number of alignments evaluated" reporting? I expected many more than 3375 reads to be evaluated, as we sequenced millions of reads. Am I misunderstanding something? How is the subset of evaluated alignments selected?

Many thanks again

pmarks commented 4 years ago

Ok, glad we got the VCF sorted out.

'Number of alignments evaluated' is the number of alignments that span a variant in the VCF. Depending on how many variants you have, this can be a small fraction of the total reads.

'Number of alignments skipped due to not intersecting variant' is the number of alignment records that span a variant, but do not actually cover the position of the SNP, usually due to a spliced alignment.
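
To illustrate the distinction, a spliced alignment can span a SNP position (the position lies between its first and last aligned base) without covering it, because the position falls inside the skipped intron (an N operation in the CIGAR string). The sketch below is not vartrix's implementation; the function names are hypothetical, and treating deletions (D) as not covering the position is an assumption.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def spans_position(start, cigar, pos):
    """True if 1-based pos lies between the first and last reference base
    of an alignment whose leftmost position (SAM POS) is start."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return start <= pos < start + ref_len


def covers_position(start, cigar, pos):
    """True if an aligned base (M, = or X) sits exactly on 1-based pos."""
    ref = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MDN=X":  # operations that consume reference bases
            if ref <= pos < ref + length:
                # Only aligned bases count; N (intron) and D do not.
                return op in "M=X"
            ref += length
    return False


if __name__ == "__main__":
    # Made-up spliced read: 50 aligned bases, a 1000 bp intron, 50 aligned bases.
    start, cigar = 100, "50M1000N50M"
    for pos in (120, 500, 1150):
        print(pos, spans_position(start, cigar, pos), covers_position(start, cigar, pos))
```

In this toy example, position 500 is spanned but not covered, which corresponds to the "skipped due to not intersecting variant" case described above.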

scheloni commented 4 years ago

@pmarks Many thanks for your helpful support and clarity!