10XGenomics / vartrix

Single-Cell Genotyping Tool
MIT License

Is there any way to speed up the program? #35

Closed: YiweiNiu closed this issue 4 years ago

YiweiNiu commented 4 years ago

Hi,

Thank you for developing such a useful tool!

I am using vartrix to count alleles across the whole mitochondrial genome, as in this issue.

However, I found it very slow: I started the tool a week ago and still have no output. Is there any way to make it faster?

The BAM I tested has 630,748,983 reads, and the sample contains 6,385 cells. The dummy VCF includes 49,708 variants.

Here is my command and the log:

PPN=20
INPUT_DIR=$WORKDIR/cellranger/$dataset/${sample}_5/outs
$TOOLDIR/vartrix.1.1.8/vartrix_linux --log-level info --threads $PPN --primary-alignments \
    -s coverage --umi \
    -v $chrM_VCF -f $chrM_FASTA \
    -b $INPUT_DIR/possorted_genome_bam.bam \
    -c $INPUT_DIR/filtered_feature_bc_matrix/barcodes.tsv.gz \
    --out-variants $OUT_DIR/variants.txt \
    -o $OUT_DIR/n_alt_reads.mtx --ref-matrix $OUT_DIR/n_ref_reads.mtx

The log:

Start time is 2019/11/22--14:06
06:07:00 [INFO] Initialized a 49708 variants x 6385 cell barcodes matrix
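For a sense of scale, here is the arithmetic behind that log line, using only numbers from this thread. The chrM length of 16,569 bp comes from the freebayes error further down (MT:0..16569); the variant and cell counts come from the vartrix log. With three alternate alleles per base, an all-SNV VCF has at most 49,707 records, so the dummy VCF is essentially every possible SNV, and each output matrix spans roughly 317 million variant-by-cell entries. This is a back-of-the-envelope sketch, not a runtime model.

```bash
#!/usr/bin/env bash
# Rough scale of the run described above (all numbers copied from this thread).
chrM_len=16569     # chrM length, from the region MT:0..16569 in the freebayes error
n_variants=49708   # rows in the dummy VCF, from the vartrix log
n_cells=6385       # cell barcodes, from the vartrix log

# Three alternate alleles per reference base gives the all-SNV upper bound.
snv_bound=$(( chrM_len * 3 ))
# Each output matrix (alt and ref) has one entry per variant per cell barcode.
matrix_entries=$(( n_variants * n_cells ))

echo "all-SNV upper bound: $snv_bound"
echo "entries per matrix:  $matrix_entries"
```

The row count scales directly with the number of VCF records, which is why shrinking the VCF is the first lever to try.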

Any suggestions or comments would be appreciated.

Best, Yiwei

pmarks commented 4 years ago

Hi @YiweiNiu, we are looking into ways to make vartrix faster, but we have not implemented any of them yet. The easiest thing to do is to use a smaller VCF that only includes variants you expect to exist in the sample, if possible. Can you tell me a bit more about the VCF you are using? Does it have a large number of 'hypothetical' variants densely packed over the chrM sequence?

YiweiNiu commented 4 years ago

Thank you for your reply!

> The easiest thing to do is to use a smaller VCF that only includes variants which you expect to exist in the sample, if possible.

I would like to, but I have not found a simple and effective way to do that. I will try calling variants on chrM with freebayes. Do you have any suggestions about this?

> Can you tell me a bit more about the VCF you are using? Does it have a large number of 'hypothetical' variants densely packed over the chrM sequence?

Yes. I made a dummy VCF that includes every possible SNV at each position, like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
MT      1       .       G       A       .       .       .       GT      ./.
MT      1       .       G       T       .       .       .       GT      ./.
MT      1       .       G       C       .       .       .       GT      ./.
MT      2       .       A       T       .       .       .       GT      ./.
MT      2       .       A       C       .       .       .       GT      ./.
MT      2       .       A       G       .       .       .       GT      ./.
MT      3       .       T       A       .       .       .       GT      ./.
MT      3       .       T       C       .       .       .       GT      ./.
MT      3       .       T       G       .       .       .       GT      ./.
MT      4       .       C       A       .       .       .       GT      ./.
MT      4       .       C       T       .       .       .       GT      ./.
MT      4       .       C       G       .       .       .       GT      ./.
MT      5       .       A       T       .       .       .       GT      ./.
MT      5       .       A       C       .       .       .       GT      ./.
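A VCF like this does not need to be written by hand. The sketch below is one way to generate it with awk; `make_dummy_vcf` is a hypothetical helper, not part of vartrix. It assumes the FASTA holds a single record (for example chrM extracted with `samtools faidx genome.fa MT > chrM.fa`), takes the contig name from the FASTA header, adds minimal VCF header lines, and skips ambiguous bases such as N.

```bash
#!/usr/bin/env bash
# make_dummy_vcf FASTA: hypothetical helper that prints every possible SNV
# of a single-record FASTA as a VCF (one record per alternate base).
make_dummy_vcf() {
  awk '
    BEGIN { OFS = "\t" }
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    /^>/ { split(substr($0, 2), h, /[ \t]/); chrom = h[1]; next }
    { seq = seq toupper($0) }                # assumes only one FASTA record
    END {
      len = length(seq)
      print "##fileformat=VCFv4.2"
      print "##contig=<ID=" chrom ",length=" len ">"
      print "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
      print "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"
      n = split("A C G T", bases, " ")
      for (pos = 1; pos <= len; pos++) {
        ref = substr(seq, pos, 1)
        if (ref !~ /^[ACGT]$/) continue      # skip N and other ambiguity codes
        for (i = 1; i <= n; i++)
          if (bases[i] != ref)
            print chrom, pos, ".", ref, bases[i], ".", ".", ".", "GT", "./."
      }
    }' "$1"
}
```

For example, `make_dummy_vcf chrM.fa > chrM_all_snvs.vcf`. Keep in mind that this produces exactly the dense, all-SNV VCF that makes vartrix slow.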
pmarks commented 4 years ago

Yes, I think trying freebayes is a good idea. You should be able to tune the freebayes settings so that you get variant calls even when very few reads support a variant. The resulting VCF should contain far fewer variants, which will let vartrix run much faster.
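As a starting point for that tuning, here is a hedged sketch of a freebayes call restricted to chrM with a relaxed alternate-allele fraction. The wrapper name `call_chrM_low_depth`, its `DRY_RUN` switch, and the threshold values are illustrative assumptions, not settings recommended in this thread. `--pooled-continuous`, `--min-alternate-count`, and `--min-alternate-fraction` are existing freebayes options, and `-r MT` assumes the contig naming used in this thread.

```bash
#!/usr/bin/env bash
# call_chrM_low_depth FASTA BAM OUT: hypothetical wrapper around freebayes for chrM.
# Threshold values below are illustrative guesses; tune them for your data.
# DRY_RUN=1 (a hypothetical convention) prints the command instead of running it.
call_chrM_low_depth() {
  local fasta=$1 bam=$2 out=$3
  # --pooled-continuous: output every allele passing the input filters,
  #   regardless of the genotyping model.
  # --min-alternate-fraction 0.01: evaluate sites where only a small fraction
  #   of reads support the alternate allele.
  # --min-alternate-count 2: still require at least two supporting reads.
  local cmd=(freebayes -f "$fasta" -r MT --pooled-continuous
             --min-alternate-count 2 --min-alternate-fraction 0.01 "$bam")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}" > "$out" || return 1
  # Number of records that vartrix would receive via -v.
  echo "records: $(grep -vc '^#' "$out")"
}
```

For example, `call_chrM_low_depth genome.fa possorted_genome_bam.bam chrM_calls.vcf` would write the calls, and that VCF could then be passed to vartrix with `-v` in place of the all-SNV file.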


-- Patrick Marks, Senior Director, Computational Biology, 10x Genomics

YiweiNiu commented 4 years ago

I will try freebayes, then. Thank you for your help!

Zifeng-L commented 2 years ago

> I will try freebayes, then. Thank you for your help!

Hi, can you help me with freebayes? I tried it, but it did not work. Here is my command and the log:

freebayes -f ./refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa -r MT ./T187L_possorted_genome_bam.bam >./T187L.vcf

The log:

[E::idx_find_and_load] Could not retrieve index file for 'T187L_possorted_genome_bam.bam'
Failed to load index for T187L_possorted_genome_bam.bam. Rebuild samtools index
ERROR(freebayes): Could not SetRegion to MT:0..16569
pmarks commented 2 years ago

@Zifeng-L it looks like you need to index your BAM file: samtools index T187L_possorted_genome_bam.bam should create the file T187L_possorted_genome_bam.bam.bai, which will let freebayes run.
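The same fix can be wrapped as a small pre-flight check. `ensure_indexed` is a hypothetical helper: it runs `samtools index` only when no `.bai` exists next to the BAM, and, as a precaution not mentioned in this thread, runs `samtools faidx` when the reference FASTA has no `.fai`.

```bash
#!/usr/bin/env bash
# ensure_indexed FASTA BAM: hypothetical helper that builds missing indexes
# before freebayes runs a region query such as -r MT.
ensure_indexed() {
  local fasta=$1 bam=$2
  # samtools index writes <bam>.bai next to the BAM; <name>.bai is also accepted.
  if [[ ! -e "$bam.bai" && ! -e "${bam%.bam}.bai" ]]; then
    samtools index "$bam" || return 1
  fi
  # Precaution: make sure the reference FASTA has a .fai index as well.
  if [[ ! -e "$fasta.fai" ]]; then
    samtools faidx "$fasta" || return 1
  fi
}
```

With the paths from the command above: `ensure_indexed ./refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa ./T187L_possorted_genome_bam.bam && freebayes -f ./refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa -r MT ./T187L_possorted_genome_bam.bam > ./T187L.vcf`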

Zifeng-L commented 2 years ago

> @Zifeng-L it looks like you need to index your BAM file: samtools index T187L_possorted_genome_bam.bam should create the file T187L_possorted_genome_bam.bam.bai, which will let freebayes run.

Got it! I have resolved it! Thank you very much!