ld9866 closed this issue 1 year ago
Right now your choices are PanGenie or vg call. To make a combined VCF, you would need to carefully merge the two VCFs from above.
You can, in theory, map your reads to the graph and add back the variants with vg augment -m, but this won't give you a VCF and may introduce lots of noise, so I don't really recommend it.
Thank you for your patient reply, but I have one more question. We got the VCF file from the minigraph-cactus build, but vg autoindex is taking a very long time: it has been running for a few days and is still not done. We have 15 genome files of similar size. Code:
vg autoindex --workflow mpmap -t 4 --prefix vg_rna --ref-fasta example_data/x.fa --vcf example_data/x.vcf.gz --tx-gff example_data/x.gtf
Best wishes!
You definitely don't want to go GRAPH->VCF->GRAPH. If you want to re-index the results of minigraph-cactus, you should start with the GFA or GBZ, not the VCF.
Thank you for your reply! We now want to use rpvg for a pan-transcriptome study, so we need to build the index files before starting the downstream analysis. According to the rpvg documentation, the first step is to build the indexes; we then want to align the transcriptome data to the pangenome. How do we do that? The example code:
The easiest way to start this pipeline is to use the vg autoindex subcommand to make indexes for vg mpmap. vg autoindex creates indexes for mapping from common interchange formats like FASTA, VCF, and GTF. It effectively combines the vg rna step and the indexing for vg mpmap.
More information is available in the wiki page on transcriptomics.
Working from this directory, the following example shows how to create a spliced pangenome graph and indexes using vg autoindex with 4 threads:
# Create spliced pangenome graph and indexes for vg mpmap
vg autoindex --workflow mpmap -t 4 --prefix vg_rna --ref-fasta example_data/x.fa --vcf example_data/x.vcf.gz --tx-gff example_data/x.gtf
This will create several files with the prefix vg_rna, which can be used in rpvg and vg mpmap.
RNA-seq reads can be mapped to the spliced pangenome graph using vg mpmap with 4 threads:
# Map simulated RNA-seq reads using vg mpmap
vg mpmap -t 4 -x vg_rna.spliced.xg -g vg_rna.spliced.gcsa -d vg_rna.spliced.dist -f example_data/x_rna_1.fq -f example_data/x_rna_2.fq > mpmap.gamp
This will create a multipath alignment file called mpmap.gamp.
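Because vg autoindex can run for a long time on large inputs, it may help to confirm that the index files named in the vg mpmap command above actually exist before starting the mapping step. The following is a minimal sketch, not part of vg or rpvg: check_mpmap_indexes is a hypothetical helper name, and it only checks the three files that appear in the mapping command (.spliced.xg, .spliced.gcsa, .spliced.dist).

```shell
# Hypothetical helper (not part of vg): verify that the index files used by
# the vg mpmap command above exist and are non-empty for a given prefix.
check_mpmap_indexes() {
    prefix="$1"
    missing=0
    for ext in spliced.xg spliced.gcsa spliced.dist; do
        if [ ! -s "${prefix}.${ext}" ]; then
            # Report each absent or empty file on its own line
            echo "missing or empty: ${prefix}.${ext}"
            missing=1
        fi
    done
    # Exit status 0 if all three files are present, 1 otherwise
    return "$missing"
}
```

A possible use is to run check_mpmap_indexes vg_rna first and start vg mpmap only if it returns success.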
vg autoindex can accept GFA.
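As a rough sketch of what that could look like, the helper below only prints a vg autoindex command that starts from a GFA instead of FASTA + VCF; it does not run anything. Assumptions: autoindex_from_gfa_cmd is a hypothetical helper, graph.gfa and annotation.gtf are placeholder file names, and --gfa is taken to be the GFA input option of vg autoindex (please confirm with vg autoindex --help for your version). The contig names in the GTF presumably also need to match the reference path names in the graph.

```shell
# Hypothetical helper: print (do not run) a vg autoindex command that takes
# a GFA as input. Arguments: GFA file, GTF file, optional prefix, optional threads.
autoindex_from_gfa_cmd() {
    gfa="$1"
    gtf="$2"
    prefix="${3:-vg_rna}"   # default prefix matches the example above
    threads="${4:-4}"       # default thread count matches the example above
    # Assumption: --gfa is the GFA input flag; verify with `vg autoindex --help`
    echo "vg autoindex --workflow mpmap -t ${threads} --prefix ${prefix} --gfa ${gfa} --tx-gff ${gtf}"
}

# Placeholder file names; substitute the GFA from minigraph-cactus and your annotation
autoindex_from_gfa_cmd graph.gfa annotation.gtf
```

Printing the command first makes it easy to review the inputs before committing to a long indexing run.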
Thank you for your reply. Best wishes!
Hello developers! I built a 15-genome pangenome using minigraph-cactus and got the VCF file without a problem. Now I want to add short-read sequencing data from 500 individuals to our VCF file to build a more complete pangenome for downstream analysis. What should I do? Best regards.