Closed GeorgeBGM closed 11 months ago
We don't yet have a way to add new samples to (or replace existing samples in) an existing pangenome.
The only type of merging that I'm aware of being possible is merging different chromosomes from the same species, which vg combine and odgi squeeze can do. Merging this way is already done as part of the minigraph-cactus pipeline, which aligns chromosomes separately and then merges them at the end into whole-genome indexes.
Thank you for your reply.
I would like to ask whether there is any proposal for directly merging already-constructed pan-genomes of the same chromosome from the same species, or whether I need to start from the beginning. How much time did it take to construct the human HPRC Phase I pan-genome?
There are some running times in the paper. They use an AWS cluster for some steps. On a single machine, you're probably looking at about 2 weeks of running time.
Can I use minigraph and vg map to add new genomes to the pan-genome published by HPRC? In our tests, the GFA file output by the minigraph alignment is smaller than the file published by the HPRC project. Can vg map align a whole genome? Do you have any suggestions about this issue?
Yes, you can add genomes with minigraph (to minigraph pangenomes) with -cxggs.
For minigraph-cactus, all following steps would need to be rerun.
vg map and vg giraffe will not be able to map genome assemblies. GraphAligner may work.
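The minigraph route mentioned above can be sketched as a small Python helper that only assembles the command line. This is a minimal sketch, not a tested recipe: the file names are hypothetical, the -t thread option is an addition not mentioned in this thread, and it assumes the existing graph is in a form minigraph accepts as its first input, with the updated graph written to standard output.

```python
import shlex


def minigraph_add_command(graph_gfa, new_assemblies, threads=16):
    """Build the argv for adding assemblies to an existing minigraph graph.

    graph_gfa and new_assemblies are hypothetical paths; -cxggs is the
    option named in this thread, -t (threads) is an assumption.
    """
    if not new_assemblies:
        raise ValueError("need at least one assembly to add")
    return ["minigraph", "-cxggs", f"-t{threads}", graph_gfa, *new_assemblies]


cmd = minigraph_add_command("existing.gfa", ["new_sample.fa"])
# Print the shell form; the redirect target is also a hypothetical name.
print(shlex.join(cmd) + " > updated.gfa")
```

As noted above, this only extends a minigraph graph; for a minigraph-cactus graph the downstream steps would still need to be rerun.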
Hi, I tried to use GraphAligner (GraphAligner -g CPC.HPRC.Phase1.CHM13v2_Non-W.gfa -f /HJ.stLFRCCS.maternal.fasta.gz -a aln.gam -x vg), but I got several error messages. Can you give me some suggestions?
Please report your GraphAligner issues here: https://github.com/maickrau/GraphAligner/issues/new
Hi, my job was killed for exceeding the node time limit (3 days). Is there any risk to the remaining steps if I restart it with the --restart option (cactus-pangenome command)?
There should be no risk to trying --restart.
Thank you for your reply.
Hi, I would like to reproduce the construction process of the HPRC project using the step-by-step cactus process. After downloading the data of the HPRC project, I am going to use the following steps to complete the construction of the pan-genome.
Step 1: cactus-minigraph
Step 2: cactus-preprocess (brnn)
Step 3: cactus-graphmap
Step 4: cactus-graphmap-split
Step 5: cactus-align
Step 6: cactus-graphmap-join
I want to confirm that the above steps are in the right order. Besides, I got the following error while doing Step 2. Do I need to replace the # character in the FASTA headers? Do you have any other suggestions?
The specific errors reported are as follows:
RuntimeError: An invalid character was found in the first word of a fasta header. Acceptable characters for headers in an assembly hub include alphanumeric characters plus '_', '-', ':', and '.'. Please modify your headers to eliminate other characters. The offending header: 'HG005#1#JAHEPO010000001.1' in 'HG005.1'
RuntimeError: An invalid character was found in the first word of a fasta header. Acceptable characters for headers in an assembly hub include alphanumeric characters plus '_', '-', ':', and '.'. Please modify your headers to eliminate other characters. The offending header: 'HG01071#1#JAHBCF010000001.1' in 'HG01071.1'
You can resolve that error by running cactus-preprocess --pangenome on the input data as a first step to remove the # characters.
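To find which headers would trip that check before starting a long run, a small script along these lines may help. It is a minimal sketch of the rule as worded in the error message, not Cactus's own code: the allowed set is alphanumerics plus '-', ':' and '.', with the underscore included as an assumption, and the function names are hypothetical.

```python
import re

# Allowed characters for the first word of a header, following the wording of
# the error message. Including '_' is an assumption.
_VALID_FIRST_WORD = re.compile(r"[A-Za-z0-9_\-:.]+")


def first_word(header):
    """Return the first whitespace-delimited word of a FASTA header, without '>'."""
    parts = header.lstrip(">").split()
    return parts[0] if parts else ""


def offending_headers(fasta_lines):
    """Yield the first word of every header containing a disallowed character."""
    for line in fasta_lines:
        if line.startswith(">"):
            word = first_word(line)
            if not _VALID_FIRST_WORD.fullmatch(word):
                yield word


example = [">HG005#1#JAHEPO010000001.1", "ACGT", ">chr1:1-100 some description", "ACGT"]
print(list(offending_headers(example)))  # ['HG005#1#JAHEPO010000001.1']
```

This only reports the problem headers; the renaming itself is what cactus-preprocess --pangenome is for.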
Note: The year-2 HPRC graph will be made with a single invocation of cactus-pangenome, and that's how I recommend building graphs with the current version of cactus. If you really want to exactly reproduce the released graph, please carefully look in the papers and use the cactus commits and commands there (but again, you will get better results using the latest release and interface).
Thank you for your reply.
I would like to ask whether cactus-pangenome includes a step for removing complex regions (cactus-preprocess (brnn)).
Recently, I read the article "Construction and representation of human pangenome graphs", which evaluated different pan-genome construction tools (Bifrost, mdbg, Minigraph, Minigraph-Cactus and pggb). Have you tested them internally in your team, and which software would you recommend trying in addition to Minigraph-Cactus and pggb?
Besides, do you have any suggestions about this error (https://github.com/maickrau/GraphAligner/issues/83)?
dna-brnn was removed from the default pipeline in Cactus Version v2.1.0. Since then, alignment gaps are used to remove complex sequence. The difference between the two approaches is touched on in the Minigraph-Cactus paper.
That article is still on my to-read list, so I can't comment on it yet.
I don't have any suggestions for your GraphAligner error, sorry.
Thank you for your reply.
I will continue to follow the progress related to pan-genome and hope you will share more.
Dear @glennhickey, Is there a feasible process to add 1000 Genomes (SNPs; Indels; SVs) to the GFA pan-genome file generated by The Minigraph-Cactus Pangenome Pipeline? Can vg autoindex be used to do this?
Best, Du
Nope, can't be done with vg.
Hi, I have the following questions to ask: