Open Donandrade opened 7 months ago
The reference has to be a single haplotype. So if you have a diploid sample you want to use as a reference, you have to choose one haplotype for your reference and, effectively, treat the other haplotype(s) as a different sample. This has been annoying me and, I'm sure, others, but I don't really have a solution in mind. It doesn't affect the alignmenet at all -- just the path names in your final graph. So I think what you've done for this part is fine.
I think I am also confused by your figure. Is it for a single chromosome? Is the reference really half the length of your other input genomes? It is possible to have a large insertion relative to the reference, but normally (due to how minigraph works) this would not be at the end of the sequence, nor would it be so large.
@glennhickey Thank you very much for your answer ad attention. You are really making a brilliant work here, with Minigraph-cactus.
Regarding your questions:
- I also find myself confused by your figure. Is it for a single chromosome?
Yes, in this case, I ran the cactus-pangenome
command with the option --viz
. From what I understood, with this option, a 1-dimensional visualization using odgi will be generated for each chromosome in separate figures. The figure I showed here before (Figure 1) is for one of the chromosomes, but the profile you seen is similar for all other plots, including all the rest of haplotypes on the same genome. Therefore, I understanding that this is not due to a large insertion.
I'm not displaying all results here, but in all the tests I conducted, my reference (single haplotype) ends up being smaller than the other genomes. Moreover, it seems that this difference increases as I increase the number of genomes (Figure 2), as seen in the example below with 10 more genomes (28 in total, including the reference). Figure 2. 1-dimensional visualization using odgi with more genomes.
- "Is the reference really half the length of your other input genomes?"
It seems so, but this size increases with the addition of new genomes.
Very mysterious. I suspect it's a problem with how cactus-pangenome
is making the plots. I definitely recommend checking your sequence lengths -- you can use odgi paths -lL
. Also, if you can share a chromosome odgi file, I'll take a closer look here.
Hi @glennhickey !
Thank you for your suggestion. Sorry for the delay. I tried rerunning the MC using the --chrom-og
option to get the .og files, but I failed twice, encountering an error. In the end, I tried to use the GFA file to convert it to an .og file, but I had a core dump error. I'm trying to solve these errors, but if the GFA file could be useful, I'm sending it here.
Thank you for your time and help.
Hi @glennhickey !
I hope you are doing well. I'm revisiting this issue, and after trying other tools, MC still seems to be the best fit for my dataset. Furthermore, I would like to resolve this problem, and your help will be essential. I’m attaching a file with the path and the *.og file.
To update you on what I’ve done and to verify if the information I’m providing is accurate, here’s what I did:
1. Converted the rGFA for one of the chrom-subproblems using the command: vg convert -f -W VaccDscaff1.gfa > VaccDscaff1_2.gfa
.
2. Obtained the *.og file using the command: odgi build -t 10 -g VaccDscaff1_2.gfa -o VaccDscaff1.og
.
3. Obtained the path file using the command: odgi paths -lL -i VaccDscaff1.og > VaccDscaff1.path
.
The *og
and *.path
files are here:
og_path_files.zip
Dear all,
I'm here for the first time in the community and have been trying for some time to obtain a pangenome to use as a reference for mapping short reads using
vg giraffe
and subsequent variant calling usingvg call
(I'm considering usingvg surject
+Deepvariant
). Below, I will outline my questions regarding the main data used, the results obtained (Figure 1), and the command executed to obtain my pangenome using Minigraph-cactus:Data Used: I'm working with genomes of a plant species. I'm using data from two phased genomes (one diploid and one tetraploid) and 11 WGS from other individuals of the species (Figure 1).
Command Executed: I executed the command below to run Minigraph-cactus and obtain the pangenome for the plant in question:
cactus-pangenome jobStorePath14 seqFile --outDir pg14 --outName pg14 --reference REF --vcf --giraffe --gfa --gbz --viz --mgCores 30 --mgMemory 250Gi --mapCores 30 --consCores 30 --consMemory 250Gi --indexCores 30 --indexMemory 250Gi
As a reference (REF), I used one of the haplotypes of the tetraploid genome (Haplotype1 - see Figure 1 below) from "Phased Genome X" (Figure 1 below). The other haplotypes of "Phased Genome X" were used as samples to be incorporated into the pangenome.
Figure 1: 1D representation of the pangenome obtained from 11 WGS (WGS 1, WGS 2, WGS 3, WGS 4, WGS 5, WGS 6, WGS 7, WGS 8, WGS 9, WGS 10, WGS 11) + phased polyploid genome (Phased Genome X) + phased diploid genome (Phased Genome Y). The blue background highlights the region of the WGSs and haplotypes that map against the reference, while the yellow background corresponds to regions of the WGSs and haplotypes that did NOT map. The bolded genome (Haplotype1) corresponds to the haplotype of Phased Genome X used as a reference in Minigraph-cactus.