ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Inquiry Regarding Minigraph-Cactus v2.9.1 Pipeline for Rare Genetic Disease Analysis #1502

Closed by jinhua2024 1 month ago

jinhua2024 commented 1 month ago

Dear Minigraph-Cactus Development Team,

I am a beginner in pan-genome analysis and am currently using Minigraph-Cactus v2.9.1 for my research project, which investigates the genetic cause of a rare disease in an animal family. The family includes the parents, two affected offspring, and one unaffected offspring. Each individual has undergone HiFi sequencing, and we have obtained a total of 10 haplotype assemblies. It is suspected that inbreeding may have contributed to the disease.

First, I would like to express my sincere thanks for the contributions your team has made to graph-based pan-genome research. The Minigraph-Cactus pipeline has been invaluable in my work. I am currently running the following command for my analysis:

    singularity exec \
      --bind /assembly/cactus_pangen/family_dsd_2:/assembly/cactus_pangen/family_dsd_2 \
      /assembly/cactus_pangen/cactus:v2.9.1.sif \
      cactus-pangenome /assembly/cactus_pangen/family_dsd_2/jobs /samples_family.txt \
      --outDir /assembly/cactus_pangen/family_dsd_2/ \
      --outName DSD_family_2 \
      --reference ref_pig \
      --workDir /assembly/cactus_pangen/workdir \
      --logFile /assembly/cactus_pangen/family_dsd_2/run3.log \
      --maxCores 60 --maxMemory 300G \
      --vcf --gfa --gbz --odgi --xg --viz --draw \
      --chrom-vg --chrom-og --giraffe --vcfwave --restart

I have a few questions I hope you can help with:

  1. In the results I obtained, can the pig.vcf.gz file be used directly to search for the causal variant of the rare disease? This file contains genotype (GT) information for all family members, but it lacks structural variant (SV) details such as variant type and end position. Are additional steps needed for a more thorough SV analysis?

  2. When mapping WGS reads and genotyping SVs with the VG tools, should I use the pig.gfa.gz file or the pig.d2.gfa.gz file? I recall that pig.gfa.gz is suitable for most VG tools, while pig.d2.gfa.gz may be more appropriate for read mapping. Could you clarify the difference and recommend the best approach?

  3. For the same family data, I also added several additional haplotype assemblies. Previously, I used Minigraph-Cactus v2.7.0 to construct the pangenome and output a VCF file. However, I identified an unreliable variant in the pig.vcf.gz file by visualizing the assembly in IGV (please see figure). I am wondering if there are any major improvements in v2.9.1 that would address such issues, or if I should apply specific filters or settings to avoid these problems in the new analysis.
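
To make question 1 concrete: for precise alleles, the variant type and end position can in principle be derived from the REF and ALT lengths of each record. The following is only a minimal sketch of that idea, not something provided by Minigraph-Cactus or vg. The function names `classify_allele` and `annotate_line`, the 50 bp size cutoff, and the choice to skip multi-allelic records are all illustrative assumptions. It also does not add the matching `##INFO` header lines, and it ignores complex replacements where REF and ALT are both long.

```python
MIN_SV_LEN = 50  # assumed size cutoff (bp) for treating an allele as an SV


def classify_allele(pos, ref, alt, min_sv_len=MIN_SV_LEN):
    """Return (svtype, end, svlen) for one precise REF/ALT pair, else None."""
    # Skip missing (.), spanning-deletion (*), symbolic (<DEL>) and breakend alleles.
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return None
    svlen = len(alt) - len(ref)
    if abs(svlen) < min_sv_len:
        return None  # SNP, small indel, or similar-length replacement
    svtype = "INS" if svlen > 0 else "DEL"
    end = pos + len(ref) - 1  # last reference base covered by REF
    return svtype, end, svlen


def annotate_line(line, min_sv_len=MIN_SV_LEN):
    """Append SVTYPE/END/SVLEN to the INFO column of a biallelic VCF record."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    pos, ref, alts = int(fields[1]), fields[3], fields[4].split(",")
    if len(alts) != 1:
        return line  # multi-allelic: left untouched in this sketch
    call = classify_allele(pos, ref, alts[0], min_sv_len)
    if call is None:
        return line
    extra = "SVTYPE={};END={};SVLEN={}".format(*call)
    fields[7] = extra if fields[7] in ("", ".") else fields[7] + ";" + extra
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Hypothetical record: a 60 bp insertion after position 100.
    record = "chr1\t100\t.\tA\tA" + "C" * 60 + "\t.\t.\t.\tGT\t0|1\n"
    print(annotate_line(record).split("\t")[7])
```

Multi-allelic records would need to be split into biallelic ones first (for example with `bcftools norm`) before a per-record type and end position make sense.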

Thank you very much for your time and assistance. Your insights and recommendations would be greatly appreciated as I continue my research.

Best regards, Jinhua