ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

How to use minigraph-cactus for a pan-transcriptome #959

Closed ld9866 closed 1 year ago

ld9866 commented 1 year ago

Dear developer: We recently ran into a difficult problem. Minigraph-cactus is very valuable for pan-genome construction, but we have found it very difficult to use for a pan-transcriptome.

The vg wiki page https://github.com/vgteam/vg/wiki/Transcriptomic-analyses mentions "a pangenome graph constructed using the Minigraph-Cactus pipeline", but vg mpmap needs a "gcsa" index, and the code did not support it.

We also tried the rpvg pipeline (https://github.com/jonassibbesen/rpvg), but found it difficult to get working.

  1. vg autoindex --workflow mpmap -t 4 --prefix vg_rna --ref-fasta ref.fa --vcf primates-pg.vcf.gz --tx-gff ref.gtf

     Result:
     warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 551c0d409cc2f3ef48b92e117f9f85b4a3dcbe4a at 1:475 missing/empty! Was the variant skipped during construction?
     warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 11cdf7347abe3bc6f33e28762d9974805ea74508 at 1:477 missing/empty! Was the variant skipped during construction?
     warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for eb631a0aec70e056b21207806ea9b3cdd338aa3a at 7:1647 missing/empty! Was the variant skipped during construction?
     warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
     warning: [HaplotypeIndexer::parse_vcf] Found 2824500/0 variants in phasing VCF but not in graph! Do your graph and VCF match?

We would therefore like to ask how we can resolve this so that minigraph-cactus works well for a pan-transcriptome.

glennhickey commented 1 year ago

As mentioned here, you must not use the VCF output of minigraph-cactus to work with vg.

Use the .gbz output instead. If you don't have a .gbz file, run cactus-graphmap-join with the --gbz option. Please see the vg transcriptome wiki for how to make transcriptomes with .gbz files from Minigraph-Cactus, and report any issues you have to the vg team.
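
For illustration, here is a minimal sketch of the cactus-graphmap-join step mentioned above. It is not taken from this thread: the job store path, the per-chromosome .vg file names, the output directory and prefix, and the reference sample name are all placeholders, and the exact set of options may differ between Cactus versions. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
set -eu

JOBSTORE="./jobstore"        # hypothetical Toil job store location
VG_FILES="chr1.vg chr2.vg"   # hypothetical per-chromosome graphs from the earlier alignment step
OUT_DIR="./primates-pg"      # hypothetical output directory
OUT_NAME="primates-pg"       # hypothetical output file prefix
REFERENCE="ref"              # hypothetical name of the reference sample in the graph

# Assemble the join command with --gbz so that a .gbz file is written
# alongside the other outputs. The command is printed, not executed.
build_join_cmd() {
  echo "cactus-graphmap-join $JOBSTORE --vg $VG_FILES --outDir $OUT_DIR --outName $OUT_NAME --reference $REFERENCE --gbz"
}

build_join_cmd
```

Once the placeholders match your own run, the printed command can be executed directly. The resulting .gbz file is then the input for the vg transcriptome workflow, in place of the VCF.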