Closed zhangyixing3 closed 2 years ago
Sorry for the late reply. Here is a pretty nice example if you would like to work on a polyploid genome: https://github.com/baozg/Potato_C88.
Thank you for your help. It seems really difficult to answer this question.
Sorry to bother you again. I can't understand the polyploid graph binning of hifiasm. In the autotetraploid potato paper, they use the command below, but they give no additional explanation.
hifiasm -t 64 -o C88 -5 C88.hifiasm.binutg.reads.list --n-hap 4 --hom-cov 120 C88.HiFi.fa.gz
I can't find a description of the -5 parameter in the Hifiasm Parameter Reference. Can you give me some help? Thank you!
I think we have provided a brief introduction to -5, which is enough for others to run this version with their own data (actually, it did work in other species!). I also posted the file that I used in our tetraploid potato project. The first column is the haplotype group (we also added the linkage group, but basically hifiasm only uses the four groups _1/_2/_3/_4). The read names come from the unitig GFA (non-contained reads: we binned the unitigs first, then used all non-contained reads from each unitig as its haplotype group). Would you mind giving more detail about your question? Since it is a hidden parameter that exists only in the development branch of hifiasm, its description has not been added.
-5 provides the phase information (see the table below). Reads in collapsed regions are used more than once, --n-hap 4 indicates the ploidy, and --hom-cov is the homozygous coverage peak of the assembly.

Group    non-contained HiFi reads
LG1_1    m64053_200110_120759/100206539/ccs
LG1_1    m64053_200110_120759/100270139/ccs
LG1_1    m64053_200110_120759/100272825/ccs
LG1_1    m64053_200110_120759/100402742/ccs
LG1_1    m64053_200110_120759/100467929/ccs
LG1_1    m64053_200110_120759/100468612/ccs
LG1_1    m64053_200110_120759/100534820/ccs
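Before passing such a list to -5, it can help to check how many reads fall into each group and which reads are listed under more than one group (as expected for collapsed regions). The following is a minimal sketch, not part of hifiasm. It assumes the list has two whitespace-separated columns (group, read name) as in the excerpt above, and the function name `summarize_read_list` is hypothetical.

```python
from collections import defaultdict


def summarize_read_list(lines):
    """Summarize a two-column (group, read name) list.

    Returns a tuple:
      - dict mapping each group to its number of distinct reads
      - dict mapping each read listed under several groups to those groups
    Assumption: columns are separated by whitespace, as in the excerpt.
    """
    groups_of_read = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        group, read = fields
        groups_of_read[read].add(group)

    reads_per_group = defaultdict(int)
    for groups in groups_of_read.values():
        for group in groups:
            reads_per_group[group] += 1

    # Reads assigned to more than one haplotype group (collapsed regions).
    shared = {
        read: sorted(groups)
        for read, groups in groups_of_read.items()
        if len(groups) > 1
    }
    return dict(reads_per_group), shared
```

For a real file, something like `summarize_read_list(open("C88.hifiasm.binutg.reads.list"))` would give the per-group counts.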
Your work is excellent. Maybe because I'm a beginner, I need more detail. I am really interested in the -5 parameter.
I'm curious: what are non-contained reads in the GFA? I guess they are the CCS reads that make up each unitig in the GFA, and hifiasm then uses this phase information to reassemble. Right?
Thank you for your warm-hearted help.
Non-contained reads come directly from the hifiasm output *.p_utg.noseq.gfa. Basically, they are the representative reads selected based on all-to-all overlap.
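As a rough illustration of how such a list could be derived, the sketch below collects the read names from the `A` lines of a `*.p_utg.noseq.gfa` file (in hifiasm GFA output, the second field of an `A` line is the unitig name and the fifth is the read name) and writes one line per (group, read) pair. The unitig-to-group mapping `unitig_groups` is a hypothetical input that would come from your own binning step. The tab separator is also an assumption, so compare it with the example list in the Potato_C88 repository.

```python
def reads_per_unitig(gfa_lines):
    """Map each unitig name to the read names on its 'A' lines."""
    reads = {}
    for line in gfa_lines:
        if not line.startswith("A\t"):
            continue  # only 'A' lines describe reads placed on a unitig
        fields = line.rstrip("\n").split("\t")
        unitig, read_name = fields[1], fields[4]
        reads.setdefault(unitig, []).append(read_name)
    return reads


def build_read_list(gfa_lines, unitig_groups):
    """Return 'group<TAB>read' lines for every binned unitig.

    unitig_groups: hypothetical dict, unitig name -> list of haplotype
    groups. A collapsed unitig may belong to several groups, so its reads
    are emitted once per group.
    """
    out = []
    for unitig, reads in reads_per_unitig(gfa_lines).items():
        for group in unitig_groups.get(unitig, []):
            out.extend(f"{group}\t{read}" for read in reads)
    return out
```

Unitigs that are missing from `unitig_groups` are skipped, so unbinned unitigs contribute no reads to the list.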
Hello, I am grateful to you for producing hifiasm. It is very powerful for simple genomes. I have a large autopolyploid plant genome. In order to get all chromosomes, I must use p_utg.gfa. I will use the figure to illustrate my doubts. In the figure, I can get unitigs 1-7. However, the actual genome contains unitig 1 * 2, unitig 2, unitig 3 * 2, unitig 4, unitig 5 * 2, unitig 6, and unitig 7. This means that p_utg.fa is smaller than the actual genome, and I have to find a way to duplicate the shared contigs between haplotypes. I'm not sure if my idea is correct. Can you give me some advice? Thank you!
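One rough way to guess how many haplotypes share a collapsed unitig is to compare its read depth with the expected per-haplotype coverage. The sketch below is only a heuristic, not something hifiasm does or recommends. It assumes the `S` lines of the unitig GFA carry an `rd:i:` read-depth tag, that the homozygous coverage peak corresponds to all haplotypes collapsed together (so per-haplotype coverage is `hom_cov / n_hap`), and the function name `estimate_copy_numbers` is hypothetical.

```python
def estimate_copy_numbers(gfa_lines, hom_cov, n_hap):
    """Estimate how many haplotypes each unitig represents.

    Heuristic: copies ~ unitig depth / per-haplotype coverage,
    clamped to the range 1..n_hap.
    """
    per_hap_cov = hom_cov / n_hap
    copies = {}
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # only segment lines carry the depth tag
        fields = line.rstrip("\n").split("\t")
        depth = None
        for tag in fields[3:]:
            if tag.startswith("rd:i:"):
                depth = int(tag[len("rd:i:"):])
        if depth is None:
            continue  # no depth information for this unitig
        estimate = round(depth / per_hap_cov)
        copies[fields[1]] = min(n_hap, max(1, estimate))
    return copies
```

With `hom_cov=120` and `n_hap=4`, a unitig at depth around 60 would be counted twice, which matches the "unitig 1 * 2" situation described above. Repeats and uneven coverage can easily mislead such an estimate.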