ShunOuchi / GreenHill

De novo chromosome-level scaffolding and phasing tool using Hi-C
GNU General Public License v3.0

[General question] Which assembler to use prior to GreenHill? #23

Closed: Isoris closed this issue 11 months ago

Isoris commented 11 months ago

Hello, and thank you for the tool.

I am currently working on the assembly of a catfish genome. In short, with Nanopore reads, Hi-C reads, 40x PacBio HiFi, and 90x Illumina PE reads, would it be preferable to use Platanus-allee, FALCON-Unzip, hifiasm, or hifiasm in Hi-C mode (without GreenHill)? Also, the sequencing is not trio-based, and the assembly is fully de novo (no reference is available for our species).

From my understanding, Platanus-allee + GreenHill or FALCON-Unzip + GreenHill seems to be the best option. Is it possible to merge the two assemblies by scaffolding with minimap2?

Thank you for your answer. Quentin

ShunOuchi commented 11 months ago

Hello, @Isoris

> Would it be preferable to use Platanus-allee, FALCON-Unzip, hifiasm, or hifiasm in Hi-C mode (without GreenHill)?

I recommend trying hifiasm in UL mode together with Hi-C mode, using the HiFi, Nanopore, and Hi-C reads:

    hifiasm --h1 HiC1.fastq --h2 HiC2.fastq --ul ONT.fastq HiFi.fastq

Then map the Hi-C reads to the resulting haplotype assemblies (hap1.p_ctg + hap2.p_ctg) and check the Hi-C contact map with tools such as Juicer and Juicebox. If the result is not a chromosome-level assembly, perform Hi-C scaffolding and phasing with GreenHill, using hap1 + hap2 as input:

    greenhill -cph out.hap1.p_ctg.fa out.hap2.p_ctg.fa -p HiFi.fastq -IP1 PE1.fastq PE2.fastq -HiC HiC1.fastq HiC2.fastq
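Editor's note: the GreenHill command above takes FASTA files (out.hap1.p_ctg.fa, out.hap2.p_ctg.fa), whereas hifiasm writes its contigs as GFA graphs, so a conversion step sits between the two commands. Below is a minimal sketch of that step. It assumes hifiasm was run with the output prefix "-o out" in Hi-C mode, so that it wrote out.hic.hap1.p_ctg.gfa and out.hic.hap2.p_ctg.gfa; that prefix is an assumption chosen to match the file names in the GreenHill command. The awk expression is the GFA-to-FASTA one-liner given in the hifiasm documentation, and the function name gfa_to_fasta is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert hifiasm Hi-C-mode haplotype contig graphs (GFA) into the
# FASTA files passed to "greenhill -cph".
# Assumption: hifiasm was run with "-o out" (hypothetical prefix).
set -euo pipefail

# Keep only segment (S) lines: field 2 is the contig name, field 3 its sequence.
gfa_to_fasta() {
  awk '/^S/ { print ">" $2; print $3 }' "$1" > "$2"
}

for hap in hap1 hap2; do
  gfa="out.hic.${hap}.p_ctg.gfa"
  # Skip quietly if this haplotype's GFA is not in the current directory.
  if [ -f "$gfa" ]; then
    gfa_to_fasta "$gfa" "out.${hap}.p_ctg.fa"
  fi
done
```

Run it in the directory holding the hifiasm output; the two resulting FASTA files are the inputs to the greenhill command.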

> From my understanding, Platanus-allee + GreenHill or FALCON-Unzip + GreenHill seems to be the best option. Is it possible to merge the two assemblies by scaffolding with minimap2?

It is difficult to merge the results of Platanus-allee + GreenHill and FALCON-Unzip + GreenHill.

Thank you, Shun

Isoris commented 11 months ago

Hello, actually I have 40x PacBio CLR, 25x Nanopore, 90x Illumina, and Hi-C reads.

ShunOuchi commented 11 months ago

I would recommend trying several assemblers and comparing the results. In this case, Platanus-allee (with PE, CLR, and ONT reads) may be better, because separating the haplotypes with FALCON-Unzip may be difficult given the low coverage of the long reads.
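Editor's note: one simple starting point for comparing the outputs of several assemblers is to tabulate contiguity statistics such as N50 (the length L such that contigs of length L or longer cover at least half of the assembly). The sketch below is an illustration, not part of GreenHill; the function name fasta_stats is hypothetical. Contiguity alone does not show whether haplotypes were separated correctly, so this complements, rather than replaces, checking the Hi-C contact map.

```shell
#!/usr/bin/env bash
# Sketch: report sequence count, total length and N50 for each FASTA file
# given on the command line, e.g. the outputs of different assemblers.
set -eo pipefail

# Stage 1 prints one length per record (handles wrapped sequence lines),
# stage 2 sorts lengths longest first, and stage 3 walks down the sorted
# lengths until half of the total assembly length is covered.
fasta_stats() {
  awk '/^>/ { if (n) print len; n++; len = 0; next }
       { len += length($0) }
       END { if (n) print len }' "$1" |
    sort -rn |
    awk -v name="$1" '
      { lens[NR] = $1; total += $1 }
      END {
        n50 = 0; cum = 0
        for (i = 1; i <= NR; i++) {
          cum += lens[i]
          if (2 * cum >= total) { n50 = lens[i]; break }
        }
        printf "%s\tseqs=%d\ttotal=%d\tN50=%d\n", name, NR, total, n50
      }'
}

for fa in "$@"; do
  fasta_stats "$fa"
done
```

For example, passing each assembler's contig FASTA as an argument prints one tab-separated line per assembly.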

Isoris commented 11 months ago

Hello again. It finally seems that I have HiFi reads, so I would like to know which of these is preferable:

  1. HiFi-only assembly with hifiasm + GreenHill scaffolding
  2. hifiasm UL + GreenHill scaffolding
  3. hifiasm UL + Hi-C, without GreenHill scaffolding

Best regards, Quentin

ShunOuchi commented 11 months ago

Hello again

> Hello again. It finally seems that I have HiFi reads, so I would like to know which of these is preferable:
>
>   1. HiFi-only assembly with hifiasm + GreenHill scaffolding
>   2. hifiasm UL + GreenHill scaffolding
>   3. hifiasm UL + Hi-C, without GreenHill scaffolding

All three options (1, 2, and 3) are fine. However, option 3 might not produce a chromosome-level assembly, because hifiasm has no Hi-C scaffolding function.

Thank you, Shun

Isoris commented 11 months ago

So the best option (although this is empirical) would be:

  1. hifiasm UL + Hi-C + GreenHill scaffolding

From my understanding of your paper and its Figure 5, hifiasm produces the most conservative assemblies, but the paper does not specify whether adding Hi-C or UL data to hifiasm could in fact improve the quality of the chromosome-level assembly after GreenHill.

Regards, Quentin

https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2Fs13059-023-03006-8/MediaObjects/13059_2023_3006_Fig5_HTML.png?

ShunOuchi commented 11 months ago

> So the best option (although this is empirical) would be:
>
>   1. hifiasm UL + Hi-C + GreenHill scaffolding

Yes, but we did not benchmark with UL data, so this recommendation is based on my experience.

Isoris commented 11 months ago

> Yes, but we did not benchmark with UL data, so this recommendation is based on my experience.

Okay, thank you very much!! :)