ShunOuchi / GreenHill

De novo chromosome-level scaffolding and phasing tool using Hi-C
GNU General Public License v3.0

Greenhill takes too long to run #16

Closed jinxin112233 closed 7 months ago

jinxin112233 commented 10 months ago

Hi Ouchi,

We installed GreenHill manually and ran it successfully on the test data. We then ran it on a 1.5 Gb genome with 40 Gb of HiFi reads and 100 Gb of Hi-C reads, using -t 48. After about 4 days there are still no final results; the run log is stuck at 'makeHiCLink; finish makeHiCLink; numHiCNode:100'.

Are we using too few threads, or providing too much input data? Roughly how long should it take to get final results?

Best wishes, Jinxin

ShunOuchi commented 10 months ago

Hi Jinxin,

In my environment, it took about 8 hours for the zebra finch sample (1 Gb genome, 39 Gb of HiFi reads and 95 Gb of Hi-C reads) with -t 48. See Table S4 in the paper for details.

However, some samples take longer, possibly more than 2 weeks. I think this is due to the slow update of link information when contigs are cut, but we have not yet been able to improve it. We would like to fix this in a future update.

Thank you, Ouchi

jinxin112233 commented 10 months ago

Hi Ouchi, thanks for your quick reply. We are very interested in GreenHill.

According to Fig. 1 of the paper, GreenHill can phase a single chromosome in the ideal situation. In the test output (Canu_afterPhase.fa), each scaffold ID is preceded by a hap0 or hap1 prefix. Does this mean that scaffolds with the hap0 prefix originated from one parent, and scaffolds with the hap1 prefix originated from the other parent?

Also, could you share your email address (mine is jinxin@mail.kib.ac.cn)? Once GreenHill outputs the final phased file, we would like to discuss the results with you further (if they deserve in-depth study) and possibly send you some files.

Best wishes, Jinxin

ShunOuchi commented 10 months ago

> According to Fig. 1 of the paper, GreenHill can phase a single chromosome in the ideal situation. In the test output (Canu_afterPhase.fa), each scaffold ID is preceded by a hap0 or hap1 prefix. Does this mean that scaffolds with the hap0 prefix originated from one parent, and scaffolds with the hap1 prefix originated from the other parent?

Yes, GreenHill can phase a single chromosome and, in the ideal case, generate two haplotypes. The prefix of each scaffold indicates the haplotype of that scaffold: the scaffold with the hap0 prefix originates from one parent, and the corresponding scaffold with the hap1 prefix originates from the other. Note, however, that there is no phasing between scaffolds, so two scaffolds that both carry the hap0 prefix are not guaranteed to come from the same parent.
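For readers who want to inspect the two labels separately, the following is a minimal Python sketch that groups FASTA records by their hap0/hap1 prefix. It is an illustration, not part of GreenHill: the function name `split_by_haplotype_prefix` and the example IDs such as `hap0_scaffold1` are hypothetical, and the exact header layout in Canu_afterPhase.fa is assumed only to start with `hap0` or `hap1`, as described above.

```python
import io
from collections import defaultdict


def split_by_haplotype_prefix(fasta_lines, prefixes=("hap0", "hap1")):
    """Group FASTA records by the haplotype prefix of their ID.

    Assumption: each record ID starts with one of `prefixes`.
    Records matching no prefix are collected under "unlabeled".
    Returns {label: {record_id: sequence}}.
    """
    groups = defaultdict(dict)
    current_id = None
    current_label = None
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            # Pick the first prefix that matches the start of the ID.
            current_label = next(
                (p for p in prefixes if current_id.startswith(p)), "unlabeled"
            )
            groups[current_label][current_id] = []
        elif current_id is not None and line:
            groups[current_label][current_id].append(line)
    # Join the sequence chunks of each record into one string.
    return {
        label: {rid: "".join(chunks) for rid, chunks in records.items()}
        for label, records in groups.items()
    }


# Hypothetical records; real scaffold IDs may look different.
demo = io.StringIO(
    ">hap0_scaffold1\nACGT\nAC\n>hap1_scaffold1\nACGA\n>hap0_scaffold2\nTTTT\n"
)
groups = split_by_haplotype_prefix(demo)
for label, records in sorted(groups.items()):
    print(label, sorted(records))
```

Because there is no phasing between scaffolds, the `hap0` group produced this way is a set of per-scaffold haplotypes, not a genome-wide haplotype from a single parent.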

> Also, could you share your email address (mine is jinxin@mail.kib.ac.cn)? Once GreenHill outputs the final phased file, we would like to discuss the results with you further (if they deserve in-depth study) and possibly send you some files.

You can contact me at "oouchi.s.aa[at]m.titech.ac.jp".

Thank you

pmoulos commented 9 months ago

Hi @jinxin112233,

Did GreenHill finish for you? I am facing the same issue (~1 Gb genome, ~70 Gb of Hi-C data): it has been running for 4 days, and I am wondering whether all is OK. @ShunOuchi thanks for the fantastic work!

Regards

jinxin112233 commented 9 months ago

Hi pmoulos,

I think it is OK. With a 1.5 Gb genome, 40 Gb of HiFi reads and 100 Gb of Hi-C reads, and -t 48, our run took about 15 days. We now have the final output and are evaluating it with other methods.

Best wishes, Jinxin
