ShunOuchi / GreenHill

De novo chromosome-level scaffolding and phasing tool using Hi-C
GNU General Public License v3.0

compile error #4

Closed xiekunwhy closed 11 months ago

xiekunwhy commented 2 years ago

Hi,

I got some errors when compiling the GreenHill source code. My g++/gcc version is: gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)

Here are the errors I got. Do you know what's wrong? compile.log.txt

I also tried the conda version instead, but it always ends with a segmentation fault. Here are the command and the log from the conda version; I don't know what's wrong.

source /Bio/User/software/anaconda3/bin/activate greenhill;greenhill -cph lja.fa -o lja.scaffold -t 30 -p Lja.subreads.fasta -HIC hic_1.fq.gz hic_2.fq.gz

greenhill.sh.log.txt

Best, Kun

xiekunwhy commented 2 years ago

The compiled version also ends with a segmentation fault, so I think there is a bug in GreenHill when it is given mixed-haplotype style input.

ShunOuchi commented 2 years ago

Were lja.scaffold_minced_nonBubble.fa, lja.scaffold_minced_primaryBubble.fa, and lja.scaffold_minced_secondaryBubble.fa output? If they were not output, or if their sizes are incorrect, the mince process failed.

I cannot find what caused the bug from the log.txt alone. If possible, could you share the input data with me?
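
A quick way to answer the question above is to check each of the three mince outputs for existence and non-zero size. This is a minimal sketch, not part of GreenHill: the function name `check_minced` is hypothetical, and the only thing taken from the thread is the file naming pattern (`<prefix>_minced_<type>.fa`, where `<prefix>` is the value passed to `-o`).

```shell
# Hypothetical helper: report whether each expected mince output exists and is non-empty.
# $1 is the output prefix given to `greenhill -o` (for example lja.scaffold).
check_minced() {
    prefix=$1
    rc=0
    for suffix in nonBubble primaryBubble secondaryBubble; do
        f="${prefix}_minced_${suffix}.fa"
        if [ ! -e "$f" ]; then
            echo "MISSING $f"
            rc=1
        elif [ ! -s "$f" ]; then
            # File exists but has size 0.
            echo "EMPTY   $f"
            rc=1
        else
            echo "OK      $f ($(wc -c < "$f") bytes)"
        fi
    done
    # Non-zero return value if any file is missing or empty.
    return $rc
}
```

Running `check_minced lja.scaffold` in the output directory prints one line per file and returns a non-zero status if any of them is missing or empty.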

xiekunwhy commented 2 years ago

Yes, but all of them have size 0. (screenshot attached)

By the way, the contigs I am using were generated by wtdbg2. The data are too large (PacBio reads + Hi-C > 70G). Is there an easy way to share them with you?

ShunOuchi commented 2 years ago

If you are using public data, please tell me the accession number. Otherwise, please upload your files to a cloud storage service, and share them or email them to me (oouchi.s.aa[at]m.titech.ac.jp).

By the way, is the assembly size of the wtdbg2 contigs nearly double the estimated haploid genome size? Wtdbg2 outputs contigs that ignore differences between haplotypes, so I recommend using FALCON-Unzip or Canu -hetero option (corOutCoverage = 200 batOptions = -dg 3 -db 3 -dr 1 -ca 500 -cp 50) instead of wtdbg2. This is because GreenHill does not have a function to phase the consensus regions in the input.
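
The size comparison asked about above can be done with a short awk-based sketch. The function names `fasta_total_bases` and `assembly_ratio` are hypothetical, and the haploid genome size must come from your own estimate; it is not something this snippet computes. It assumes an uncompressed FASTA with Unix line endings.

```shell
# Hypothetical helper: total number of sequence characters in a FASTA file
# (header lines starting with ">" are skipped).
fasta_total_bases() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Hypothetical helper: assembly size divided by an estimated haploid genome size.
# $1 = FASTA file, $2 = estimated haploid genome size in bases.
assembly_ratio() {
    fasta_total_bases "$1" | awk -v g="$2" '{ printf "%.2f\n", $1 / g }'
}
```

For example, `assembly_ratio lja.fa 500000000` (the genome size here is a placeholder) prints a value near 2 when both haplotypes are represented separately, and a value near 1 when the contigs are a collapsed consensus.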

xiekunwhy commented 2 years ago

The data have not been published yet.

The assembly size of the wtdbg2 contigs is close to the haploid genome size, not the diploid genome size. So GreenHill cannot be used as a general Hi-C scaffolding tool to scaffold a haploid genome?

ShunOuchi commented 2 years ago

Because GreenHill first identifies contig pairs that consist of the same loci from homologous chromosomes in the mince process, it may not work for a haploid genome. If you want to use GreenHill for Hi-C scaffolding of a haploid genome and skip the mince process, try running GreenHill with the following options.

greenhill -c lja.fa -o lja.scaffold -t 30 -p Lja.subreads.fasta -HIC hic_1.fq.gz hic_2.fq.gz

Note: We have not tested this case, so we do not know if it will work.

ShunOuchi commented 2 years ago

Regarding the compile error, the required gcc version listed in the README was incorrect (correct: gcc version >= 4.8). Sorry about that. I am updating the README.
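
To check a compiler against the corrected requirement before building, a small version comparison is enough. This is a sketch under stated assumptions: `version_at_least` is a hypothetical helper, and the only value taken from this thread is the 4.8 threshold.

```shell
# Hypothetical helper: succeed (exit status 0) if dotted version $1 is >= dotted version $2.
# Compares up to three numeric fields; missing fields count as 0.
version_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        for (i = 1; i <= 3; i++) {
            if ((h[i] + 0) > (n[i] + 0)) exit 0
            if ((h[i] + 0) < (n[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Example use (commented out so the snippet has no side effects when sourced):
# version_at_least "$(gcc -dumpversion)" 4.8 || echo "gcc is older than 4.8"
```

With the gcc 4.4.7 reported at the top of this issue, `version_at_least 4.4.7 4.8` fails, which is consistent with the compile error.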

xiekunwhy commented 2 years ago

Hi,

I tried two ways, but both failed!

LongRead + HIC:
source /Bio/User/software/anaconda3/bin/activate greenhill;greenhill -c lja.fa -o lja.scaffold -t 35 -p Lja.subreads.fasta -HIC hic_1.fq.gz hic_2.fq.gz
greenhill_tgs_hic.sh.log

HIC only:
source /Bio/User/software/anaconda3/bin/activate greenhill;greenhill -c lja.fa -o lja.scaffold -t 50 -HIC hic_1.fq.gz hic_2.fq.gz
greenhill_hic_only.sh.log

Best, Kun

ShunOuchi commented 2 years ago

LongRead + HIC: GreenHill may have failed to get the read length distribution. Was lja.scaffold_longReadLibrary_readDistribution.tsv output?

HIC only: No Hi-C reads mapped to the contigs. Are the wtdbg2 contigs and the Hi-C reads from the same sample? Is the quality of the wtdbg2 contigs or the Hi-C reads too low?

xiekunwhy commented 2 years ago

LongRead + HIC: Right, lja.scaffold_longReadLibrary_readDistribution.tsv was not output. Here are all the files that were output. (screenshot attached)

HiC only: The contigs and the Hi-C data are from the same sample. SALSA (https://github.com/marbl/SALSA) and YaHS (https://github.com/c-zhou/yahs) run normally, and the overall alignment rate is 95.51% when using bowtie2. The wtdbg2 contigs are also well polished (the EST/protein-mapped BUSCO score is nearly equal to the EST/protein-only BUSCO score). (screenshot attached)

With Hi-C only, GreenHill stops quickly after the k-mer table is made. I think the mapping step in GreenHill was not started normally after the k-mer table was made.

Best, Kun

ShunOuchi commented 2 years ago

Please try running GreenHill with the test data. If you get the same error, it may be due to your runtime environment. If not, it may be due to the input data.

xiekunwhy commented 2 years ago

Maybe there is a problem with my machine. diamond (https://github.com/bbuchfink/diamond) also stops abnormally when run without the --no-unlink option, so the temporary file system on my machine seems to be abnormal. I will try GreenHill on some older machines.
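
Since diamond's --no-unlink option is about not unlinking its temporary files, one thing that can be tested directly is whether a file in the temporary directory stays usable through open descriptors after it has been unlinked. The sketch below does only that. The function name `tmp_unlink_ok` is hypothetical, and whether GreenHill depends on this behavior is an assumption, not something established in this thread.

```shell
# Hypothetical helper: succeed if a file in directory $1 (default: $TMPDIR or /tmp)
# can still be written and read through open descriptors after it is unlinked.
tmp_unlink_ok() {
    dir=${1:-${TMPDIR:-/tmp}}
    f=$(mktemp "$dir/unlink_probe.XXXXXX") || return 1
    # Open one write descriptor (8) and one read descriptor (9) on the same file.
    exec 8>"$f" 9<"$f"
    # Unlink the file while both descriptors are still open.
    rm -f "$f"
    printf 'probe\n' >&8
    # Read the line back through the independent read descriptor.
    read -r got <&9 || got=
    exec 8>&- 9>&-
    [ "$got" = "probe" ]
}
```

Running `tmp_unlink_ok /path/to/tmpdir` and checking its exit status shows whether this basic pattern works there. A failure points at the file system rather than the input data; a success does not rule out other environment problems.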

github-actions[bot] commented 11 months ago

Stale issue message