Closed xiekunwhy closed 11 months ago
The compiled version also ends with a segmentation fault, so I think there is a bug in GreenHill when it is given mixed-haplotype style input.
Were lja.scaffold_minced_nonBubble.fa, lja.scaffold_minced_primaryBubble.fa, and lja.scaffold_minced_secondaryBubble.fa output? If they were not output, or if their sizes are incorrect, the mince process failed.
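The file check above can be sketched as a small script. This is only a hedged illustration, not part of GreenHill: the function names are hypothetical, the prefix is whatever was passed to `-o`, and treating a 0-byte file as "incorrect size" is an assumption.

```python
from pathlib import Path

# Suffixes taken from the file names mentioned above.
MINCE_SUFFIXES = (
    "_minced_nonBubble.fa",
    "_minced_primaryBubble.fa",
    "_minced_secondaryBubble.fa",
)


def check_mince_outputs(prefix, directory="."):
    """Return {file name: size in bytes, or None if the file is missing}."""
    sizes = {}
    for suffix in MINCE_SUFFIXES:
        path = Path(directory) / (prefix + suffix)
        sizes[path.name] = path.stat().st_size if path.is_file() else None
    return sizes


def mince_looks_ok(sizes):
    """True only if every expected file exists and is non-empty (assumption)."""
    return all(size is not None and size > 0 for size in sizes.values())
```

Calling `check_mince_outputs("lja.scaffold")` in the output directory and passing the result to `mince_looks_ok` gives a quick yes/no before digging into the log.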
I cannot find what caused the bug from log.txt alone. If possible, could you share the input data with me?
Yes, they were output, but all of them are 0 bytes.
By the way, the contigs I am using were generated by wtdbg2. The data are also very large (PacBio reads + Hi-C > 70 GB). Is there an easy way to share them with you?
If you are using public data, please tell me the accession number. Otherwise, please upload your files to a cloud storage service, and share them or email them to me (oouchi.s.aa[at]m.titech.ac.jp).
By the way, is the assembly size of the wtdbg2 contigs nearly double the estimated haploid genome size? wtdbg2 outputs contigs that ignore differences between haplotypes, so I recommend using FALCON-Unzip or Canu with options for heterozygous genomes (corOutCoverage=200 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50") instead of wtdbg2. This is because GreenHill does not have a function to phase the consensus regions in the input.
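To answer the assembly-size question quickly, something like the following sketch could be used. It is not part of GreenHill; the function names are hypothetical, and the haploid genome size is an estimate you supply yourself. A ratio near 2 would suggest that both haplotypes are represented, and a ratio near 1 would suggest a collapsed (haploid-like) assembly.

```python
import gzip


def assembly_size(fasta_path):
    """Sum the lengths of all sequences in a FASTA file (plain or .gz)."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    total = 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):  # skip header lines
                total += len(line.strip())
    return total


def size_ratio(fasta_path, haploid_genome_size):
    """Assembly size divided by the estimated haploid genome size."""
    return assembly_size(fasta_path) / haploid_genome_size
```

For example, `size_ratio("lja.fa", haploid_genome_size)` with your own estimate of the haploid genome size in bases.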
The data have not been published yet.
The assembly size of the wtdbg2 contigs is close to the haploid genome size, not the diploid genome size. So GreenHill cannot be used as a general Hi-C scaffolding tool to scaffold a haploid genome?
Because GreenHill first identifies contig pairs that consist of the same loci from homologous chromosomes in the mince process, it may not work for a haploid genome. If you want to use GreenHill for Hi-C scaffolding of a haploid genome and skip the mince process, try running GreenHill with the following options.
greenhill -c lja.fa -o lja.scaffold -t 30 -p Lja.subreads.fasta -HIC hic_1.fq.gz hic_2.fq.gz
※We have not tested this case, so we do not know if it will work.
Regarding the compile error, the required gcc version listed in the README was incorrect (correct: gcc version >= 4.8). Sorry about that. I have updated the README.
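As a rough illustration of the version requirement, the snippet below parses a version string of the form reported in the original post ("gcc version 4.4.7 ...") and compares it with 4.8. It is only a sketch: the function names are hypothetical, and it assumes the "gcc version X.Y[.Z]" format.

```python
import re

REQUIRED = (4, 8)  # corrected requirement stated above


def parse_gcc_version(text):
    """Extract (major, minor, patch) from a 'gcc version X.Y[.Z]' string."""
    match = re.search(r"gcc version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no 'gcc version X.Y[.Z]' string found")
    # A missing patch level is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_requirement(text, required=REQUIRED):
    """True if the major.minor version is at least the required one."""
    return parse_gcc_version(text)[:2] >= required
```

With the string from the original post, `meets_requirement("gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)")` returns `False`, which is consistent with the compile errors.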
Hi,
I tried both ways, but both failed!
LongRead + HIC: source /Bio/User/software/anaconda3/bin/activate greenhill; greenhill -c lja.fa -o lja.scaffold -t 35 -p Lja.subreads.fasta -HIC hic_1.fq.gz hic_2.fq.gz (log: greenhill_tgs_hic.sh.log)
HIC only: source /Bio/User/software/anaconda3/bin/activate greenhill; greenhill -c lja.fa -o lja.scaffold -t 50 -HIC hic_1.fq.gz hic_2.fq.gz (log: greenhill_hic_only.sh.log)
Best, Kun
LongRead + HIC: GreenHill may have failed to get the read-length distribution. Was lja.scaffold_longReadLibrary_readDistribution.tsv output?
HIC only: No Hi-C reads were mapped to the contigs. Are the wtdbg2 contigs and the Hi-C reads from the same sample? Could the quality of the wtdbg2 contigs or the Hi-C reads be too low?
LongRead + HIC:
Correct, lja.scaffold_longReadLibrary_readDistribution.tsv was not output. Here are all the files that were output.
HIC only:
The contigs and the Hi-C data are from the same sample. SALSA (https://github.com/marbl/SALSA) and YaHS (https://github.com/c-zhou/yahs) run normally, and the overall alignment rate is 95.51% when using bowtie2. The wtdbg2 contigs are also well polished (the EST protein mapped BUSCO is nearly equal to the EST protein only BUSCO).
With HIC only, GreenHill stops soon after the k-mer table is made, so I think the mapping step in GreenHill is not started properly after the k-mer table is built.
Best, Kun
Please try running GreenHill with the test data. If you get the same error, it may be due to your runtime environment. If not, it may be due to the input data.
Maybe there is a problem with my machine: diamond (https://github.com/bbuchfink/diamond) also stops abnormally when run without the --no-unlink option, so the temporary file system on my machine seems to be abnormal. I will try GreenHill on some older machines.
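A quick way to test this suspicion is a small sanity check of the temporary directory, sketched below. This is an assumption-laden illustration: I do not know how GreenHill handles its temporary files, and the function name is hypothetical. On POSIX systems, `tempfile.TemporaryFile` creates a file with no visible directory entry, which is loosely similar to the unlinked temporary files that diamond's --no-unlink option avoids.

```python
import shutil
import tempfile


def check_temp_dir(directory=None, payload=b"tmp-check"):
    """Write and read back an anonymous temp file; report free space."""
    directory = directory or tempfile.gettempdir()
    report = {"directory": directory, "free_bytes": None, "write_read_ok": False}
    try:
        report["free_bytes"] = shutil.disk_usage(directory).free
        # TemporaryFile opens in binary read/write mode by default.
        with tempfile.TemporaryFile(dir=directory) as handle:
            handle.write(payload)
            handle.flush()
            handle.seek(0)
            report["write_read_ok"] = handle.read() == payload
    except OSError as error:
        report["error"] = str(error)
    return report
```

Running `check_temp_dir()` on the problematic machine and on a working one, and comparing the two reports, may help separate an environment problem from an input-data problem.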
Hi,
I got some errors when compiling the GreenHill source code. My g++/gcc version is: gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)
Here are the errors I got. Do you know what's wrong? compile.log.txt
I also tried the conda version instead, but it always ends with a segmentation fault. Here is the log from the conda version; I don't know what's wrong. Command: source /Bio/User/software/anaconda3/bin/activate greenhill; greenhill -cph lja.fa -o lja.scaffold -t 30 -p Lja.subreads.fasta -HIC hic_1.fq.gz hic_2.fq.gz (log: greenhill.sh.log.txt)
Best, Kun