aidenlab / 3d-dna

3D de novo assembly (3D DNA) pipeline
MIT License

No final.hic file available #31

Open ke-shi opened 5 years ago

ke-shi commented 5 years ago

We are working on genome assembly of an interspecific hybrid plant having two genomes.

During the 3d-dna step, in both haploid and diploid mode, we always got errors saying:

:( Assembly file does not match cprops file. Exiting!
...Building the hic file
temp.genome.polished.split.asm_mnd.txt does not exist or does not contain any reads.

and got empty files: genome.polished.split.edits_2D.txt, genome.polished.split.mismatches_2D.txt, and genome.polished.split.suspicious_2D.txt.

The subsequent steps also failed, and no genome.final.hic (= genome.rawchrom.hic) was produced in the end.

When we ran 3d-dna with only a single genome (one haploid assembly), all steps completed and final.hic files were generated.

We would appreciate your suggestion. Thanks!

dudcha commented 5 years ago

Please share the full command and the stdout and stderr output. Which version are you using? Are the requirements met? Everything after the first :( does not matter: those files cannot be created if polishing failed. The scenario seems unusual: I have no other reports of problems at the polishing step, so I need more details. What do you mean by using a single genome? How do you run Juicer in each of these two cases?

Best, Olga

(Email reply to ke-shi's message of Nov 20, 2018, 8:27 AM.)

ke-shi commented 5 years ago

Thank you for the reply! How can I share the files? On this thread?

The genome assembly was constructed with TrioCanu using the trio binning approach (Koren et al. 2018). In this step, PacBio reads from the "hetero-genome" are partitioned by parent (two species in our case), and each set is assembled separately to obtain a haplotype assembly. Together, the two haplotype assemblies make up the genome of the interspecific hybrid.

When we used the two haplotype assemblies together in 3d-dna (version 180419), we got the errors; when we used only one assembly, no errors occurred, even though the Hi-C data included both genomes. The same was true for the other assembly on its own. So I think our pipeline is working.

dudcha commented 5 years ago

Ke-Shi,

Your answer is not clear to me: please confirm that you ran Juicer separately for the hetero-genome and are not reusing the run for a parent, i.e. that the merged_nodups file and the FASTA file correspond to the same assembly.

I do not know the format of TrioCanu output. I suggest checking the sequence names: perhaps a name clash between the haplotype sequences is causing the problem.
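A minimal Python sketch of that name check, assuming (as aligners such as BWA do) that a sequence's ID is the first whitespace-delimited token after ">" in the FASTA header. The function names are hypothetical helpers for illustration, not part of 3d-dna or Juicer.

```python
from collections import Counter
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return sequence IDs: the first whitespace-delimited token after '>'."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def duplicate_ids(lines: Iterable[str]) -> List[str]:
    """Return IDs that occur more than once, sorted for stable output."""
    counts = Counter(fasta_ids(lines))
    return sorted(name for name, n in counts.items() if n > 1)
```

Passing an open handle of the concatenated genome.fasta to duplicate_ids returns any clashing IDs; an empty list means every name is unique after truncation at the first whitespace.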

Finally, keep in mind that in the hetero-case a large number of reads will end up below the mapping quality threshold, since reads from regions that are similar in both haplotypes cannot be placed uniquely. If you are running 3d-dna with default parameters, this may result in too many empty rows in the hic file and potentially no normalization.
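One way to gauge this before running 3d-dna is to count how many read pairs survive a given MAPQ cutoff. A minimal sketch, assuming Juicer's long merged_nodups layout, where mapq1 and mapq2 are the 9th and 12th whitespace-separated fields; the cutoff is a parameter here and is not meant to reproduce 3d-dna's own default.

```python
from typing import Iterable


def mapq_pass_fraction(lines: Iterable[str], min_mapq: int) -> float:
    """Fraction of read pairs in which both ends have MAPQ >= min_mapq.

    Assumes the long merged_nodups layout: mapq1 at index 8, mapq2 at index 11.
    """
    total = 0
    passing = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or short-format lines
        total += 1
        if int(fields[8]) >= min_mapq and int(fields[11]) >= min_mapq:
            passing += 1
    return passing / total if total else 0.0
```

Comparing this fraction for the single-haplotype run and the combined hetero-genome run would show how much signal is lost to ambiguous mapping.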

Best, Olga

ke-shi commented 5 years ago

TrioCanu gives us two haplotype genome assemblies, one for each parent. Yes, we renamed the sequence IDs in the two output files so that they do not overlap.

We concatenated the two haplotype assemblies into one file, genome.fasta, to represent the genome of the hybrid (so the hetero-genome as a whole looks like a diploid), and ran Juicer:
juicer.sh -g genome -s DpnII -z genome.fasta -y genome_DpnII.txt -p chrom.sizes -D . -r
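The renaming and concatenation step can be sketched in Python as below. The prefixes "hapA_" and "hapB_" are hypothetical choices for illustration, not what TrioCanu emits; any scheme works as long as the resulting IDs are unique.

```python
from typing import Iterable, Iterator, List


def prefix_headers(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines with `prefix` prepended to every sequence ID."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def concat_haplotypes(hap_a: Iterable[str], hap_b: Iterable[str]) -> List[str]:
    """Combine two haplotype assemblies into one hybrid FASTA (as lines)."""
    # "hapA_" and "hapB_" are hypothetical prefixes chosen for illustration.
    return list(prefix_headers(hap_a, "hapA_")) + list(prefix_headers(hap_b, "hapB_"))
```

Writing the returned lines to genome.fasta gives a single reference whose IDs also encode the haplotype, which makes later filtering by haplotype straightforward.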

Before running 3d-dna, we filtered the resulting merged_nodups.txt to remove unwanted read pairs bridging the two haplotype genomes. In addition, because we have a genetic map of this hybrid, we also removed read pairs connecting scaffolds that belong to different linkage groups.
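A filter of this kind might look like the Python sketch below. It assumes Juicer's long merged_nodups layout (chr1 and chr2 as the 2nd and 6th fields), a hypothetical naming scheme in which the haplotype is the text before the first "_" in each sequence ID (e.g. hapA_tig1), and an optional dictionary mapping scaffold IDs to linkage groups built from the genetic map. This is an illustration, not the exact procedure used in this thread.

```python
from typing import Dict, Iterable, Iterator, Optional


def haplotype_of(seq_id: str) -> str:
    """Hypothetical naming scheme: the haplotype is the text before the first '_'."""
    return seq_id.split("_", 1)[0]


def filter_pairs(
    lines: Iterable[str],
    linkage_group: Optional[Dict[str, str]] = None,
) -> Iterator[str]:
    """Yield merged_nodups lines whose two ends are on the same haplotype and,
    if a scaffold-to-linkage-group map is given, in the same linkage group.

    Assumes chr1 and chr2 are at indices 1 and 5. Scaffolds missing from the
    map are treated as unassigned and kept.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        chr1, chr2 = fields[1], fields[5]
        if haplotype_of(chr1) != haplotype_of(chr2):
            continue  # pair bridges the two parental haplotypes
        if linkage_group is not None:
            lg1 = linkage_group.get(chr1)
            lg2 = linkage_group.get(chr2)
            if lg1 is not None and lg2 is not None and lg1 != lg2:
                continue  # pair connects scaffolds in different linkage groups
        yield line
```

Streaming the lines through a generator keeps memory use flat, which matters because merged_nodups files are often tens of gigabytes.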

We do not believe this filtering breaks 3d-dna, because we had already tested it with another haploid dataset.

Then we ran 3d-dna and got the errors:
run-asm-pipeline.sh -m diploid genome.fasta merged_nodups.selected.txt

The log file up to the first :( is here: log.txt

Thanks!

dudcha commented 5 years ago

According to the log file, your problem is a missing normalization caused by the sparsity of the matrix at the default misjoin-correction resolution of 25 kb:

“Unable to dump java.io.IOException: Normalization missing for: assembly_assembly_BP_25000”

You can run without error correction, look for misjoins at a coarser resolution where you have norms, or look at 25 kb using the unnormalized matrix.

Best, Olga
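To see how sparse the matrix is at a given resolution, one can estimate the fraction of bins (matrix rows) that receive no contacts at all. A minimal Python sketch, assuming Juicer's long merged_nodups layout and a dictionary of sequence lengths such as the one in chrom.sizes; bins are laid out per input sequence, which simplifies the map 3d-dna actually builds, so treat the numbers as a rough guide rather than a prediction of whether normalization will succeed.

```python
from typing import Dict, Iterable, Set, Tuple


def n_bins(length: int, resolution: int) -> int:
    """Number of bins needed to cover a sequence of the given length."""
    return (length + resolution - 1) // resolution


def empty_bin_fraction(
    lines: Iterable[str],
    seq_lengths: Dict[str, int],
    resolution: int,
    min_mapq: int = 0,
) -> float:
    """Fraction of bins (matrix rows) that receive no contacts at all.

    Assumes the long merged_nodups layout: chr1/pos1 at indices 1/2,
    chr2/pos2 at indices 5/6, mapq1/mapq2 at indices 8/11.
    """
    occupied: Set[Tuple[str, int]] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or short-format lines
        if int(fields[8]) < min_mapq or int(fields[11]) < min_mapq:
            continue  # pair would be dropped by the MAPQ filter
        for chrom, pos in ((fields[1], fields[2]), (fields[5], fields[6])):
            length = seq_lengths.get(chrom, 0)
            if length <= 0:
                continue  # sequence not listed in seq_lengths
            last_bin = n_bins(length, resolution) - 1
            occupied.add((chrom, min(int(pos) // resolution, last_bin)))
    total = sum(n_bins(length, resolution) for length in seq_lengths.values())
    return 1 - len(occupied) / total if total else 0.0
```

Running it at 25000, 250000 and 1000000 with the same MAPQ cutoff would show how quickly empty rows disappear as the resolution gets coarser.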

(Email reply to ke-shi's message of Nov 27, 2018, 12:52 AM.)

ke-shi commented 5 years ago

We tried two parameter settings but got other warnings/errors, such as:
:| Warning: No input for label1 was provided. Default for label1 is ":::fragment_"
:| Warning: No input for label2 was provided. Default for label2 is ":::debris"
:( Some pairwise alignments are in conflict. Skipping merge block ...

With --editor-coarse-resolution 1000000 --editor-coarse-region 2500000: run-asm-pipeline.rg2500000-rs1000000.log.txt

With --editor-coarse-resolution 250000 --editor-coarse-region 1250000: run-asm-pipeline.rg1250000-rs250000.log.txt