WarrenLab / hic-scaffolding-nf

Nextflow pipeline for scaffolding genome assemblies with Hi-C reads
MIT License

Add instructions for how to generate curated fasta file #4

Closed: mankiddyman closed this issue 7 months ago

mankiddyman commented 8 months ago

I was able to use the pipeline successfully to produce the out/juicebox_input/out_JBAT.assembly and out_JBAT.hic files, and then made the required changes in Juicebox.

Following the instructions in the Genome assembly workbook, I exported my assembly from Juicebox using Assembly -> Export Assembly, which produced a new file ending in ...review.assembly

After that, the workbook recommends using a script from the 3dDNA package, but since this workflow doesn't generate merged_nodups.txt, I found an alternative package called juicebox_scripts. I ran its juicebox_assembly_converter.py script, giving it as input the review.assembly file and the .fasta file I had used with hic-scaffolding-nf, but I get an error saying that the contigs fail to map.

I'm not asking for help troubleshooting juicebox_scripts, but rather how users of this pipeline are generating FASTA files from their curated assemblies.

Help Appreciated!

mankiddyman commented 8 months ago

For those curious, here is the error message from juicebox_scripts:

[Image: error message from juicebox_scripts]
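One way to narrow down a "contigs fail to map" error is to check whether the sequence names listed in the .assembly file actually appear among the FASTA headers. The Python sketch below does only that; it is not part of hic-scaffolding-nf or juicebox_scripts, and the script name check_assembly.py is hypothetical. It assumes the usual Juicebox .assembly layout (fragment lines of the form ">name index length", followed by one line per scaffold listing signed fragment indices, where a negative index means reverse orientation) and assumes that Juicebox appends suffixes such as ":::fragment_1" or ":::debris" to contigs split during curation.

```python
"""Sketch: check that fragment names in a Juicebox .assembly file match FASTA headers.

Assumptions (not from the pipeline docs): fragment lines look like
'>name index length', scaffold lines are space-separated signed indices,
and split contigs carry ':::'-separated suffixes added by Juicebox.
"""
import sys
from typing import Dict, List, Set, Tuple


def parse_assembly(lines) -> Tuple[Dict[int, Tuple[str, int]], List[List[int]]]:
    """Return {index: (name, length)} and the scaffolds as lists of signed indices."""
    fragments: Dict[int, Tuple[str, int]] = {}
    scaffolds: List[List[int]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, index, length = line[1:].split()
            fragments[int(index)] = (name, int(length))
        else:
            # A negative index means the fragment is used in reverse orientation.
            scaffolds.append([int(tok) for tok in line.split()])
    return fragments, scaffolds


def fasta_headers(lines) -> Set[str]:
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def unmapped_fragments(fragments, headers) -> List[str]:
    """List fragment names whose original contig name is absent from the FASTA."""
    missing = []
    for name, _length in fragments.values():
        base = name.split(":::")[0]  # strip assumed Juicebox split/debris suffixes
        if base not in headers:
            missing.append(name)
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_assembly.py out_JBAT.review.assembly contigs.fa
    with open(sys.argv[1]) as fh:
        frags, _ = parse_assembly(fh)
    with open(sys.argv[2]) as fh:
        headers = fasta_headers(fh)
    missing = unmapped_fragments(frags, headers)
    print(f"{len(missing)} of {len(frags)} fragments not found in the FASTA")
    for name in missing[:20]:
        print("  ", name)
```

If most or all fragment names are missing, the .assembly file is probably describing sequences other than those in the FASTA you passed in, which would be consistent with the mapping error above.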

esrice commented 8 months ago

Hi Aaryan,

Instructions for just this are in the YAHS documentation: https://github.com/c-zhou/yahs#manual-curation-with-juicebox-jbat

Once you have finished editing, Juicebox should have generated a file named something like out_JBAT.review.assembly, which can be fed into the juicer post command to generate AGP and FASTA files for the final genome assembly. You also need the out_JBAT.liftover.agp coordinate file previously generated with the juicer pre command.

```shell
juicer post -o out_JBAT out_JBAT.review.assembly out_JBAT.liftover.agp contigs.fa
```

This will produce two files, out_JBAT.FINAL.agp and out_JBAT.FINAL.fa. Together with hic-to-contigs.bin or the original BED/BAM file, you can regenerate a Hi-C contact map for the final assembly as described in the previous section of the YaHS documentation.
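After juicer post finishes, a quick consistency check between out_JBAT.FINAL.agp and out_JBAT.FINAL.fa can catch truncated or mismatched outputs before moving on. The Python sketch below is not from YaHS or this pipeline, and the script name check_final.py is hypothetical. It assumes the AGP file follows the AGP v2.1 layout (nine tab-separated columns, gap rows marked with component type N or U) and that the FASTA is standard; it reports each scaffold's length from the AGP and flags any scaffold whose FASTA length differs.

```python
"""Sketch: cross-check a FINAL.agp against FINAL.fa.

Assumes AGP v2.1: 9 tab-separated columns; gap rows have component type
'N' or 'U'; component rows give component_id in column 6 and orientation
in column 9.
"""
import sys
from collections import defaultdict
from typing import Dict, List, Tuple

GAP_TYPES = {"N", "U"}


def parse_agp(lines) -> Tuple[Dict[str, int], Dict[str, List[Tuple[str, str]]]]:
    """Return object lengths and, per object, its (component_id, orientation) list."""
    lengths: Dict[str, int] = {}
    parts: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for raw in lines:
        if not raw.strip() or raw.startswith("#"):
            continue  # skip blank lines and AGP header comments
        cols = raw.rstrip("\n").split("\t")
        obj, obj_end, comp_type = cols[0], int(cols[2]), cols[4]
        # The largest object_end seen for an object is its total length.
        lengths[obj] = max(lengths.get(obj, 0), obj_end)
        if comp_type not in GAP_TYPES:
            parts[obj].append((cols[5], cols[8]))
    return lengths, dict(parts)


def fasta_lengths(lines) -> Dict[str, int]:
    """Sequence lengths keyed by FASTA ID (first word of the header)."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def length_mismatches(agp_lengths, fa_lengths) -> List[str]:
    """Names whose AGP length differs from the FASTA length, or that exist in only one file."""
    names = set(agp_lengths) | set(fa_lengths)
    return sorted(n for n in names if agp_lengths.get(n) != fa_lengths.get(n))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_final.py out_JBAT.FINAL.agp out_JBAT.FINAL.fa
    with open(sys.argv[1]) as fh:
        agp_len, agp_parts = parse_agp(fh)
    with open(sys.argv[2]) as fh:
        fa_len = fasta_lengths(fh)
    multi = sum(1 for comps in agp_parts.values() if len(comps) > 1)
    print(f"{len(agp_len)} AGP objects, {len(fa_len)} FASTA records")
    print(f"{multi} objects are built from more than one component")
    for name in length_mismatches(agp_len, fa_len):
        print("length mismatch:", name, agp_len.get(name), fa_len.get(name))
```

An empty mismatch list only shows that the two files agree with each other; it says nothing about whether the curation itself is correct, which still needs the regenerated contact map.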

But this should be in the docs for this pipeline too, so thanks for bringing it to my attention. I'll add that soon.

Ed