aidenlab / 3d-dna

3D de novo assembly (3D DNA) pipeline
MIT License

current 3D-DNA version #170

Closed dustin-cram closed 1 year ago

dustin-cram commented 1 year ago

Hi,

I want to confirm that I am using the latest version of 3D-DNA, because three different versions seem to be reported in different places. I downloaded from the release link on the GitHub page titled "3D-DNA Phasing branch 201008" (https://github.com/aidenlab/3d-dna/releases/tag/201008).

But if I run 'run-asm-pipeline.sh --help', the first line of output gives a different version, 'version: 190716', and just below that there is yet another version: '3D de novo assembly: version 170123'.

I am concerned because the options available in the version I downloaded don't match the documentation in the DNAZoo manual or many of these posts. For example the DNAZoo manual says there should be an option '-m, --merge', but in my version '-m' is mode and must be diploid or haploid. And there are several other issues and forum posts that refer to a '--build-gapped-map' option which is required to output the 'FINAL' files. But if I use that option with my version, I get a warning message: ":| WARNING: Unknown option. Ignoring: --build-gapped-map" and no FINAL output files are produced.

Can you please let me know which version I should be using, and how I can convince it to produce the 'FINAL' files?

Regards,

Dustin Cram

dudcha commented 1 year ago

Hey Dustin,

Valid question. The version you are using is the latest one; it is associated with the Hoencamp et al., 2021 Science paper and is what we run internally at the DNA Zoo. The versioning is indeed confusing, and I apologize for that. Since the edits in this release mostly concerned the phasing part, the assembly script per se has not changed since July 2019 (hence the 190716). Basically, different pieces of the script have their own version tracking, and I should clean this up.

The DNA Zoo Assembly cookbook was written in 2018 and is at this point somewhat outdated (e.g. there is no phasing at all in the cookbook). Because of the cookbook, many still use the older version associated with the main branch. The core assembly functionality has not changed, except that we removed the diploid workflow, but there are a few convenience features that make the newer version potentially worthwhile even without phasing. For example, the whole build-gapped-map part is gone, replaced by building a proper sandboxed map for a given subset of scaffolds.

The FINAL suffix is now a HiC suffix. Once you run the default, you should get a .rawchrom.hic and a .rawchrom.assembly: the files that you typically want to review in JBAT. After you are done with the manual review, you run run-asm-pipeline.sh -r to generate a _HiC.fasta (and optionally a sandboxed _HiC.hic map if you pass the -c flag).

Thanks, Olga

dustin-cram commented 1 year ago

Excellent, thanks so much for the quick response.

It sounds like the output I have is what I should have and I can move forward with JBAT.
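
For anyone else checking their output against the reply above, here is a minimal sketch of a helper that looks for files by the suffixes Olga mentions (.rawchrom.hic and .rawchrom.assembly before review, _HiC.fasta and optionally _HiC.hic after). The function and constant names are hypothetical and not part of 3D-DNA. It only matches file-name endings, and it assumes all outputs sit directly in one directory, which may not match your setup.

```python
from pathlib import Path

# Suffixes quoted from the reply above; the grouping and names are illustrative.
PRE_REVIEW_SUFFIXES = (".rawchrom.hic", ".rawchrom.assembly")
POST_REVIEW_SUFFIXES = ("_HiC.fasta",)
OPTIONAL_POST_REVIEW_SUFFIXES = ("_HiC.hic",)  # only expected when -c is passed


def find_outputs(directory, suffixes):
    """Map each suffix to the sorted file names in `directory` ending with it."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_file())
    return {s: [n for n in names if n.endswith(s)] for s in suffixes}


def missing_outputs(directory, suffixes):
    """Return the suffixes for which no matching file was found."""
    found = find_outputs(directory, suffixes)
    return [s for s in suffixes if not found[s]]
```

Calling `missing_outputs(".", PRE_REVIEW_SUFFIXES)` in the output directory returns an empty list when both files to review in JBAT are present. It says nothing about whether their contents are valid.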