baozg closed this issue 3 years ago.
Hey Zhigui,
Thanks for your interest. Indeed, this is now available. The paper has been accepted and should hopefully be in print soon; I'll post a link here once it's available. In the meantime, in brief: the phasing module (phase/run-hic-phaser.sh --help) takes the vcf file (which can be partially phased) and the merged_nodups.txt file (make sure both are relative to the same reference), extracts the reads that overlap SNPs passing filter, and produces chromosome-length phasing. As with the rest of 3d-dna, everything is very visual: phasing Hi-C maps are created along the way and give you an idea of how well the phasing went. The phasing maps can be loaded and interactively manipulated in JBAT.
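The first step described above, picking out SNPs that pass filter, can be illustrated with a minimal Python sketch. It is not 3d-dna's code: the function and class names are hypothetical, and treating only FILTER == PASS as "passing" and keeping only biallelic heterozygous SNPs (the only sites that can tell the two haplotypes apart) are assumptions made for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class HetSnp(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in the VCF POS column
    ref: str
    alt: str


def het_snps_passing_filter(vcf_lines: Iterable[str]) -> Iterator[HetSnp]:
    """Yield biallelic heterozygous SNPs whose FILTER column is PASS.

    Hypothetical helper. Assumes a single-sample, uncompressed VCF:
    only the first sample column is read.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, filt = fields[0], fields[1], fields[3], fields[4], fields[6]
        fmt, sample = fields[8], fields[9]
        if filt != "PASS":
            continue
        if len(ref) != 1 or len(alt) != 1:
            continue  # single-base substitutions only; drops indels and multiallelic ALTs
        genotype = dict(zip(fmt.split(":"), sample.split(":"))).get("GT", "")
        alleles = genotype.replace("|", "/").split("/")
        if sorted(alleles) == ["0", "1"]:  # heterozygous, phased or not
            yield HetSnp(chrom, int(pos), ref, alt)
```

Usage would look like `with open("calls.vcf") as fh: snps = list(het_snps_passing_filter(fh))`, where `calls.vcf` is a placeholder file name.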
Best, Olga
Thanks
Hi @dudcha, is there a manual for getting a phased assembly with 3D-DNA? Thank you in advance.
Hi,
There isn’t one yet, unfortunately, but I’ll work to add a chapter to the Genome Assembly Cookbook as soon as possible. For now, please see the supplement to Hoencamp et al. 2021: https://science.sciencemag.org/content/372/6545/984/tab-figures-data
and the help message from 3d-dna/phase/run-hic-phaser.sh -h.
The latter will tell you to pass a vcf and an mnd (merged_nodups) file; just make sure the same reference is used for both.
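One cheap sanity check for the "same reference" requirement is to compare chromosome names between the two files. The Python sketch below is a hypothetical helper, not part of 3d-dna; it assumes the Juicer merged_nodups layout in which chr1 and chr2 are the 2nd and 6th whitespace-separated columns, and it only catches naming mismatches (e.g. `chr1` vs `1`), not coordinate differences between two builds that share names.

```python
import re
from typing import Iterable, Set

# Matches e.g. "##contig=<ID=chr1,length=248956422>"
_CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def vcf_contigs(lines: Iterable[str]) -> Set[str]:
    """Contig names from ##contig headers plus the CHROM column of records."""
    names = set()
    for line in lines:
        match = _CONTIG_ID.match(line)
        if match:
            names.add(match.group(1))
        elif line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def mnd_contigs(lines: Iterable[str]) -> Set[str]:
    """Chromosome names from columns 2 and 6 of merged_nodups lines (assumed layout)."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 8:
            names.add(fields[1])
            names.add(fields[5])
    return names


def contigs_missing_from_vcf(vcf_lines: Iterable[str], mnd_lines: Iterable[str]) -> Set[str]:
    """Chromosomes seen in the contact file but never mentioned in the VCF."""
    return mnd_contigs(mnd_lines) - vcf_contigs(vcf_lines)
```

A non-empty result is a hint that the vcf and merged_nodups were produced against different references (or differently named ones); an empty result does not prove they match.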
Best, Olga
Hi Zhigui,
This paper is now out: https://science.sciencemag.org/content/372/6545/984
Thanks again for your interest!
Olga
Could you please let me know what the inputs and outputs of the phasing mode are? Thank you so much!
Not sure if I have responded to this already on the forum, but duplicating here as well.
The inputs are a vcf file and the merged_nodups.txt file. The first lists the variant positions and their alleles. The second contains the deduplicated Hi-C contacts generated by the Juicer pipeline.
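To see which allele a Hi-C read carries at a SNP, you need its CIGAR string and sequence, which the long form of merged_nodups.txt carries. The Python sketch below assumes the 16-column layout documented for Juicer (str1 chr1 pos1 frag1 str2 chr2 pos2 frag2 mapq1 cigar1 sequence1 mapq2 cigar2 sequence2 readname1 readname2); all names in it are hypothetical. `base_at` assumes it is given the 1-based leftmost aligned coordinate (as in the SAM POS field) and a sequence in reference orientation; Juicer's pos column may record a different coordinate (for example the 5' end of reverse-strand reads), so check this for your Juicer version before reusing the sketch on real data.

```python
import re
from typing import NamedTuple, Optional, Tuple

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class ReadEnd(NamedTuple):
    strand: int  # 0 or 16 in Juicer output
    chrom: str
    pos: int
    frag: int
    mapq: int
    cigar: str
    seq: str


def parse_mnd_long(line: str) -> Tuple[ReadEnd, ReadEnd]:
    """Split one long-format merged_nodups line into its two read ends."""
    f = line.split()
    if len(f) < 16:
        raise ValueError(f"expected at least 16 fields, got {len(f)}")
    end1 = ReadEnd(int(f[0]), f[1], int(f[2]), int(f[3]), int(f[8]), f[9], f[10])
    end2 = ReadEnd(int(f[4]), f[5], int(f[6]), int(f[7]), int(f[11]), f[12], f[13])
    return end1, end2


def base_at(leftmost: int, cigar: str, seq: str, ref_pos: int) -> Optional[str]:
    """Read base aligned to ref_pos, or None if the read does not cover it
    (outside the alignment, or inside a deletion/skip)."""
    r, q = leftmost, 0  # current reference coordinate, index into seq
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":  # consumes reference and read
            if r <= ref_pos < r + n:
                return seq[q + (ref_pos - r)]
            r, q = r + n, q + n
        elif op in "IS":  # consumes read only
            q += n
        elif op in "DN":  # consumes reference only
            if r <= ref_pos < r + n:
                return None
            r += n
        # H and P consume neither
    return None
```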
The output is another vcf file with phasing information, plus "phasing contact maps" as described in the Hoencamp et al. 2021 supplement.
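Assuming the output vcf uses standard VCF genotype notation (`|` for phased, `/` for unphased), a per-chromosome tally of phased versus unphased heterozygous sites gives a rough text-only complement to the phasing maps. This is a hypothetical Python helper, not part of 3d-dna, and it reads only the first sample column.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phasing_summary(vcf_lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map chromosome -> (phased het sites, unphased het sites)."""
    phased: Counter = Counter()
    unphased: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        chrom, fmt, sample = f[0], f[8], f[9]
        gt = dict(zip(fmt.split(":"), sample.split(":"))).get("GT", "")
        sep = "|" if "|" in gt else "/"
        alleles = gt.split(sep)
        if len(alleles) != 2 or alleles[0] == alleles[1]:
            continue  # homozygous, missing ("./."), or non-diploid call
        if sep == "|":
            phased[chrom] += 1
        else:
            unphased[chrom] += 1
    chroms = sorted(set(phased) | set(unphased))
    return {c: (phased[c], unphased[c]) for c in chroms}
```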
Best, Olga
Thanks. So for diploid samples, how can I get two phased assemblies if the output file is a vcf file?
If you mean how to get a fasta corresponding to one haplotype, there are separate tools for that. Note that you are by no means guaranteed to phase every single variant, so what exactly you would output as a haplotype is not uniquely defined in this case. A phased vcf file is therefore the more appropriate format.
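The point about unphased variants can be made concrete with a toy Python sketch that builds two haplotype strings from one chromosome's reference sequence and a phased vcf. The function name is hypothetical, and the policy it uses (sites without a phased genotype, and all indels, keep the reference base in both haplotypes) is only one of several possible choices; a different policy would yield a different "haplotype". For real data, a dedicated tool is the better route.

```python
from typing import Iterable, Tuple


def haplotype_sequences(ref: str, chrom: str, vcf_lines: Iterable[str]) -> Tuple[str, str]:
    """Apply phased single-base variants on `chrom` to `ref` (that chromosome's sequence).

    Returns (hap1, hap2). Unphased genotypes and non-SNP records are skipped,
    so those sites keep the reference base. Only the first sample is read.
    """
    haps = [list(ref), list(ref)]
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if f[0] != chrom:
            continue
        pos, ref_base, alts = int(f[1]), f[3], f[4].split(",")
        gt = dict(zip(f[8].split(":"), f[9].split(":"))).get("GT", "")
        if "|" not in gt or len(ref_base) != 1 or any(len(a) != 1 for a in alts):
            continue  # unphased, or not a simple substitution
        if ref[pos - 1].upper() != ref_base.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}; wrong reference?")
        alleles = [ref_base] + alts  # allele index 0 is REF
        for hap_index, allele_index in enumerate(gt.split("|")[:2]):
            if allele_index != ".":
                haps[hap_index][pos - 1] = alleles[int(allele_index)]
    return "".join(haps[0]), "".join(haps[1])
```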
-Olga
Could you please recommend some tools to generate fasta from phased vcf? It would be very helpful for me. Thank you so much!
https://gatk.broadinstitute.org/hc/en-us/articles/360037594571-FastaAlternateReferenceMaker
I see. Thank you so much!
From the Hoencamp et al. 2021 Science paper, I found "The pipeline was used with the following setting to allow for independent alignment of paired-end Hi-C reads: --Aligner.unpaired-pen=0." If I align reads with bwa, should I directly align them in single-end mode?
What are you trying to do? Generate merged_nodups or call SNPs?
I guess the pipeline should be: 1) get a primary assembly -> 2) call SNPs from Hi-C alignments to the primary assembly -> 3) phase the SNPs -> 4) generate a phased VCF -> 5) output phased assemblies. How do I call SNPs in step 2)? Should I align the Hi-C reads in single-end mode?
You can use the recommendation cited in Hoencamp et al. 2021 if you have access to an FPGA. If not, you can use GATK. For Hi-C alignments, you can use those generated by Juicer. Juicer2 is particularly well suited for this, as it produces a deduplicated bam in addition to merged_nodups. -Olga
Thanks a lot. Since I'm not familiar with Juicer, I'm still running Juicer 1.6 instead of Juicer 2. In this case, which alignment file should I use for GATK? Should I use 'topDir/splits/FASTQ_NAME.sam'? Should I also use '*abnorm.sam'?
These are not deduplicated; you would have to deduplicate them separately. Just use Juicer2, it should save you some trouble.
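What "deduplicate separately" means for paired Hi-C alignments can be sketched in a few lines of Python: two read pairs are treated as PCR/optical duplicates when both ends land on the same chromosome, position and strand. This is a minimal sketch with exact-coordinate matching and a hypothetical function name; Juicer's own deduplication logic may differ (for example by tolerating small positional offsets), and for a real bam a dedicated duplicate-marking tool is the safer choice.

```python
from typing import Iterable, Iterator, Tuple

End = Tuple[str, int, int]  # (chromosome, position, strand)
Pair = Tuple[End, End]


def dedup_pairs(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield each read pair whose (end1, end2) coordinates were not seen before.

    The two ends are put in a canonical order, so A-B and B-A count as the same
    pair. Keeps every key in memory, which is fine for a sketch but not for
    billions of contacts (Juicer instead streams over sorted input).
    """
    seen = set()
    for end1, end2 in pairs:
        key = (end1, end2) if end1 <= end2 else (end2, end1)
        if key in seen:
            continue  # exact duplicate of an earlier pair
        seen.add(key)
        yield (end1, end2)
```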
Sorry, may I ask where Juicer2 is? I cannot find it on the GitHub releases page.
I believe it’s the encode branch of the Juicer GitHub repository. This discussion is not a good fit for issues; if you don’t mind, it would be best to move it to the forum: https://groups.google.com/g/3d-genomics
Thanks! Olga
Thanks a lot!
Hi, @dudcha
The new release of 3d-dna includes the phasing module, but I couldn't find a detailed description of it. Would you mind explaining how to use this workflow for a diploid assembly?
Zhigui Bao