aidenlab / 3d-dna

3D de novo assembly (3D DNA) pipeline
MIT License

How to run run-hic-phaser.sh #111

baozg closed this issue 3 years ago

baozg commented 3 years ago

Hi, @dudcha

The new release of 3d-dna includes the phasing module, but I couldn't find a detailed description of it. Would you mind explaining how to use this workflow for diploid assembly?

Zhigui Bao

dudcha commented 3 years ago

Hey Zhigui,

Thanks for your interest. Indeed, this is now available. The paper has been accepted and should hopefully be in print soon; I'll post a link here once it is available. In the meantime, in brief: the phasing module (phase/run-hic-phaser.sh --help) takes the VCF file (it can be partially phased) and the merged_nodups.txt file (make sure both are with respect to the same reference), extracts the reads that overlap SNPs passing filter, and creates chromosome-length phasing. As with the rest of 3d-dna, everything is very visual: phasing Hi-C maps are created along the way and give you an idea of how well the phasing went. The phasing maps can be loaded and interactively manipulated in JBAT (Juicebox Assembly Tools).
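To make "SNPs passing filter" concrete, here is a minimal Python sketch of selecting such records from a VCF. It is not taken from run-hic-phaser.sh: the function name `passing_snps`, the example data, and the choice to treat a FILTER value of `PASS` or `.` as passing are all assumptions made for illustration.

```python
def passing_snps(vcf_lines):
    """Yield (chrom, pos, ref, alt) for biallelic SNPs that pass filtering.

    Assumption: "passing" means the FILTER column is PASS or "." (no filters
    applied). The actual criteria used by the phasing module may differ.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        if flt not in ("PASS", "."):
            continue
        # Keep single-base substitutions only (drops indels and multi-allelic sites).
        if len(ref) == 1 and len(alt) == 1 and ref.upper() in "ACGT" and alt.upper() in "ACGT":
            yield chrom, int(pos), ref, alt


# Hypothetical example data, not from a real call set.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1
chr1\t200\t.\tC\tT\t10\tLowQual\t.\tGT\t0/1
chr1\t300\t.\tG\tGA\t60\tPASS\t.\tGT\t0/1
chr2\t50\t.\tT\tC\t70\tPASS\t.\tGT\t0|1
"""

snps = list(passing_snps(EXAMPLE_VCF.splitlines()))
print(snps)  # [('chr1', 100, 'A', 'G'), ('chr2', 50, 'T', 'C')]
```

The low-quality record and the insertion are dropped; only the two passing single-base substitutions remain.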

Best, Olga

baozg commented 3 years ago

Thanks

chhylp123 commented 3 years ago

Hi @dudcha, is there a manual for getting a phased assembly with 3D-DNA? Thank you in advance.

dudcha commented 3 years ago

Hi,

There isn’t one yet, unfortunately, but I’ll work to add a chapter to the Genome Assembly Cookbook as soon as possible. For now, please see the supplement to Hoencamp et al. 2021: https://science.sciencemag.org/content/372/6545/984/tab-figures-data

And 3d-dna/phase/run-hic-phaser.sh -h

The latter will tell you to pass a VCF and an mnd (merged_nodups.txt) file; just make sure the same reference is used for both.

Best, Olga

dudcha commented 3 years ago

Hi Zhigui,

This paper is now out: https://science.sciencemag.org/content/372/6545/984

Thanks again for your interest!

Olga

chhylp123 commented 3 years ago

Could you please let me know what the inputs and outputs of the phasing module are? Thank you so much!

dudcha commented 3 years ago

Not sure if I have responded to this already on the forum, but duplicating here as well.

The inputs are a VCF file and the merged_nodups.txt file. The first lists the variant positions and their alleles. The second contains the list of deduplicated Hi-C contacts generated by the Juicer pipeline.

The output is another VCF file, with phasing information, and "phasing contact maps" as described in the supplement to Hoencamp et al., 2021.
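Since both inputs must refer to the same reference, a quick sanity check before running the phaser is to compare sequence names between the two files. The Python sketch below is only an illustration: the helper names and example data are hypothetical, and it assumes the long-format merged_nodups.txt layout in which the second and sixth whitespace-separated fields hold the chromosome names of the two read ends. Matching names are necessary but do not prove that the same reference build was used.

```python
def vcf_contigs(vcf_lines):
    """Collect the CHROM value of every VCF record (header lines skipped)."""
    return {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def mnd_contigs(mnd_lines):
    """Collect chromosome names from merged_nodups lines.

    Assumption: fields 2 and 6 (1-based) are chr1 and chr2, as in the
    long-format merged_nodups.txt written by Juicer.
    """
    names = set()
    for line in mnd_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        names.add(fields[1])
        names.add(fields[5])
    return names


# Hypothetical example data.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50\tPASS\t.
chr2\t50\t.\tT\tC\t70\tPASS\t.
"""
EXAMPLE_MND = """0 chr1 150 10 16 chr1 5000 42 60 100M ACGT 60 100M TTGA read1/1 read1/2
16 chr1 900 12 0 chr3 77 3 60 100M GGCA 60 100M CATG read2/1 read2/2
"""

# Contigs with variants but no Hi-C contacts hint at a reference mismatch.
missing = sorted(vcf_contigs(EXAMPLE_VCF.splitlines()) - mnd_contigs(EXAMPLE_MND.splitlines()))
print(missing)  # ['chr2']
```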

Best, Olga

chhylp123 commented 3 years ago

Thanks. So for diploid samples, how can I get two phased assemblies if the output file is a vcf file?

dudcha commented 3 years ago

If you mean how to get a FASTA corresponding to one haplotype, there are separate tools for that. Note that you are by no means guaranteed to phase every single variant, so there is variability in what exactly you would output as a haplotype in this case. A phased VCF file is therefore a more appropriate format.
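To see how much of a call set ended up phased, one can count heterozygous genotypes by separator: in the VCF format, `|` between alleles marks a phased genotype and `/` an unphased one. The following Python sketch is an illustration only; `phasing_summary` and the example records are hypothetical and not part of 3d-dna.

```python
def phasing_summary(vcf_lines, sample_index=0):
    """Count phased and unphased heterozygous genotypes for one sample."""
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt = fields[9 + sample_index].split(":")[keys.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        # Homozygous or missing genotypes carry no phase information.
        if "." in alleles or len(set(alleles)) < 2:
            continue
        counts["phased" if "|" in gt else "unphased"] += 1
    return counts


# Hypothetical example records.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0|1
chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1
chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t1|0
chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t1/1
"""

summary = phasing_summary(EXAMPLE_VCF.splitlines())
print(summary)  # {'phased': 2, 'unphased': 1}
```

Any unphased heterozygous sites left over are the reason a single "haplotype FASTA" is not uniquely defined.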

-Olga

chhylp123 commented 3 years ago

Could you please recommend some tools to generate fasta from phased vcf? It would be very helpful for me. Thank you so much!

dudcha commented 3 years ago

https://gatk.broadinstitute.org/hc/en-us/articles/360037594571-FastaAlternateReferenceMaker
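For intuition about what turning a phased VCF into haplotype sequences involves, here is a toy Python sketch. It is not how FastaAlternateReferenceMaker works and is no substitute for it: the function `apply_phased_snps` and the example data are hypothetical, only single-base substitutions in a diploid sample are handled, and unphased heterozygous sites are simply left as the reference base on both haplotypes (one of several possible choices).

```python
def apply_phased_snps(reference, vcf_lines, sample_index=0):
    """Return (hap1, hap2) dicts of contig -> sequence.

    reference: dict mapping contig name to sequence string.
    Assumptions: diploid genotypes, SNPs only, positions are 1-based.
    """
    haps = [{name: list(seq) for name, seq in reference.items()} for _ in range(2)]
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 10:
            continue
        chrom, pos, ref, alt = f[0], int(f[1]), f[3], f[4]
        if chrom not in reference or len(ref) != 1 or len(alt) != 1:
            continue  # unknown contig, indel, or multi-allelic site
        keys = f[8].split(":")
        if "GT" not in keys:
            continue
        gt = f[9 + sample_index].split(":")[keys.index("GT")]
        if "|" in gt:
            alleles = gt.split("|")
        else:
            alleles = gt.split("/")
            if len(set(alleles)) > 1:
                continue  # unphased heterozygous site: keep reference on both
        if len(alleles) != 2:
            continue
        for hap, allele in zip(haps, alleles):
            if allele == "1":
                hap[chrom][pos - 1] = alt
    return tuple({n: "".join(s) for n, s in h.items()} for h in haps)


# Hypothetical example: an 8-base contig with four variant records.
REFERENCE = {"chr1": "ACGTACGT"}
EXAMPLE_VCF = """#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample
chr1\t2\t.\tC\tT\t50\tPASS\t.\tGT\t0|1
chr1\t4\t.\tT\tG\t50\tPASS\t.\tGT\t1|0
chr1\t6\t.\tC\tA\t50\tPASS\t.\tGT\t0/1
chr1\t8\t.\tT\tC\t50\tPASS\t.\tGT\t1/1
"""

hap1, hap2 = apply_phased_snps(REFERENCE, EXAMPLE_VCF.splitlines())
print(hap1["chr1"])  # ACGGACGC
print(hap2["chr1"])  # ATGTACGC
```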

chhylp123 commented 3 years ago

I see. Thank you so much!

chhylp123 commented 3 years ago

From the Hoencamp et al. 2021 Science paper, I found "The pipeline was used with the following setting to allow for independent alignment of paired-end Hi-C reads: --Aligner.unpaired-pen=0." If I align reads with bwa, should I directly align them in single-end mode?

dudcha commented 3 years ago

What are you trying to do? Generate merged_nodups or call snps?

chhylp123 commented 3 years ago

I guess the pipeline should be: 1) get primary assembly -> 2) call SNP by Hi-C alignment to primary assembly -> 3) phasing SNPs -> 4) generate phased VCF -> 5) output phased assemblies. How do I call SNPs in step 2)? Should I align Hi-C reads in single-end mode?

dudcha commented 3 years ago

You can use the recommendation cited in Hoencamp et al. 2021 if you have access to an FPGA. If not, you can use GATK. For the Hi-C alignment, you can use the alignments generated by Juicer. Juicer2 is particularly well suited for this, as it produces a deduplicated BAM in addition to merged_nodups. -Olga

chhylp123 commented 3 years ago

Thanks a lot. Since I'm not familiar with Juicer, I'm still running Juicer 1.6 instead of Juicer 2. In this case, which alignment file should I use for GATK? Should I use 'topDir/splits/FASTQ_NAME.sam'? Should I also use '*abnorm.sam'?

dudcha commented 3 years ago

These are not deduplicated, so you would have to deduplicate them separately. Just use Juicer2; it should save you some trouble.

chhylp123 commented 3 years ago

Sorry, may I ask where Juicer2 is? I cannot find it on the GitHub release page.

dudcha commented 3 years ago

I believe it’s the encode branch on the juicer GitHub. This discussion is not a great fit for issues. If you don’t mind, it would be best to move it to the forum: https://groups.google.com/g/3d-genomics

Thanks! Olga

chhylp123 commented 3 years ago

Thanks a lot!