Closed jjuhyunkim closed 6 months ago
To elaborate further on my question, I aim to generate a plot similar to Extended Data Fig.4f, comparing the methylation and contact map between chrXi and chrXa. I have the following data available:
Could you please provide a more detailed techinical description of the "XCI analysis" part in method from your paper, particularly regarding the phrase "Using the phasing approach, concatemers were assigned to their constituent haplotypes. Hiigh-order contats were melted into pairwise contatcts using GxG."?
Thanks!
Hi Juhyun,
You can try to add haplotag information to the mock paired end bam yourself, using whatshap.
whatshap haplotag -o {output} {input.vcf} ./paired_end/{{ alias }}.cs.bam --ignore-read-groups -r {input.refgenome}
After running this, you should have a mock paired end bam with HP tags, which you can use to split the bam into a separate file for each haplotype.
If whatshap still requires unique identifiers, you can use something like the two scripts here to create and then delete unique tags for the ids.
For the paper, after we split the equivalent of ./bams/{{ alias }}.cs.bam into haplotype1.bam and haplotype2.bam, we used those to create pairs, .cooler, and other files for downstream analysis. Extended Data Fig. 4f was probably plotted in python using .cooler files. I don't think haplotyped pairs/cooler are implemented in the workflow yet, but you can see how to go from bam -> pairs -> cooler -> etc yourself by looking here.
When we wrote the paper, methylation information was not yet encoded in bam files; we used f5c/nanopolish call-methylation
and meth-freq
on the equivalent of ./bams/{{ alias }}.cs.bam to get a table of the methylation frequency at each coordinate. If you have already called methylation during basecalling, it may be easier to use modkit to extract the methylation information.
Please let me know if you have any other questions.
Closing through lack of response but feel free to re-open if you require continuing this discussion.
Sorry for the delayed response. I followed your suggestion, but after running pairtools parse2
, I ended up with all NN
(null-multi) tags. Here’s what I did:
whatshap haplotag -o $prefix \
$vcf outputs/bams/null.cs.bam \
--ignore-read-groups --output-haplotag-list haplotypes.tsv \
-r $ref
whatshap split --output-h1 h1.bam --output-h2 h2.bam $prefix haplotypes.tsv
prefix=h1
ref_fai=${ref}.fai
fragment='output/pairs/fragments.bed'
args="--drop-sam --drop-seq --expand --add-pair-index --add-columns mapq,pos5,pos3,cigar,read_len,matched_bp,algn_ref_span,algn_read_span,dist_to_5,dist_to_3,mismatches"
samtools sort -n ${prefix}.bam > ${prefix}.ns.bam
pairtools parse2 \
--output-stats "${prefix}.stats.txt" \
-c $ref_fai --single-end --readid-transform 'readID.split(":")[0]' \
$args ${prefix}.ns.bam > ${prefix}.extract_pairs.tmp
I successfully obtained haplotype-specific BAM files from outputs/bams/null.cs.bam
.
(poreC) [kimj75@cn4270 output]$ samtools flagstats -@ 50 h1.ns.bam
20861000 + 0 in total (QC-passed reads + QC-failed reads)
20861000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
20861000 + 0 mapped (100.00% : N/A)
20861000 + 0 primary mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
And the stats from pairtools parse2 looks like this:
total 20861000
total_unmapped 20861000
total_single_sided_mapped 0
total_mapped 0
total_dups 0
total_nodups 0
cis 0
trans 0
pair_types/NN 20861000
Could you please give me some advice?
The newest version of pairtools (1.1.0) gives me a similar result. Can you downgrade to the same version used in pore-c-snakemake and try again?
For example, you could install it using conda with
conda create --name pairtools_1.0.2 -c conda-forge -c bioconda pairtools=1.0.2 numpy=1.20 python=3.8
It works now! Thank you so much!!!
Hi developers,
Thank you for developing this wonderful algorithm.
I have a question regarding how to create haplotype-specific Hi-C links (or any relevant format). I am interested in calling haplotype specific methylation levels, as described in your paper, and haplotype-specific Hi-C link.
I believe that the
./bams/{{ alias }}.cs.bam
file contains HP tags in the BAM format, but this information is not present in the./paired_end/{{ alias }}.ns.bam
file. I'm concerned that./bams/{{ alias }}.cs.bam
might differ from other normal Hi-C data, so I'm hesitant to use the poreC data with traditional tools.I hope my question isn't too trivial.
Thanks.