Details of Hi-C data mapping

c-zhou / yahs

Yet another Hi-C scaffolding tool

MIT License


Issue #43, closed by yangfangyuan0102 1 year ago

yangfangyuan0102 commented 1 year ago

Hi, dear author, are there any technical details I need to get right if I don't plan to follow ArimaGenomics/mapping_pipeline? That pipeline is not very neat compared with the input requirements of other scaffolding programs. I would like to produce a usable BAM file myself with bwa and samtools. Thanks!

Best

c-zhou commented 1 year ago

Hello @yangfangyuan0102,

We have also tried mapping R1 and R2 reads together with BWA-MEM using the -5SP options, then running samtools fixmate to fill in mate information, sorting by coordinate, and finally marking duplicates. The bwa mem step is quite similar to the Omni-C mapping workflow.

Best, Chenxi
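The steps described in this reply can be sketched as a small bash function. This is a minimal sketch under stated assumptions, not the YaHS authors' script: map_hic and DRY_RUN are hypothetical names, and it assumes bwa and samtools are on PATH, the reference was indexed beforehand with bwa index, and no file path contains spaces. Per the bwa manual, -5 takes the part of a split alignment with the smallest read coordinate as primary, -S skips mate rescue and -P skips pairing.

```shell
#!/usr/bin/env bash
# Sketch only: map_hic and DRY_RUN are hypothetical names, not part of YaHS.
# Assumes bwa and samtools are on PATH, REF was indexed with "bwa index",
# and no path contains spaces (the pipeline is assembled as a plain string).

# Usage: map_hic REF R1 R2 OUT_BAM [THREADS]
map_hic() {
  local ref=$1 r1=$2 r2=$3 out=$4 threads=${5:-8}
  # Map R1 and R2 together: -5 takes the split-alignment part with the
  # smallest read coordinate (the 5'-most part) as primary,
  # -S skips mate rescue, -P skips pairing.
  local cmd="bwa mem -5SP -t $threads $ref $r1 $r2"
  # Convert the SAM stream to BAM.
  cmd="$cmd | samtools view -b -"
  # bwa mem emits mates next to each other, so fixmate can read it directly;
  # -m adds ms (mate score) tags used by markdup, -p disables the FR proper-pair check.
  cmd="$cmd | samtools fixmate -mp - -"
  # Sort by coordinate, which markdup expects.
  cmd="$cmd | samtools sort -"
  # Flag duplicates (adding -r would remove them instead).
  cmd="$cmd | samtools markdup - $out"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$cmd"          # print the pipeline without running it
  else
    bash -o pipefail -c "$cmd"    # a failure in any stage fails the whole call
  fi
}
```

For example, setting DRY_RUN=1 and calling map_hic ref.fa R1.fq.gz R2.fq.gz hic.bam 16 (hypothetical file names) only prints the assembled pipeline; without DRY_RUN it runs it. Building the pipeline as one string keeps the dry-run output and the executed command identical, and running it under pipefail means a failure in bwa or fixmate is not masked by a successful final markdup stage.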

c-zhou commented 1 year ago

For samtools fixmate, we used the -mp options.

Chenxi
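According to the samtools documentation, fixmate -m adds ms (mate score) tags that markdup uses to choose which reads to keep, while -p disables fixmate's FR proper-pair check. A quick way to confirm that -m was applied is to count records carrying an ms tag. This is a minimal awk sketch assuming SAM text on standard input; count_ms_tags is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SAM records (header lines skipped) that carry an
# ms:i: tag, which samtools fixmate -m adds. Reads SAM text from stdin.
count_ms_tags() {
  awk -F '\t' '
    /^@/ { next }                     # skip SAM header lines
    {
      for (i = 12; i <= NF; i++)      # optional tags start at column 12
        if ($i ~ /^ms:i:/) { n++; break }
    }
    END { print n + 0 }               # print 0 when no record matched
  '
}
```

For example, samtools view fixmate_out.bam | head -n 1000 | count_ms_tags (hypothetical file name) counts tagged records in a small sample; a count of 0 on paired reads suggests fixmate was run without -m. The awk program avoids gawk-only extensions, so it should also run under mawk or BusyBox awk.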

yangfangyuan0102 commented 1 year ago

bwa-mem2 index $genome
bwa-mem2 mem -SP5 -t $cpu $genome $read1 $read2 | samtools view -@ 5 -b - | samtools fixmate -mp -@ 5 - - | samtools sort -m 2g -@ 5 - | samtools markdup -@ 5 -r - alignment.bam

AlcaArctica commented 10 months ago

Sorry, just to help me understand: I am also using the Arima mapping pipeline, but have found it really slow and cumbersome (we have a 40 Gbp tree genome). Is the above code

bwa-mem2 index $genome
bwa-mem2 mem -SP5 -t $cpu $genome $read1 $read2 | samtools view -@ 5 -b - | samtools fixmate -mp -@ 5 - - | samtools sort -m 2g -@ 5 - | samtools markdup -@ 5 -r - alignment.bam

all that is necessary to replace the Arima pipeline?

Follow-up: Nope, unfortunately the results seem worse than with the full Arima pipeline...