StevenWingett / HiCUP

Hi-C data processing pipeline
GNU Lesser General Public License v3.0

How to merge data from multiple lanes or biological replicates #74

Closed HammadFarooq closed 1 year ago

HammadFarooq commented 1 year ago

Hi,

What's the recommended way to merge data from biological replicates, or from a single experiment sequenced across multiple lanes? If I place the paired files on adjacent lines, HiCUP generates one output BAM/SAM file per file pair. If a single experiment spans multiple lanes, how should I merge them?

Thanks,

Hammad

StevenWingett commented 1 year ago

Hi Hammad,

Thanks for your message.

How you merge the data depends on what you are trying to achieve.

Technical replicates / re-sequencing the same library

Method 1: combine the relevant FASTQ files and then process with HiCUP.
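As a rough illustration of Method 1, the sketch below appends lane-level FASTQ files into one combined file. It is not part of HiCUP: the function name and the file names in the usage comment are hypothetical. It assumes all input files are either gzip-compressed or all plain text (not a mixture), and that plain-text files end with a newline. Byte-wise concatenation of gzip files produces a valid multi-member gzip file.

```python
import shutil
from pathlib import Path


def concatenate_lanes(lane_files, combined_path):
    """Append each lane file to combined_path, in the order given.

    Assumes the inputs are all gzip-compressed or all plain text.
    """
    with open(combined_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                # Copy raw bytes; no decompression is needed
                shutil.copyfileobj(src, out)
    return Path(combined_path)


# Hypothetical usage: list the lanes in the SAME order for both reads,
# so that read 1 and read 2 stay in step in the combined files.
# concatenate_lanes(["L001_R1.fastq.gz", "L002_R1.fastq.gz"], "all_R1.fastq.gz")
# concatenate_lanes(["L001_R2.fastq.gz", "L002_R2.fastq.gz"], "all_R2.fastq.gz")
```

The two combined files can then be given to HiCUP as a single file pair.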

Method 2: alternatively, you could process with HiCUP and then combine the resulting HiCUP BAM files. The combined file will then need de-duplicating with hicup_deduplicator.

Method 1 is simpler, but I prefer Method 2 as it allows the user to spot problems (e.g. sample swaps).

I warn against using Samtools to merge HiCUP BAM files, since the read line pairing has to be retained within the file. However, please find attached a script to automate Method 2 (you will need the HiCUP scripts in your path to run this script).
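To show what retaining the read line pairing means in practice, here is a minimal sketch that concatenates uncompressed SAM files without reordering any records. It is not the attached script and is not part of HiCUP; the function name is hypothetical. It assumes every input was mapped against the same reference (so the header of the first file can be reused) and that each file holds its read pairs on adjacent lines, which it checks only loosely by requiring an even number of records per file.

```python
def concatenate_sam(sam_files, combined_path):
    """Concatenate uncompressed SAM files, keeping record order intact.

    Returns the total number of alignment records written.
    """
    total_records = 0
    with open(combined_path, "w") as out:
        for index, sam_file in enumerate(sam_files):
            records_in_file = 0
            with open(sam_file) as src:
                for line in src:
                    if line.startswith("@"):
                        # Header line: keep only the first file's header
                        # (assumption: same reference for all inputs)
                        if index == 0:
                            out.write(line)
                        continue
                    # Alignment record: write in the original order
                    out.write(line)
                    records_in_file += 1
            if records_in_file % 2 != 0:
                raise ValueError(
                    f"{sam_file} has an odd number of records; "
                    "paired reads are expected on adjacent lines"
                )
            total_records += records_in_file
    return total_records
```

The combined file would still need de-duplicating with hicup_deduplicator, as described for Method 2.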

Biological replicates

This is different from merging technical replicates. Your best approach will probably be to process your data with HiCUP, quantitate (e.g. bin-bin interactions) and then combine biological replicates at this level.
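A minimal sketch of combining at the quantitation level might look like the following. The function names, the input format (one tuple of chromosome and position per read pair end) and the choice to combine by simply summing raw counts are all assumptions made for illustration; any normalisation between replicates is left out.

```python
from collections import Counter


def bin_interactions(pairs, bin_size):
    """Count interactions between genomic bins for one replicate.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (assumed format).
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        end1 = (chrom1, pos1 // bin_size)
        end2 = (chrom2, pos2 // bin_size)
        # Sort the two ends so that A-B and B-A share one key
        counts[tuple(sorted((end1, end2)))] += 1
    return counts


def combine_replicates(replicate_counts):
    """Sum bin-bin counts across replicates (one possible way to combine)."""
    combined = Counter()
    for counts in replicate_counts:
        combined.update(counts)  # Counter.update adds counts together
    return combined
```

Each replicate is binned separately first, so the per-replicate counts remain available for comparison before they are combined.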

I hope that helps.

Best,

Steven

comb_dedup.pl.zip