Show simple/twisted statistics as box plots (CTCF depletion datasets)

There are three conditions, each with two biological replicates. Each biological replicate has one to four technical replicates. Because the technical replicates share the same sample ID, I assume they are sequencing runs of the same library.

Hi-C_untreated_rep1	GSM2644945	SRR5633682
		SRR5633683

Hi-C_untreated_rep2	GSM2644946	SRR5633684
		SRR5633685

Hi-C_auxin-2days_rep1	GSM2644947	SRR5633686
		SRR5633687
		SRR5633688
		SRR5633689

Hi-C_auxin-2days_rep2	GSM2644948	SRR5633690

Hi-C_washoff-2days_rep1	GSM2644949	SRR5633691
		SRR5633692

Hi-C_washoff-2days_rep2	GSM2644950	SRR5633693
		SRR5633694

If we combine the FASTQ files, we will run into memory issues with Diachromatic (at least for replicate 1 of the auxin-treated samples). Therefore, I suggest using samtools merge to merge the valid-pair BAM files of the technical replicates and then applying samtools rmdup to the merged BAM files.
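The merge-then-deduplicate step could be scripted roughly as follows. This is only a sketch: the BAM file names and the output prefix are hypothetical, and it assumes the valid-pair BAM files are already sorted in the way samtools merge and samtools rmdup expect (coordinate order), which would need to be checked for Diachromatic's output. A replicate with a single run (such as Hi-C_auxin-2days_rep2) skips the merge.

```python
import subprocess


def build_commands(run_bams, out_prefix):
    """Return the samtools command lines for one biological replicate.

    run_bams:   valid-pair BAM files of the technical replicates (hypothetical names)
    out_prefix: prefix for the output files (hypothetical naming scheme)
    """
    if not run_bams:
        raise ValueError("at least one BAM file is required")

    commands = []
    if len(run_bams) == 1:
        # Only one sequencing run: nothing to merge.
        to_dedup = run_bams[0]
    else:
        to_dedup = f"{out_prefix}.merged.bam"
        # samtools merge <out.bam> <in1.bam> <in2.bam> ...
        commands.append(["samtools", "merge", to_dedup, *run_bams])

    # samtools rmdup <input.bam> <output.bam>
    commands.append(["samtools", "rmdup", to_dedup, f"{out_prefix}.rmdup.bam"])
    return commands


def run_commands(commands):
    """Execute the commands in order; stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = build_commands(
        ["SRR5633682.valid_pairs.bam", "SRR5633683.valid_pairs.bam"],
        "Hi-C_untreated_rep1",
    )
    for argv in cmds:
        print(" ".join(argv))
    # run_commands(cmds)  # uncomment once the input files exist
```

Keeping the command construction separate from execution makes it possible to print and inspect the commands before anything is run.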

Biological replicates should be combined at the level of interaction files. We can use a Perl script for this. This step is potentially memory-intensive because of the large number of interactions with only one read pair. Maybe this can be overcome by sorting the concatenated interaction files, since identical interactions then end up on adjacent lines and their counts can be summed in a single pass.
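To illustrate the sorting idea, here is a minimal sketch in Python (not the Perl script mentioned above). It assumes a tab-separated interaction format in which the last column holds the counts as `simple:twisted` and all preceding columns identify the interaction; that layout is an assumption and should be checked against the actual Diachromatic output. It also assumes each input is already sorted by those key columns as strings. Under these assumptions the replicates can be combined as a stream without holding all interactions in memory.

```python
import heapq
from itertools import groupby


def parse(line):
    """Split a line into (key columns, simple count, twisted count).

    Assumed format: key columns, then a final 'simple:twisted' column.
    """
    *key, counts = line.rstrip("\n").split("\t")
    simple, twisted = (int(x) for x in counts.split(":"))
    return tuple(key), simple, twisted


def combine_sorted(*streams):
    """Merge sorted interaction streams and sum counts of identical interactions.

    Each stream must yield lines sorted by the key columns (compared as strings).
    Only one group of identical interactions is held in memory at a time.
    """
    records = heapq.merge(
        *(map(parse, stream) for stream in streams),
        key=lambda record: record[0],
    )
    for key, group in groupby(records, key=lambda record: record[0]):
        simple = twisted = 0
        for _, s, t in group:
            simple += s
            twisted += t
        yield "\t".join(key) + f"\t{simple}:{twisted}"


if __name__ == "__main__":
    # Tiny made-up example lines, not real data.
    rep1 = ["chr1\t100\t200\tchr1\t500\t600\t1:0",
            "chr2\t10\t20\tchr2\t50\t60\t0:2"]
    rep2 = ["chr1\t100\t200\tchr1\t500\t600\t2:3"]
    for line in combine_sorted(rep1, rep2):
        print(line)
```

For real files, the streams would be open file handles, and the sorting itself would have to be done beforehand with a tool that can sort on disk.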

Show simple/twisted statistics as box plots (CTCF depletion datasets) #89