Closed pnrobinson closed 5 years ago
There are three conditions, each with two biological replicates, and each biological replicate has one to four technical replicates. Since the technical replicates share a sample ID, I assume they are sequencing runs of the same library.
| Sample | GEO accession | SRA run |
|---|---|---|
| Hi-C_untreated_rep1 | GSM2644945 | SRR5633682 |
| | | SRR5633683 |
| Hi-C_untreated_rep2 | GSM2644946 | SRR5633684 |
| | | SRR5633685 |
| Hi-C_auxin-2days_rep1 | GSM2644947 | SRR5633686 |
| | | SRR5633687 |
| | | SRR5633688 |
| | | SRR5633689 |
| Hi-C_auxin-2days_rep2 | GSM2644948 | SRR5633690 |
| Hi-C_washoff-2days_rep1 | GSM2644949 | SRR5633691 |
| | | SRR5633692 |
| Hi-C_washoff-2days_rep2 | GSM2644950 | SRR5633693 |
| | | SRR5633694 |
If we combine the FASTQ files, we will run into memory issues with Diachromatic (at least for replicate 1 of the auxin-treated samples). Therefore, I would suggest using `samtools merge` to merge the valid-pair BAM files of the technical replicates and then applying `samtools rmdup` to the merged BAM files.
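To make the proposed order of operations concrete, here is a minimal Python sketch that builds the two samtools calls for one biological replicate. The file names (`SRR….valid_pairs.bam`, `*.merged.bam`) are hypothetical placeholders, not Diachromatic's actual output names, and the sketch assumes the per-run BAM files are coordinate-sorted, which `samtools rmdup` requires. Note that recent samtools releases mark `rmdup` as obsolete in favor of `markdup`.

```python
import subprocess


def merge_commands(sample, run_bams):
    """Build the samtools calls for one biological replicate.

    `sample` and the derived output names are hypothetical; adjust them
    to whatever the valid-pair BAM files are really called.
    """
    merged = f"{sample}.merged.bam"
    dedup = f"{sample}.merged.rmdup.bam"
    return [
        # samtools merge [options] <out.bam> <in1.bam> <in2.bam> ...
        ["samtools", "merge", "-f", merged, *run_bams],
        # samtools rmdup <input.srt.bam> <out.bam> (paired-end by default)
        ["samtools", "rmdup", merged, dedup],
    ]


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    runs = ["SRR5633686", "SRR5633687", "SRR5633688", "SRR5633689"]
    bams = [f"{r}.valid_pairs.bam" for r in runs]  # assumed file names
    for cmd in merge_commands("Hi-C_auxin-2days_rep1", bams):
        print(" ".join(cmd))
```

Calling `run(merge_commands(...))` instead of printing would execute the commands, provided samtools is on the `PATH`.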
Biological replicates should be combined at the level of interaction files. We can use a Perl script for this. This step is potentially memory-intensive because of the large number of interactions supported by only one read pair. Maybe this can be overcome by sorting the concatenated interaction files, so that identical interactions end up on adjacent lines and their counts can be summed in a single pass.
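The sorting idea could look roughly like the following Python sketch (Python rather than Perl, purely for illustration). It assumes a hypothetical layout in which every interaction line is tab-separated, the last column holds one or more colon-separated read-pair counts, and all preceding columns identify the interaction. It also assumes each input is already sorted by those key columns in the order Python compares tuples of strings. Under these assumptions the files can be merged as streams, so memory use does not grow with the number of interactions.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def parse(line):
    """Split a line into (key, counts); the column layout is an assumption."""
    *key, counts = line.rstrip("\n").split("\t")
    return tuple(key), tuple(int(c) for c in counts.split(":"))


def combine_sorted(*streams):
    """Yield combined lines from inputs that are each sorted by key."""
    records = (map(parse, stream) for stream in streams)
    # heapq.merge keeps only one pending record per input in memory.
    merged = heapq.merge(*records, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        # Sum the counts column-wise over all replicates sharing this key.
        totals = [sum(col) for col in zip(*(counts for _, counts in group))]
        yield "\t".join(key) + "\t" + ":".join(map(str, totals))
```

Any iterable of lines works as an input, for example open file handles: `combine_sorted(open("rep1.tsv"), open("rep2.tsv"))`, where the file names are placeholders.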