TheJacksonLaboratory / diachromatic

Diachromatic is a Java application for preprocessing and quality control of Hi-C and CHi-C data.
https://diachromatic.readthedocs.io/en/latest/
GNU General Public License v3.0

Show simple/twisted statistics as box plots (CTCF depletion datasets) #89

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago
hansenp commented 5 years ago

There are three conditions, each with two biological replicates, and each biological replicate has one to four technical replicates. Since the technical replicates share a sample ID, I assume they are sequencing runs of the same library.

Sample                    GEO sample    SRA runs
Hi-C_untreated_rep1       GSM2644945    SRR5633682, SRR5633683
Hi-C_untreated_rep2       GSM2644946    SRR5633684, SRR5633685
Hi-C_auxin-2days_rep1     GSM2644947    SRR5633686, SRR5633687, SRR5633688, SRR5633689
Hi-C_auxin-2days_rep2     GSM2644948    SRR5633690
Hi-C_washoff-2days_rep1   GSM2644949    SRR5633691, SRR5633692
Hi-C_washoff-2days_rep2   GSM2644950    SRR5633693, SRR5633694

If we combine the FASTQ files, we will run into memory issues with Diachromatic (at least for replicate 1 of the auxin-treated samples). Therefore, I suggest using samtools merge to merge the valid-pair BAM files of the technical replicates and then applying samtools rmdup to the merged BAM files.
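
A minimal sketch of how the merge and deduplication steps could be scripted for one biological replicate. The BAM file names and output prefix are hypothetical, and the coordinate sort before merging is an assumption (samtools merge and samtools rmdup both expect sorted input, and I have not checked how Diachromatic orders its valid-pair BAM files). The script only builds and prints the command lines; `run_all` would execute them.

```python
import subprocess
from typing import List


def build_commands(run_bams: List[str], prefix: str) -> List[List[str]]:
    """Build samtools command lines for one biological replicate.

    run_bams: valid-pair BAM files, one per technical replicate
              (hypothetical file names).
    prefix:   hypothetical prefix for all output files.
    """
    if not run_bams:
        raise ValueError("need at least one BAM file")

    commands = []
    sorted_bams = []
    # Assumption: inputs are not yet coordinate-sorted, so sort each one.
    for i, bam in enumerate(run_bams, start=1):
        out = f"{prefix}.run{i}.sorted.bam"
        commands.append(["samtools", "sort", "-o", out, bam])
        sorted_bams.append(out)

    # Merging is only needed when there is more than one sequencing run.
    if len(sorted_bams) > 1:
        merged = f"{prefix}.merged.bam"
        commands.append(["samtools", "merge", merged] + sorted_bams)
    else:
        merged = sorted_bams[0]

    # Remove duplicates from the merged file.
    # (Newer samtools releases recommend markdup over rmdup.)
    commands.append(["samtools", "rmdup", merged, f"{prefix}.rmdup.bam"])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


cmds = build_commands(
    ["SRR5633682.valid_pairs.bam", "SRR5633683.valid_pairs.bam"],
    "Hi-C_untreated_rep1",
)
for cmd in cmds:
    print(" ".join(cmd))
```

For Hi-C_auxin-2days_rep2, which has a single run, the merge step is skipped and rmdup is applied to the sorted file directly.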

Biological replicates should be combined at the level of interaction files. We can use a Perl script for this. This step is potentially memory-intensive because of the large number of interactions supported by only one read pair, which all have to be held in memory if counts are accumulated in a hash. Sorting the concatenated interaction files might overcome this, since identical interactions then end up on adjacent lines.
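
To illustrate the sorting idea, here is a rough sketch in Python rather than Perl. It assumes a simplified, hypothetical line layout in which the last tab-separated column is the read pair count and everything before it identifies the interaction; the real Diachromatic interaction format may differ. Once the concatenated file is sorted by interaction key (for example with an on-disk sort), counts can be summed over adjacent lines while holding only one interaction in memory at a time.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse(line: str) -> Tuple[str, int]:
    """Split a line into (interaction key, read pair count).

    Hypothetical layout: the last column is the count, all columns
    before it together identify the interaction.
    """
    key, count = line.rstrip("\n").rsplit("\t", 1)
    return key, int(count)


def sum_sorted_interactions(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Sum counts of identical interactions in key-sorted input.

    Because the input is sorted, equal keys are adjacent, so only the
    current group needs to be held in memory.
    """
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda rec: rec[0]):
        yield key, sum(count for _, count in group)


# Toy data standing in for two biological replicates.
rep1 = [
    "chr1\t1000\t2000\tchr1\t50000\t51000\t3\n",
    "chr2\t300\t900\tchr2\t7000\t7600\t1\n",
]
rep2 = [
    "chr1\t1000\t2000\tchr1\t50000\t51000\t2\n",
    "chr1\t1000\t2000\tchr1\t90000\t91000\t4\n",
]

# In practice the sort would be done on disk; here it is in memory.
concatenated = sorted(rep1 + rep2, key=lambda line: parse(line)[0])
combined = list(sum_sorted_interactions(concatenated))
for key, count in combined:
    print(f"{key}\t{count}")
```

The function accepts any iterable of lines, so an open file handle on the sorted file can be passed in directly instead of a list.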