GuipengLi / ChIA-PET2

a versatile and flexible pipeline for analysing different variants of ChIA-PET data
GNU General Public License v3.0

Read names of PET ends do not match #9

Closed MERobinson closed 6 years ago

MERobinson commented 6 years ago

I tried re-analysing the publicly available ENCODE ChIA-PET data (GSE59395), but I ran into the following error at step 3:

Error: read names of PET ends do not match

I checked the read names of the input SAMs and they do appear to match as far as I can tell, excerpt below:

tail -n 2 chiapet/GSM1436262_H3K27ac_ChIAPET_K562_1.valid.sam 
SRR1514662.177322795.1  4   *   0   0   *   *   0   0   ACTCCTGATGCGAGTAAT  ??@7AD?DHHDF7EF@DD  AS:i:0  XS:i:0
SRR1514662.177322797.1  4   *   0   0   *   *   0   0   AGAGCTACTACCTCTGAGG @B@FFFFFHGHFFHGIEGG AS:i:0  XS:i:0
tail -n 2 chiapet/GSM1436262_H3K27ac_ChIAPET_K562_2.valid.sam 
SRR1514662.177322795.2  4   *   0   0   *   *   0   0   ATGTAAATAGCTACAAGGA B<+4AAAB?BBBFG@@<?3 AS:i:0  XS:i:0
SRR1514662.177322797.2  4   *   0   0   *   *   0   0   AAGATGCCCATAAAGGGA  C@CFFDFFHHHHHJJJJJ  AS:i:0  XS:i:0

Any idea what might be causing this?

(ChIA-PET2 version 0.9.2)

Thanks, Mark

GuipengLi commented 6 years ago

Hi Mark, ChIA-PET2 requires the two ends of each paired-end read to have the same read name. In the data you have shown, the two ends have different read names, e.g. SRR1514662.177322795.1 vs SRR1514662.177322795.2. I would suggest first fixing the read names (trimming the ".1" and ".2" suffixes) and then rerunning ChIA-PET2.
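
For anyone hitting the same error, here is a minimal sketch of the suffix trimming in Python. It is not part of ChIA-PET2; the names strip_mate_suffix and trim_file are hypothetical, and it assumes the mate suffix is always a literal ".1" or ".2" at the very end of the read name (the first tab-separated field of a SAM record). Whether to trim the intermediate SAM files or the input FASTQ files before rerunning is also an assumption to check against your own setup.

```python
import re

# Assumption: mates are marked by a trailing ".1" or ".2" on the read name,
# as in SRR1514662.177322795.1 / SRR1514662.177322795.2.
_MATE_SUFFIX = re.compile(r"\.[12]$")


def strip_mate_suffix(sam_line: str) -> str:
    """Return a SAM line with a trailing .1/.2 removed from the read name."""
    # SAM header lines start with "@" and carry no read name; pass them through.
    if sam_line.startswith("@"):
        return sam_line
    # The read name (QNAME) is the first tab-separated field.
    qname, sep, rest = sam_line.partition("\t")
    return _MATE_SUFFIX.sub("", qname) + sep + rest


def trim_file(src: str, dst: str) -> None:
    """Write a copy of the SAM file at src to dst with mate suffixes removed."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(strip_mate_suffix(line))
```

Running trim_file once on each of the two SAM files should leave both ends of a pair with the same name (SRR1514662.177322795 in the example above). Only the read name is touched, so a ".1" elsewhere in the record is left alone.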

MERobinson commented 6 years ago

Yes, sorry, I realised that after posting; I had just expected it to handle suffixes. Trimming worked! Thanks