arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

Merge delimiter for input file to define how to number columns #165

Open nroak opened 2 years ago

nroak commented 2 years ago

I'd like to be able to define the column separator that bedtools merge uses when resolving the -c option. I have a bedj file whose 4th column contains characters (commas) that are automatically detected as column separators. I want those to be ignored and TAB used as the only column separator. Below is an example input file, followed by the output I expect and the output I actually get.

# Input file (generated from bedtools intersect)
chr1    4510001 4769999 {"color":"rgba(128,0,128,1.00)","exon":[[4510001,4519999],[4760001,4769999]],"name":"_n19_qBL1.7390461E-4"} .   -1  -1      .   0
chr1    4850001 5099999 {"color":"rgba(128,0,128,1.00)","exon":[[4850001,4874999],[5075001,5099999]],"name":"_n59_qBL1.1236174E-9"} chr1    4857814 4897909 Tcea1   40095
chr1    4850001 5099999 {"color":"rgba(128,0,128,1.00)","exon":[[4850001,4874999],[5075001,5099999]],"name":"_n59_qBL1.1236174E-9"} chr1    5070018 5162529 Atp6v1h 29981
chr1    4850001 5099999 {"color":"rgba(128,0,128,1.00)","exon":[[4850001,4874999],[5075001,5099999]],"name":"_n59_qBL1.1236174E-9"} chr1    4909576 5070285 Rgs20   160709
# Expected output with bedtools merge -c 8,9 -o collapse,collapse
chr1    4510001 5099999 Tcea1,Atp6v1h,Rgs20 40095,29981,160709
# Actual output
chr1    4850001 5099999 {"color":"rgba(128  0   128 5099999]],5099999]],5099999]]   "name":"_n59_qBL1.1236174E-9"}  chr1    4857814 4897909 Tcea1   40095,"name":"_n59_qBL1.1236174E-9"}    chr1    5070018 5162529 Atp6v1h 29981,"name":"_n59_qBL1.1236174E-9"}    chr1    4909576 5070285 Rgs20   160709

As you can see, merge used the comma as a delimiter instead of TAB. A way to define the input delimiter would be extremely useful.
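To illustrate the behaviour being requested, here is a minimal Python sketch that splits each line strictly on TAB, so the commas inside the JSON column are left alone. It is not part of bedtools: `merge_collapse` is a hypothetical helper name, and the sketch assumes the input is tab-delimited and already sorted by chromosome and start. It only approximates merge by combining overlapping or book-ended intervals and collapsing the requested 1-based columns with commas.

```python
def merge_collapse(lines, cols=(8, 9)):
    """Merge sorted, tab-delimited intervals and collapse the given 1-based columns.

    Hypothetical helper for illustration; assumes input sorted by chrom, then start.
    """
    merged = []  # each entry: [chrom, start, end, [list of values per column]]
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on TAB only, so commas inside a field are never treated as separators.
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        values = [fields[c - 1] for c in cols]
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlapping or book-ended: extend the interval and collect the values.
            last[2] = max(last[2], end)
            for acc, value in zip(last[3], values):
                acc.append(value)
        else:
            merged.append([chrom, start, end, [[v] for v in values]])
    return [
        "\t".join([chrom, str(start), str(end)] + [",".join(acc) for acc in accs])
        for chrom, start, end, accs in merged
    ]
```

It could be run as `print("\n".join(merge_collapse(open("intersect.bed"))))`, where `intersect.bed` is a placeholder for the intersect output shown above. Intervals that do not overlap, such as the first record in the example input, are reported on their own line.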