arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

enhancement: allow flag to report multiple overlaps on same line in closestBed #4

Closed brentp closed 13 years ago

brentp commented 13 years ago

when annotating large regions, it's common to have multiple genes at distance 0. each of those is reported on its own line in the output. it would be nice to be able to get all of them on the same line with a boolean flag like --collapse.

then instead of:

chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc009xpd.2 0 - 714
chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc001jli.2 0 - 714
chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc001jlh.2 0 - 714

it would give:

chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc009xpd.2,uc001jli.2,uc001jlh.2
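
for illustration, the requested collapsing can be sketched as a small post-processing step in Python. this is only a sketch under assumptions, not how closestBed or groupBy works internally: the function name `collapse_overlaps` and its parameters are hypothetical, and it assumes the first 9 whitespace-separated columns identify a query/hit pair and the 10th column holds the gene name, as in the rows above.

```python
def collapse_overlaps(lines, key_cols=9, name_col=9, sep=","):
    """Merge rows that share their first `key_cols` fields.

    Hypothetical helper: `name_col` is the 0-based index of the column
    whose values are joined with `sep`. Columns after it are dropped,
    matching the desired output shown in the issue.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+), so row order is stable
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = tuple(fields[:key_cols])
        groups.setdefault(key, []).append(fields[name_col])
    # one tab-separated output row per distinct key
    return ["\t".join(key + (sep.join(names),)) for key, names in groups.items()]


rows = [
    "chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc009xpd.2 0 - 714",
    "chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc001jli.2 0 - 714",
    "chr10 62635514 62636138 chr10:62633793-62636189 2.932e-07 18 chr10 62634711 62634800 uc001jlh.2 0 - 714",
]
for row in collapse_overlaps(rows):
    print(row)
```

the three input rows share the same first 9 columns, so they come out as a single row whose last column is the comma-joined list of names.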

brentp commented 13 years ago

closing as this is addressed by groupBy in filo: http://obx.cphg.virginia.edu/quinlan/?p=208

aaron, in your post, the 2nd groupBy command is the same as the first. i think it should end in -concat, no?

arq5x commented 13 years ago

riiiight. sorry!