Hi Guys
I was using groupBy and noticed something a bit annoying, and I was wondering if you could improve it.
Assume I have a file with this content:
MPL NM_005373.2 1
MPL NM_005373.2 2
MPL NM_005373.2 3
MPL NM_005373.2 4
MPL NM_005373.2 5
MPL NM_005373.2 6
MPL NM_005373.2 7
MPL NM_005373.2 8
MPL NM_005373.2 9
MPL NM_005373.2 10
MPL NM_005373.2 11
MPL NM_005373.2 12
MPL XM_005270874.1 1
MPL XM_005270874.1 2
MPL XM_005270874.1 3
MPL XM_005270874.1 4
MPL XM_005270874.1 5
MPL XM_005270874.1 6
MPL XM_005270874.1 7
MPL XM_005270874.1 8
MPL XM_005270874.1 9
MPL XM_005270874.1 10
MPL XM_005270874.1 11
MPL XM_005270874.1 12
The operation
groupBy -g 1 -c 2,3 -o distinct,distinct
outputs:
MPL NM_005373.2,XM_005270874.1 1,10,11,12,2,3,4,5,6,7,8,9
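The order of the third column is exactly what you get by sorting the distinct values as strings instead of as numbers ("10" comes before "2" because "1" < "2"). A minimal Python sketch of that comparison, assuming (not confirmed from the groupBy source) that distinct keeps its values as text and sorts them:

```python
# Column 3 values for the MPL group, in input order (from the file above).
values = [str(n) for n in range(1, 13)] * 2

# Distinct values sorted as strings, i.e. lexicographic comparison.
lexicographic = sorted(set(values))
print(",".join(lexicographic))  # 1,10,11,12,2,3,4,5,6,7,8,9
```

This reproduces the groupBy output above character for character.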
As you can see, even though my input is sorted alphabetically on the 2nd column and numerically on the 3rd column, the "distinct" operation does not retain that ordering and instead forces alphabetical ordering on the output. Would you consider implementing a sorted_num_distinct operation?
Or at least retain the input order of the numbers?
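For illustration, the two behaviours requested above could look like this in Python. Both function names are hypothetical and are not existing groupBy operations: `distinct_input_order` keeps values in first-seen order, and `sorted_num_distinct` sorts numerically, assuming every value in the column parses as a number.

```python
def distinct_input_order(values):
    """Distinct values in first-seen order (hypothetical behaviour)."""
    # dict preserves insertion order (Python 3.7+), so fromkeys dedupes stably.
    return list(dict.fromkeys(values))


def sorted_num_distinct(values):
    """Distinct values sorted numerically (hypothetical operation name)."""
    # Compare by numeric value, but return the original strings unchanged.
    return sorted(set(values), key=float)


# Columns 2 and 3 of the MPL group from the file above.
col2 = ["NM_005373.2"] * 12 + ["XM_005270874.1"] * 12
col3 = [str(n) for n in range(1, 13)] * 2

print(",".join(distinct_input_order(col2)))  # NM_005373.2,XM_005270874.1
print(",".join(distinct_input_order(col3)))  # 1,2,3,4,5,6,7,8,9,10,11,12
print(",".join(sorted_num_distinct(col3)))   # 1,2,3,4,5,6,7,8,9,10,11,12
```

On this input both give the same result because the file is already numerically sorted; they differ only when the input order is not numeric.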
Thanks
Duarte