dogweather / naturally

Natural sort algorithm
MIT License

Unexpected results when using underscores #24

Open Kris-LIBIS opened 8 years ago

Kris-LIBIS commented 8 years ago

Hi,

I like your little gem very much. I have used it a lot, but I have one issue with it:

When the items in the array contain underscores and you are trying to sort file names, you often get unexpected results:

Naturally.sort(['abc_2.tif', 'abc_1.tif'])
 => ["abc_1.tif", "abc_2.tif"] # OK
Naturally.sort(['abc_2.tif', 'abc_1_a.tif'])
 => ["abc_2.tif", "abc_1_a.tif"] # WRONG
Naturally.sort(['abc_2a.tif', 'abc_1_a.tif'])
 => ["abc_1_a.tif", "abc_2a.tif"] # OK
Naturally.sort(['abc.2.tif', 'abc.1_a.tif'])
 => ["abc.1_a.tif", "abc.2.tif"] # OK

While I understand what's happening (underscores are removed by #normalize) the results are counter-intuitive, especially when sorting file names as in this case.
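
To make the cause concrete, here is a minimal sketch of the two tokenizations. It assumes, as described above, that the gem drops underscores before scanning for word characters. Both helper names are hypothetical and are not part of the gem.

```ruby
# Hypothetical helper mirroring the behaviour described in the issue:
# underscores are removed, then the string is split on non-word characters.
def tokens_dropping_underscores(name)
  name.to_s.gsub('_', '').scan(/\p{Word}+/)
end

# Hypothetical alternative: treat underscores as separators instead.
def tokens_splitting_on_underscores(name)
  name.to_s.gsub('_', '.').scan(/\p{Word}+/)
end

p tokens_dropping_underscores('abc_2.tif')       # => ["abc2", "tif"]
p tokens_dropping_underscores('abc_1_a.tif')     # => ["abc1a", "tif"]
p tokens_splitting_on_underscores('abc_2.tif')   # => ["abc", "2", "tif"]
p tokens_splitting_on_underscores('abc_1_a.tif') # => ["abc", "1", "a", "tif"]
```

With underscores dropped, the number is fused into one token with the surrounding letters ("abc2", "abc1a"), so it is no longer compared on its own. Splitting on underscores keeps "2" and "1" as separate tokens.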

After monkey-patching:

module Naturally
  def self.normalize(complex_number)
    tokens = complex_number.to_s.gsub(/_/, '.').scan(/\p{Word}+/)
    tokens.map { |t| Segment.new(t) }
  end
end

I get the expected results:

Naturally.sort(['abc_2.tif', 'abc_1.tif'])
 => ["abc_1.tif", "abc_2.tif"] 
Naturally.sort(['abc_2.tif', 'abc_1_a.tif'])
 => ["abc_1_a.tif", "abc_2.tif"] 
Naturally.sort(['abc.2.tif', 'abc.1_a.tif'])
 => ["abc.1_a.tif", "abc.2.tif"] 
Naturally.sort(['abc_2.abc', 'abc_1_xyz.abc'])
 => ["abc_1_xyz.abc", "abc_2.abc"] 

In my current code I did not monkey-patch, but worked around it with a block:

Naturally.sort_by_block(['abc_2.tif', 'abc_1_a.tif']) {|x| x.gsub('.','.0.').gsub('_','.')}
 => ["abc_1_a.tif", "abc_2.tif"] 

Note the extra '.' to '.0.' substitution. It is needed to give the '.' separator a higher priority than '_', as one expects for file names:

Naturally.sort_by_block(['abc_1.zzz', 'abc_1_xyz.abc']) {|x| x.gsub('_','.')}
 => ["abc_1_xyz.abc", "abc_1.zzz"] # WRONG
Naturally.sort_by_block(['abc_1.zzz', 'abc_1_xyz.abc']) {|x| x.gsub('.','.0.').gsub('_','.')}
 => ["abc_1.zzz", "abc_1_xyz.abc"] # OK
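
The effect of the '.0.' marker can be sketched without the gem. The `natural_key` helper below is hypothetical and simplified: it assumes purely numeric tokens compare as integers and sort before text tokens, which is not necessarily the gem's exact Segment logic.

```ruby
# Hypothetical helper: turn an already-transformed name into comparable tokens.
# Numeric tokens become [0, n, ''] and text tokens [1, 0, text], so numbers
# sort before text and every position compares like with like.
def natural_key(str)
  str.scan(/\p{Word}+/).map do |t|
    t.match?(/\A\d+\z/) ? [0, t.to_i, ''] : [1, 0, t]
  end
end

names = ['abc_1_xyz.abc', 'abc_1.zzz']

# Underscores only: the third tokens are "xyz" and "zzz", so the longer name sorts first.
without_marker = names.sort_by { |n| natural_key(n.gsub('_', '.')) }

# With '.0.': the extension boundary becomes a numeric 0, which sorts before "xyz".
with_marker = names.sort_by { |n| natural_key(n.gsub('.', '.0.').gsub('_', '.')) }

p without_marker # => ["abc_1_xyz.abc", "abc_1.zzz"]
p with_marker    # => ["abc_1.zzz", "abc_1_xyz.abc"]
```

The marker works because 'abc_1.zzz' becomes the tokens abc, 1, 0, zzz while 'abc_1_xyz.abc' becomes abc, 1, xyz, 0, abc. The comparison is decided at the third token, where 0 sorts before xyz.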

It would be nice to have such a 'file name friendly' natural sorting integrated in the gem.

Cheers.

dogweather commented 7 years ago

Hey there! I somehow missed seeing this issue. I agree that filename-friendliness would be a good addition, and I think your analysis is correct. I'll take a look at the code.