SethMMorton / natsort

Simple yet flexible natural sorting in Python.
https://pypi.org/project/natsort/
MIT License

Wrong os_sorted sorting with special characters in filenames #145

Closed. toroConverter closed this issue 2 years ago.

toroConverter commented 2 years ago

Describe the bug: os_sorted can sort filenames that contain special characters (such as ".", "_" or "[") incorrectly.

Expected behavior: files are sorted in the same order as in Windows Explorer.


To Reproduce:

from natsort import os_sorted

# Names that use "." as a word separator; only episode 08 lacks a "." before "[text]".
file_list = ['Try.Me.Bug - 09 - One.Two.Three.[text].mkv',
             'Try.Me.Bug - 07 - One.Two.5.[text].mkv',
             'Try.Me.Bug - 08 - One.Two.Three[text].mkv']

# Names that mix " " and "_" as word separators.
file_list2 = ['TryMe - 02 - One Two Three [text].mkv',
              'TryMe_-_03_-_One_Two_Three_[text].mkv',
              'TryMe_-_01_-_One_Two_Three_[text].mkv']

for file in os_sorted(file_list):
    print(file)

for file in os_sorted(file_list2):
    print(file)

Expected sorting:

file_list:
  Try.Me.Bug - 07 - One.Two.5.[text].mkv
  Try.Me.Bug - 08 - One.Two.Three[text].mkv
  Try.Me.Bug - 09 - One.Two.Three.[text].mkv

file_list2:
  TryMe_-_01_-_One_Two_Three_[text].mkv
  TryMe - 02 - One Two Three [text].mkv
  TryMe_-_03_-_One_Two_Three_[text].mkv

Actual sorting:

file_list:
  Try.Me.Bug - 08 - One.Two.Three[text].mkv
  Try.Me.Bug - 09 - One.Two.Three.[text].mkv
  Try.Me.Bug - 07 - One.Two.5.[text].mkv

file_list2:
  TryMe - 02 - One Two Three [text].mkv
  TryMe_-_01_-_One_Two_Three_[text].mkv
  TryMe_-_03_-_One_Two_Three_[text].mkv
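A minimal sketch of one plausible reason for the file_list2 order. It uses plain Python string comparison, not os_sorted itself, and assumes the comparison reaches the character right after "TryMe" before it ever looks at the numbers and orders that character by code point. The first difference between the "02" name and the "01" name is a space versus an underscore, and the space (U+0020) orders before the underscore (U+005F):

```python
# Plain code-point comparison; this is NOT natsort's os_sorted key.
a = "TryMe - 02 - One Two Three [text].mkv"
b = "TryMe_-_01_-_One_Two_Three_[text].mkv"

# Find the first index where the two names differ.
first_diff = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)

print(first_diff, repr(a[first_diff]), repr(b[first_diff]))  # 5 ' ' '_'
print(ord(a[first_diff]), ord(b[first_diff]))                # 32 95
print(a < b)                                                 # True
```

Under that assumption the "02" name is placed first because of the separator alone, which matches the maintainer's point below that " " and "_" are different characters unless the key says otherwise.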


At the moment, the only way to work around this issue is to call os_sorted with key=lambda x: re.sub(r'[^a-zA-Z0-9]+', ' ', x), which replaces every run of non-alphanumeric characters with a single space.
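The workaround above can be written as a named function. The sketch below pairs it with a deliberately simple natural-sort key built only on the standard library so it runs without natsort installed. normalize_separators and simple_natural_key are hypothetical names, and the digit splitting is a stand-in for illustration, not natsort's algorithm:

```python
import re


def normalize_separators(name: str) -> str:
    # The reporter's workaround: collapse every run of characters that is not
    # an ASCII letter or digit ('.', '_', '-', '[', ']', ' ', ...) into one space.
    return re.sub(r'[^a-zA-Z0-9]+', ' ', name)


def simple_natural_key(name: str) -> list:
    # Hypothetical helper, NOT natsort's algorithm: split the normalized,
    # lowercased name into alternating text/digit runs. re.split with a
    # capturing group always puts text at even indices and digits at odd
    # indices, so digit runs can safely be compared as integers.
    parts = re.split(r'(\d+)', normalize_separators(name).lower())
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


file_list = ['Try.Me.Bug - 09 - One.Two.Three.[text].mkv',
             'Try.Me.Bug - 07 - One.Two.5.[text].mkv',
             'Try.Me.Bug - 08 - One.Two.Three[text].mkv']

file_list2 = ['TryMe - 02 - One Two Three [text].mkv',
              'TryMe_-_03_-_One_Two_Three_[text].mkv',
              'TryMe_-_01_-_One_Two_Three_[text].mkv']

# Both lists come out in episode order 07, 08, 09 and 01, 02, 03.
for name in sorted(file_list, key=simple_natural_key):
    print(name)

for name in sorted(file_list2, key=simple_natural_key):
    print(name)
```

With natsort installed, the same normalization can be passed to os_sorted directly, e.g. os_sorted(file_list, key=normalize_separators). Note that this key also erases the difference between ".", "_", "[" and " ", so it is broader than what the file_list2 case strictly needs.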

SethMMorton commented 2 years ago

I agree that the Try.Me.Bug - 07 - One.Two.5.[text].mkv case was a bug (or at least undesired behavior), and I have proposed a fix in #146.

However, the TryMe_-_01_-_One_Two_Three_[text].mkv case is not a bug; to sort it the way you want, you would need something like key=lambda x: x.replace("_", " "). The code has no way of knowing that you want "_" and " " to be treated the same unless you tell it.
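A minimal sketch of the narrower key suggested above, assuming the file_list2 names from the report. Replacing "_" with " " gives the underscore names exactly the same separators as the "02" name, so the three names then differ only in the episode number. underscores_as_spaces is a hypothetical name for the suggested lambda:

```python
def underscores_as_spaces(name: str) -> str:
    # The suggested key: treat "_" as if it were " ".
    # Other characters ('.', '-', '[', ']') are left untouched.
    return name.replace("_", " ")


file_list2 = ['TryMe - 02 - One Two Three [text].mkv',
              'TryMe_-_03_-_One_Two_Three_[text].mkv',
              'TryMe_-_01_-_One_Two_Three_[text].mkv']

print(underscores_as_spaces(file_list2[2]))
# TryMe - 01 - One Two Three [text].mkv

# The episode numbers are zero-padded to the same width, so plain sorted()
# is enough to show the resulting order; the original names are returned.
for name in sorted(file_list2, key=underscores_as_spaces):
    print(name)
```

With natsort, the same key is what the comment suggests passing as os_sorted(file_list2, key=lambda x: x.replace("_", " ")). Unlike the reporter's regex, it leaves "." and "[" alone, so it only addresses the "_" versus " " mismatch.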

toroConverter commented 2 years ago

@SethMMorton ok thanks a lot for your feedback and your work!