aperezdc / ngx-fancyindex

Fancy indexes module for the Nginx web server

Directories listing returns results in a nontraditional Alphabetical order - ALL CAPS precede mixed case or lower case #121

Open SnortsAlot opened 3 years ago

SnortsAlot commented 3 years ago

This is about the order in which directory listing results are returned.

I'm guessing the ASCII values are being compared, so while a listing in more traditional alphabetical order would look like

Fan FIRST fumble Fuzzy

the index returns

FIRST Fan Fuzzy fumble

apparently prioritizing uppercase in resulting listings.

If this is intended, please turn this into a feature request to allow results to be sorted as if the names were lowercased (effectively a .lower() comparison). There is also the question of how other languages' special characters are handled in that ordering.

I've not done extensive testing, but it appears that non-English characters are displayed/ordered after z in most cases.

i.e. a, b, c ... z, then all other characters or diacritical marks (umlaut, cedilla, acute accent, circumflex, tilde, grave, etc.)

à, è, ì, ò, ù - À, È, Ì, Ò, Ù

á, é, í, ó, ú, ý - Á, É, Í, Ó, Ú, Ý

ą, ł, ż, ß, ä, ö, ü, ç, ã, õ

armadillo monkey zebra àlex

versus the more expected

àlex armadillo monkey zebra

ryandesign commented 2 years ago

Right, it sorts names using ngx_strcmp (a wrapper around the standard strcmp), just like nginx's built-in autoindex module does. It's fairly common for web servers (and UNIX systems generally) to sort directory indexes in this manner.

To perform a case-insensitive sort like you suggest, it would have to use strcasecmp. Sorting which is aware of non-ASCII characters is more complex. Sorting rules (collations) can even change depending on the locale. That would mean that a locale-aware sort function like strcoll would need to be used, and it would need to be possible to configure the server as to what locale to use.

I don't see any wrappers around strcasecmp or strcoll in nginx's utility API documentation. Perhaps you could suggest to the developers of nginx that they add the ability to do case-insensitive sorting and/or locale-aware sorting to their autoindex module. Perhaps in the process of adding that feature, they will add ngx_strcasecmp and ngx_strcoll functions which ngx_fancyindex could then use to implement the same feature.

ryandesign commented 2 years ago

Locale-aware sorting has been mentioned previously in #60.

aperezdc commented 2 years ago

One potential can of worms of using locale-aware collation is that we don't know which locale should be used:

Using plain ASCII sorting (what strcmp and ngx_strcmp do) is the only reasonable option. I think we can consider adding case-insensitive sorting, though. Maybe even switching to case-insensitive sorting by default 🤔

ryandesign commented 2 years ago

I could see a use case where someone runs a server serving files whose names are primarily in one language and wants the sort order to reflect that language. Think about an internal server at a small company serving files only for the employees of that company.

Allowing the web site visitor to influence the locale of the sort order is probably beyond the scope of what a web server module could be expected to do. Allowing the returned content to vary based on a header is bad for caching too.

Allowing the server administrator to select case-insensitive sorting (#78, #124) is great, but again it's probably out of scope to allow the web site visitor to select that. To remain consistent with what Apache and nginx server administrators expect, I would recommend keeping case-sensitive sorting as the default.