Open lucatrv opened 1 year ago
Hmmm right, I forgot that std doesn't have a case folding API.
I'm unfortunately inclined to say no to this and instead add some docs that shows how to do it with sort_by
and something like the caseless
crate.
How about renaming the method sort_by_file_name_lowercase
to make it clear what it's doing? Or is there a better way to deal with this?
I'm not a fan of that. It makes it too easy to do the wrong thing. If we're going to have a case insensitive API then we either do it right (which means either implementing case folding ourselves or introducing a dependency) or we don't do it at all and point folks in the right direction with docs. I'm not inclined to maintain the former at this time, so the latter is the path forward here.
IMHO since file names are typically at least alphanumeric (as opposite to numeric only, if not even Unicode), also the current sort_by_file_name
method makes already easy to do the wrong thing. In my opinion to make this matter clear to the user, the best would be to add the sort_by_file_name_lowercase
method, clarifying in its documentation what it does and relative limitations, and how to reach full case folding when needed. Otherwise the two options (just lowercase or full case folding) could be described in the documentation of the sort_by_file_name
method (better with examples).
After reading the following interesting references:
https://stackoverflow.com/questions/40250988/how-can-i-case-fold-a-string-in-rust/75526819 https://stackoverflow.com/questions/45745661/lower-vs-casefold-in-string-matching-and-converting-to-lowercase/74702121#74702121 https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf https://www.w3.org/TR/charmod-norm/#definitionCaseFolding https://sts10.github.io/2023/01/29/sorting-words-alphabetically-rust.html https://icu4x.unicode.org/doc/icu_collator/index.html https://unicode-org.github.io/icu/userguide/collation/ https://www.unicode.org/reports/tr35/tr35-collation.html
now I agree with your point. So please let me know if you prefer to add the sort_by_file_name_lowercase
method and describe in its documentation when it falls short, including a better example using icu_collator
, or otherwise if you want all this described directly in the sort_by_file_name
method's documentation.
Here is the example that I wanted to provide, using icu_collator
:
use icu_collator::{Collator, CollatorOptions, Strength};
fn main() {
let mut options = CollatorOptions::new();
options.strength = Some(Strength::Primary);
let collator: Collator =
Collator::try_new_unstable(&icu_testdata::unstable(), &Default::default(), options)
.unwrap();
for entry in WalkDir::new("test").sort_by(|a, b| {
collator.compare(
&a.file_name().to_string_lossy(),
&b.file_name().to_string_lossy(),
)
}) {
println!("{}", entry.unwrap().into_path().display());
}
}
Unfortunately it does not compile with error:
error[E0277]: `Rc<Box<[u8]>>` cannot be shared between threads safely
--> src\main.rs:64:47
|
64 | for entry in WalkDir::new("test").sort_by(|a, b| {
| _______________________________________-------_^
| | |
| | required by a bound introduced by this call
65 | | collator.compare(
66 | | &a.file_name().to_string_lossy(),
67 | | &b.file_name().to_string_lossy(),
68 | | )
69 | | }) {
| |_____^ `Rc<Box<[u8]>>` cannot be shared between threads safely
|
= help: within `Collator`, the trait `Sync` is not implemented for `Rc<Box<[u8]>>`
= note: required because it appears within the type `Cart`
= note: required because it appears within the type `Option<Cart>`
= note: required because it appears within the type `Yoke<CollationDataV1<'static>, Option<Cart>>`
= note: required because it appears within the type `DataPayload<CollationDataV1Marker>`
= note: required because it appears within the type `Collator`
= note: required for `&Collator` to implement `Send`
note: required because it's used within this closure
--> src\main.rs:64:47
|
64 | for entry in WalkDir::new("test").sort_by(|a, b| {
| ^^^^^^
note: required by a bound in `WalkDir::sort_by`
--> C:\Users\trevisal\.cargo\registry\src\index.crates.io-6f17d22bba15001f\walkdir-2.3.3\src\lib.rs:396:54
|
396 | F: FnMut(&DirEntry, &DirEntry) -> Ordering + Send + Sync + 'static,
| ^^^^ required by this bound in `WalkDir::sort_by`
For more information about this error, try `rustc --explain E0277`.
Can you please give me some advice on how to make it work?
Not to be too nitpicky but maybe this should be
sort_by_file_name_lowercase
? Unicode make a distinction between case folding and lowercasing. From the Unicode specification (3.13 Default Case Algorithms):