tmfink closed this 1 year ago.
Also, this is a breaking change that will require a minor version bump.
Yes, I also thought about that when porting that library, but the original implementation behaves like this: https://github.com/WaniKani/WanaKana/blob/master/src/isKanji.js#L21
I think the reasoning behind this is that you want to use these functions to decide whether to start some conversion, which is not needed for empty strings. Or maybe you have a list of strings and want to know whether any of them contains Japanese, e.g.:
```rust
let words = ["hello", "", "some", "text"];
let japanese_word_detected = words.iter().any(|word| is_japanese(word));
```
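A minimal sketch of how the two empty-string conventions change the result of that `any()` check. The function names and the simplified "Japanese" character ranges (hiragana, katakana, and the main CJK Unified Ideographs block) are hypothetical stand-ins, not the crate's actual API.

```rust
// Simplified, hypothetical classification: only hiragana, katakana and the
// main CJK Unified Ideographs block count as "Japanese" here.
fn is_japanese_char(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}' | '\u{4E00}'..='\u{9FFF}')
}

// Convention A (WanaKana-style): the empty string is NOT Japanese.
fn is_japanese_nonempty(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_japanese_char)
}

// Convention B (vacuous truth): the empty string IS Japanese.
fn is_japanese_vacuous(s: &str) -> bool {
    s.chars().all(is_japanese_char)
}

fn main() {
    let words = ["hello", "", "some", "text"];

    // With convention A, "" never triggers detection.
    let detected_a = words.iter().any(|word| is_japanese_nonempty(word));
    // With convention B, "" alone makes the check succeed.
    let detected_b = words.iter().any(|word| is_japanese_vacuous(word));

    // Under convention B, callers have to skip empty strings explicitly.
    let detected_b_filtered = words
        .iter()
        .filter(|word| !word.is_empty())
        .any(|word| is_japanese_vacuous(word));

    println!("convention A: {detected_a}");
    println!("convention B: {detected_b}");
    println!("convention B, empties skipped: {detected_b_filtered}");
}
```

For this list, convention A reports `false`, convention B reports `true` (because of `""`), and convention B with empty strings filtered out reports `false` again.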
I recommend we break with the original API. At least in the `isKanji()` example you linked to, there is no documentation about the behavior for empty input, so users would need to read the code to know the expected behavior.
We could also raise this issue with the original implementation. Its behavior is the opposite of what one would expect.
In either case, we should clearly document the behavior/convention used in the crate.
I think you often want to check for the existence of Japanese/kanji characters to switch to a special behaviour, like in the example I provided. Do you have a counter-example where you would want the empty = Japanese behaviour?
Your set analogy makes sense, but here is a counterexample, since this is not strictly a set but also a natural-language domain: if someone doesn't speak, do they speak Japanese? ;)
I agree the behaviour should be well documented.
Sorry for the late reply. Here are some examples:

1. Given a Japanese string, should all of its substrings (including `""`) be considered Japanese? The expected answer is yes.
2. If you append the empty string to a Japanese string, the result is still a Japanese string. It would be strange if the empty string were not considered Japanese, since that would mean appending a non-Japanese string preserves the Japanese property.
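Both examples can be checked mechanically under the "all characters" definition. This is a sketch under stated assumptions: hiragana stands in for "Japanese", the range U+3040..=U+309F is a simplification, and `all_substrings` is a hypothetical helper that enumerates every substring on `char` boundaries, including the empty ones.

```rust
// Simplified assumption: hiragana = U+3040..=U+309F.
fn is_hiragana_char(c: char) -> bool {
    ('\u{3040}'..='\u{309F}').contains(&c)
}

// "All characters are hiragana" (vacuously true for "").
fn is_hiragana(s: &str) -> bool {
    s.chars().all(is_hiragana_char)
}

// Every substring s[i..j] with i <= j on char boundaries, so the empty
// substrings (i == j) are included.
fn all_substrings(s: &str) -> Vec<&str> {
    let mut bounds: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    bounds.push(s.len());
    let mut out = Vec::new();
    for (a, &i) in bounds.iter().enumerate() {
        for &j in &bounds[a..] {
            out.push(&s[i..j]);
        }
    }
    out
}

fn main() {
    let s = "ひらがな";

    // Example 1: every substring of a hiragana string, "" included, is hiragana.
    assert!(all_substrings(s).iter().all(|sub| is_hiragana(sub)));

    // Example 2: appending "" leaves the string (and its property) unchanged.
    let appended = format!("{s}{}", "");
    assert_eq!(appended, s);
    assert!(is_hiragana(&appended));

    // Appending a non-hiragana string can break the property, as expected.
    assert!(!is_hiragana(&format!("{s}abc")));

    println!("both examples hold");
}
```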
Your examples assume that you want strict set behaviour. I meant a practical use case that shows why you would want strict set behaviour in the first place.
I would expect strict set behavior for any function that sounds like a character property applied to any/all characters of a string:

* "this property holds for ALL characters"
* "this property holds for ANY character"

This covers, for example, `is_hiragana()` and similar functions applied to collections of characters (`&str` in this case).
I would expect the implementation to be something like:
```rust
fn is_hiragana_str(s: &str) -> bool {
    s.chars().all(is_hiragana_char) // Iterator::all() follows set semantics
}
```
In contrast, a function like `is_mixed()` does not fit this pattern. There is no "mixed" property that applies to an individual character; instead, it depends on several character properties and on whether more than one "class" of characters is represented.
> I would expect strict set behavior for any function that sounds like a character property applied to any/all characters of a string:
>
> * "this property holds for ALL characters"
> * "this property holds for ANY character"
But that is already the case; the only issue is the empty input, which is not a character.
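The mathematical convention is that, for any property of elements, the empty set satisfies "all elements have the property", because it contains no element that fails it. This is called "vacuous truth". (Conversely, "some element has the property" is false for the empty set.)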
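Stated formally for an arbitrary predicate $P$ on characters, and treating a string as the collection of its characters ($\operatorname{hiragana}$ here is an assumed per-character predicate):

```math
\forall c \in \varnothing .\; P(c) \;=\; \top
\qquad
\exists c \in \varnothing .\; P(c) \;=\; \bot
```

```math
\texttt{is\_hiragana}(s) \iff \forall c \in \operatorname{chars}(s).\; \operatorname{hiragana}(c)
\quad\Longrightarrow\quad
\texttt{is\_hiragana}(\texttt{""}) = \top
```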
This is consistent with the behavior of `Iterator::all()` and `Iterator::any()`. For example, we will treat:
Without this change, appending characters to a string could unintuitively flip a property from false -> true -> false:
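A minimal sketch of that sequence, assuming (as the thread implies) that the current functions return `false` for empty input. Both functions and the hiragana range are hypothetical simplifications used for illustration.

```rust
// Simplified assumption: hiragana = U+3040..=U+309F.
fn is_hiragana_char(c: char) -> bool {
    ('\u{3040}'..='\u{309F}').contains(&c)
}

// Current (WanaKana-style) behavior: "" is rejected explicitly.
fn is_hiragana_nonempty(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_hiragana_char)
}

// Proposed behavior: plain `all`, so "" is vacuously hiragana.
fn is_hiragana_all(s: &str) -> bool {
    s.chars().all(is_hiragana_char)
}

fn report(s: &str) {
    println!(
        "{s:?}: nonempty = {}, all = {}",
        is_hiragana_nonempty(s),
        is_hiragana_all(s)
    );
}

fn main() {
    // Build "ひa" one character at a time: "" -> "ひ" -> "ひa".
    let mut s = String::new();
    report(&s);
    for c in ['ひ', 'a'] {
        s.push(c);
        report(&s);
    }
}
```

The non-empty variant goes `false -> true -> false`, while the `all()` variant goes `true -> true -> false`: with set semantics, appending characters can only turn the property from true to false, never back.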