exercism / problem-specifications

Shared metadata for exercism exercises.
MIT License
325 stars 542 forks source link

pangram: missing case-sensitivity edge case #266

Closed sdavids13 closed 7 years ago

sdavids13 commented 8 years ago

Check out: https://github.com/exercism/xkotlin/pull/10

The implementation that passed all existing tests didn't need to lowercase the input and was able to double count the same upper-case and lower-case characters. A new test was implemented to swap out dog in the quick brown fox jumped over the lazy dog with FOX which catches the error after asserting it's not a pangram.

edit: Test case to add: the quick brown fox jumped over the lazy FOX

kytrinyx commented 8 years ago

I'm having trouble parsing the above. What is the input? the quick brown fox jumped over the lazy fox?

kytrinyx commented 8 years ago

Nevermind, I went back to the original issue. Yes, that is the problem.

It looks like there might actually be two possible test cases that we've not covered:

  1. the quick brown fox jumped over the lazy fox
  2. the quick brown fox jumped over the lazy dog fox

In the second it's not a pangram because we have duplicate letters.

sdavids13 commented 8 years ago

A pangram (Greek: παν γράμμα, pan gramma, "every letter") is a sentence using every letter of the alphabet at least once.

I believe your second example is a pangram and is covered by other tests with duplicate letters.

sdavids13 commented 8 years ago

Oh, and to clarify, your example 1 needs to have one of the FOXs uppercased.

Cohen-Carlisle commented 8 years ago

As I understand it, we aren't catching faulty implementations that use case sensitive letter counts that depend on there being 26 or more letter characters. If it is asserted that exactly 26 letter characters exist (again, when case sensitive), then the following test will fail:

{
  "description": "pangram with non ascii characters",
  "input": "Victor jagt zwölf Boxkämpfer quer über den großen Sylter Deich.",
  "expected": true
}

That being the case, the quick brown fox jumped over the lazy FOX would indeed catch this. So would The quick brown frog jumped over the lazy dog, which I kinda like :smile:

kytrinyx commented 8 years ago

Ah, I thought that pangrams were sentences that used each letter exactly once, not at least once.

The definition of the alphabet would presumably differ depending on which language the example is in. A Norwegian sentence would be required to have all of the same letters as in English, plus æ, ø, and å. I don't know enough German to know whether they view ß as a separate letter, or just a symbol that represents ss. In French, there is only one letter e, but it can occur with a number of accents, (é, è, ê, ë). From the above definition of pangram, a French text that had only e, whichever accent, would qualify as a pangram.

An alphabet seems to be case insensitive, i.e. a and A represent the same letter.

In light of this, I would suggest that we clarify the requirements:

I don't know what to suggest regarding accents. People are named André and Zoë in English-speaking countries, and this would count as an e, I believe.

It's very tricky in some languages, though, to deal with detecting all the variants of a given letter, so this might be stretching the exercise beyond what is useful.

NobbZ commented 8 years ago

I do not like the term "english" letters, since they are actually "latin". Just say “case insensitive A-Z are mandatory, other characters of any kind may appear in the string”. This phrasing does allow punctuation, diacretic characters, kyrillics and all that stuff while still enforcing the necessity to have at least one of each out of A-Z and also it does not lye about the origin about our alphabet.

kotp commented 8 years ago

And due to v w and y use, I think they are Roman letters.

Is this coming about because we don't provide the "alphabet" up front in the Readme? If we give the letters that are to be included as measurable for the pangram, state that they are to count in either uppercase or lowercase, this becomes clear.

The current states:

Determine if a sentence is a pangram. A pangram (Greek: παν γράμμα, pan gramma, "every letter") is a sentence using every letter of the alphabet at least once. The best known English pangram is "The quick brown fox jumps over the lazy dog."

It could be be that we add the alphabet after the example "A quick brown fox jumps over the lazy dog."

Also, the fact that there is a period in the text which allows for more than just what is in the alphabet, and a capital letter should hint that more than one match is appropriate, in regards to case, and more than one occurrence of a letter is OK as well, due to more than one (four, even) "o" for example.

Perhaps the only confusion is the alphabet, and this is perhaps brought on in part due to the German example.

Another solution is to add the stipulation that the solution can use a "user supplied" alphabet. Still, this provides more complexity, and more potential confusion. I don't think that is the correct way to go.

kytrinyx commented 8 years ago

I do not like the term "english" letters, since they are actually "latin".

Sorry, you are right. The letters are latin.

What I'm trying to say is that I do not think that it is possible to correctly define a pangram without knowing which language the pangram is in.

If you teach a Norwegian child the alphabet in Norwegian, they do not learn the same alphabet as an American child learning the alphabet in English.

Even though both languages use a latin alphabet, the English and Norwegian alphabets are not identical.

I don't know how to make this distinction clear.

Another solution is to add the stipulation that the solution can use a "user supplied alphabet. Still, this provides more complexity, and more potential confusion. I don't think that is the correct way to go.

Agreed.

kotp commented 8 years ago

exercism/x-common#284 hopefully clarifies this.

behrtam commented 7 years ago

It was good to clarify the used alphabet but the edge case proposed by @sdavids13 still makes sense to be added.

Insti commented 7 years ago

I think it's a good idea too, @behrtam do you want to submit a PR with this new test case?

General info for anyone reading this thread in future: There is a discussion about a general policy for characters/character sets to test for in: https://github.com/exercism/x-common/issues/428

b10s commented 4 years ago

Hi @sdavids13 , @rbasso , @kotp

sorry for digging it.

I have a question about test case. It's name is "case insensitive" but I can see each alphabet letters in lower case presented here:

the quick brown fox jumps over with lazy FX

It means a solution which cares only about lower case, i.e. case sensitive solution, will pass.

Is so, then what the test case is for?

kotp commented 4 years ago

Yes, it is an odd description, perhaps it needs a touch... Do you want to make a pull request?

Though it is not a pangram, even excluding the capital letters, there is no "d". So the title of the test is likely wrong.

As a "case insensitive non-pangram" it should still be expected that it results in false.

The test right above it also does not need the quoted string, but should have the mixed case and the period, and should still result in true.