Exercises without canonical data

iHiD commented 4 years ago

Taking a look here - every PR that doesn't have Remove version (#1678) as the most recent commit is presumably missing canonical data?

Is this expected/bad/etc? How do we sort it?

petertseng commented 4 years ago

It is true that some exercises do not currently have canonical data. The top-level issue that asks for canonical data for each exercise is https://github.com/exercism/problem-specifications/issues/552, with linked issues for each individual exercise. Among those linked issues, those that remain open presumably correspond 1-to-1 with the exercises that still don't have canonical data (but I did not verify that).

I think most are agreed that it would be good to have it for all exercises, but that it has proven tricky to figure out how to express some exercises' tests as canonical data. bank-account, anyone?

If we wanted to enforce it for all newly-added exercises going forward, I am sure something could be arranged in https://github.com/exercism/problem-specifications/blob/master/bin/check_required_files_present - there would need to be a TODO list of exercises that are allowed to lack canonical data, and entries would only ever be removed from that list over time, never added.
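To make the idea concrete, here is a minimal sketch of such a check. It is not the existing check_required_files_present script; the function names, the allowlist contents, and the assumption that each exercise lives in its own directory containing a canonical-data.json file are illustrative only.

```python
from pathlib import Path

# Hypothetical TODO list: exercises currently allowed to lack canonical data.
# Entries should only ever be removed from this set, never added.
ALLOWED_WITHOUT_CANONICAL_DATA = {"bank-account"}


def missing_canonical_data(exercises_dir, allowed=ALLOWED_WITHOUT_CANONICAL_DATA):
    """Return exercises that lack canonical-data.json and are not on the TODO list."""
    root = Path(exercises_dir)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir()
        and not (d / "canonical-data.json").is_file()
        and d.name not in allowed
    )


def stale_allowlist_entries(exercises_dir, allowed=ALLOWED_WITHOUT_CANONICAL_DATA):
    """Return TODO-list entries that now have canonical data and can be removed."""
    root = Path(exercises_dir)
    return sorted(
        name for name in allowed if (root / name / "canonical-data.json").is_file()
    )
```

A CI step could fail when either function returns a non-empty list: the first catches new exercises added without canonical data, the second keeps the TODO list shrinking.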

Now I'm not arguing for or against the idea that "all exercises should have canonical data" should be prioritised, simply saying how things might proceed if it does get prioritised.

ErikSchierboom commented 4 years ago

> I think most are agreed that it would be good to have it for all exercises, but that it has proven tricky to figure out how to express some exercises' tests as canonical data. bank-account, anyone?

Yes, this is also how I remember things. IIRC we were experimenting with having some tests not follow the "this is the exact value we expect" model, but instead describe what to expect (more like a property or invariant). One example of this is the canonical data for the diffie-hellman exercise, which has the following test case:

{
  "uuid": "68b2a5f7-7755-44c3-97b2-d28d21f014a9",
  "description": "private key is random",
  "property": "privateKeyIsRandom",
  "input": {},
  "expected": {
    "random": true
  }
}

Another example is the canonical data for the dnd-character exercise, which has the following test case:

{
  "uuid": "385d7e72-864f-4e88-8279-81a7d75b04ad",
  "description": "random character is valid",
  "property": "character",
  "input": {},
  "expected": {
    "strength": "strength >= 3 && strength <= 18",
    "dexterity": "dexterity >= 3 && dexterity <= 18",
    "constitution": "constitution >= 3 && constitution <= 18",
    "intelligence": "intelligence >= 3 && intelligence <= 18",
    "wisdom": "wisdom >= 3 && wisdom <= 18",
    "charisma": "charisma >= 3 && charisma <= 18",
    "hitpoints": "hitpoints == 10 + modifier(constitution)"
  }
}

The "hitpoints": "hitpoints == 10 + modifier(constitution)" is basically an encoding of the actual rule, and not an exact value.

The clear disadvantage of these more descriptive expected values is that they require more effort from test generators to handle. That said, I don't think it is that much of an issue, as it requires only a one-time investment of time. Furthermore, as we now have immutable tests, writing it once means that it will work forever, with no risk of regression bugs.
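As a rough illustration of that one-time effort, a test generator (or test helper) could interpret the dnd-character rules along these lines. This is a sketch under assumptions, not an existing generator: the names RANGE_RE and check_character are made up, only the two rule shapes shown above are handled, and modifier follows the exercise's rule as I understand it (subtract 10, divide by 2, round down).

```python
import re

# Matches rules shaped like "strength >= 3 && strength <= 18".
RANGE_RE = re.compile(r"^(\w+) >= (\d+) && \1 <= (\d+)$")

# The only non-range rule that appears in the test case above.
HITPOINTS_RULE = "hitpoints == 10 + modifier(constitution)"


def modifier(score):
    """Assumed ability modifier: subtract 10, divide by 2, round down."""
    return (score - 10) // 2


def check_character(character, expected):
    """Return the names of the properties whose rule the character violates."""
    failures = []
    for name, rule in expected.items():
        match = RANGE_RE.match(rule)
        if match:
            low, high = int(match.group(2)), int(match.group(3))
            ok = low <= character[name] <= high
        elif rule == HITPOINTS_RULE:
            ok = character["hitpoints"] == 10 + modifier(character["constitution"])
        else:
            # Unknown rule shapes are rejected rather than silently passed.
            raise ValueError(f"unsupported rule: {rule}")
        if not ok:
            failures.append(name)
    return failures
```

Rejecting unknown rule shapes means that a new kind of descriptive expectation forces an explicit decision in the generator instead of producing a test that always passes.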

So all in all I think we probably can define canonical data for all exercises, but it will require some thought into how to encode them. I'm sure the community would love discussing the options we have :)

exercism / problem-specifications

Exercises without canonical data #1684