
Meta: purpose / goals of canonical data #1553

Closed: yawpitch closed this 5 years ago

yawpitch commented 5 years ago

This is meant to continue a portion of the discussion initiated in #1551. That PR got rather contentious because of apparent differences in understanding / agenda surrounding the problem-specifications in general and the canonical-data.json in particular. Hopefully this jumping-off point will lead to more fruitful discussion and resolve some of the contention.

The Background

The resistor-color-trio exercise describes, broadly, a trivial transformation problem: a list of three strings ["orange", "orange", "orange"] must ultimately become some string that minimally includes the term "33 kiloohms". There was some healthy debate on what, precisely, that final string should be, but ultimately rough consensus was reached on, simply, "33 kiloohms", so we'll consider that point settled.

Any solution to this problem must:

  1. transform each color to its corresponding numerical value
  2. transform the numerical values into a resistor value in ohms (i.e. 33000)
  3. transform the resistor value in ohms to a humanized string (i.e. "33 kiloohms")
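
To make the pipeline concrete, here is a minimal Python sketch of those three steps; the names COLORS, value, and label are hypothetical, not anything mandated by the exercise:

    COLORS = ["black", "brown", "red", "orange", "yellow",
              "green", "blue", "violet", "grey", "white"]

    def value(colors):
        # Steps 1 and 2: map each color to its digit, then apply the
        # third band as a power-of-ten multiplier.
        tens, ones, exponent = (COLORS.index(c) for c in colors)
        return (tens * 10 + ones) * 10 ** exponent

    def label(ohms):
        # Step 3: humanize by dividing out whole powers of 1000.
        units = ["ohms", "kiloohms", "megaohms", "gigaohms"]
        power = 0
        while ohms >= 1000 and ohms % 1000 == 0:
            ohms //= 1000
            power += 1
        return f"{ohms} {units[power]}"

    assert label(value(["orange", "orange", "orange"])) == "33 kiloohms"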

The Perspectives

There appear to be, broadly, two schools of thought about the intent of canonical data, though I'm all too happy to admit I might be summarizing one or both sides incorrectly, as the conversation was long and branching.

Test the Units

This perspective sees the canonical data as modelling a library and sees each of those three steps as a discrete unit and thus a potential property to be included in the canonical data. In many (if not most) languages Point 1 is going to be provided by a language primitive, and thus is reasonably excluded as a property to test, but Point 2 and Point 3 are not.

In this perspective, then, the canonical data would include two properties: one that converts the list of strings to an integer, and a second that converts that integer to a label for storage or display.

Note that this perspective ignores three considerations:

  1. The intended or assumed didactic value of this exercise -- this perspective leaves that up to the tracks... but that doesn't help when the exercise's goal is to fill a gap revealed by Track Anatomy.
  2. The prior art in a series or track -- this perspective doesn't assume work done in previous exercises, which could lead to recapitulation of essentially identical work.
  3. "Bake-in" of an "ideal" solution -- it may not leave the student enough room to explore.

Test the Whole

This perspective sees the canonical data as modelling an application: since Point 1 is a functional necessity of Point 2, and Point 2 is a functional necessity of Point 3, an integration test suite that adequately exercises Point 3 will be sufficient to test a solution; therefore Point 3 is the only property the canonical data needs to describe.

Thus the canonical data would include just one property that converts a list of strings directly to a label for storage or display.

Note that this perspective ignores different considerations:

  1. The author of the canonical data's inherent bias; can anyone know the didactic value of an exercise to a plurality of tracks?
  2. There may be no prior art: a track may forego any or all of the prior parts of the series, so assumptions about what the student has been exposed to may be false.
  3. It doesn't model TDD or good engineering practice for the students -- it hands the responsibility and burden for explaining this off to the mentors instead of modelling it up front.

Points of Discussion

  1. Does this even need discussion? "You don't have to implement the exercise in your track" seems to be a common refrain from those who don't think so. Is that sufficient?
  2. If it does merit discussion, how do we harmonize those two perspectives?

Significant canonical data changes can lead to a lot of maintenance work -- especially in tracks that lack any form of automatic generation of test suites -- and also a fair amount of disruption. Mentors have to stay familiar with the most recent test suite, and students, on an anecdotally frequent basis, end up with a test suite that doesn't match the one they started their exercise implementation on. So personally I think it's worth the discussion.

SleeplessByte commented 5 years ago

Thank you for opening this here and branching it out. I really do appreciate it 💟.

My personal opinion on the matter as a whole is:

* less public API is better
* only test public API (and usually that means only test integration, not implementation)
* don't assume the student/implementer is stupid
* don't assume the student/implementer knows everything
* have room for making mistakes (read: green tests, red implementation) because _those are usually the most valuable learning tools_

In that regard:

The exercise which is taken as an example had a single input (array of colors) and a single output (a label describing 33 kiloohms). If you take the list above:

yawpitch commented 5 years ago

@SleeplessByte my top of the post discussion was truncated when you saw this -- damned keyboard shortcuts -- you may want to read it now.

SleeplessByte commented 5 years ago

@yawpitch It seems that I mostly have given another phrasing / elaboration on the "testing as a whole", correct?

Given the "ignored considerations" of both methods: what I've been longing for, for a loooong time, is to at least have the intent of the exercise, or the subjects it tries to convey, listed. That would probably solve some of the issues we're having, regardless of the testing method!

Does this even need discussion? "You don't have to implement the exercise in your track" seems to be a common refrain from those who don't think so. Is that sufficient?

I think we do, because we end up having these long discussions on each and every PR. If we can all get onto one page, we can merge more swiftly, methinks 😄

If it does merit discussion, how do we harmonize those two perspectives?

I don't know if we really can. I think that units as private functions (so implementation) are inherently not capable of co-existing with units as public API (so integration), because once you add a test for a single implementation, you break the "integration-only rule" (whilst the reverse isn't true).

yawpitch commented 5 years ago

@SleeplessByte: to your first post.

* less public API is better

There's no controversy on this point; the two perspectives mainly disagree on what the public API of an exercise that has a math component and a string interpolation component would be. To me, and others, on the unit test side this exercise naturally has two public functions to export.

* only test public API (and usually that means only test integration, not implementation)

Agreed on testing the public API, though I'd disagree that unit tests of a 2-item library's public API are somehow focusing on or defining implementation any more than integration tests on a 1-item app.

* don't assume the student/implementer is stupid
* don't assume the student/implementer knows everything

The unit test perspective assumes neither of those things. It only assumes that the problem has two work components that ultimately both need testing.

* have room for making mistakes (read: green tests, red implementation) because _those are usually the most valuable learning tools_

The unit test perspective leaves plenty of room for error: the only things being defined by having a property per unit are the names, signatures, and return types of those units. All of which are being tested by the integration test anyway, just weakly and by proxy.

coriolinus commented 5 years ago

I find myself falling strongly into the "test the units" camp, which probably surprises nobody.

I find "you don't have to implement this exercise in your track" a hostile answer, condescending, intended to close off discussion instead of choosing the better of two irreconcilable options. From my perspective, the stated intention of https://github.com/exercism/problem-specifications/pull/1551 seems like a reasonable thing to include for new users, and I think it could be a valuable addition to my track. However, I do not want to manually rewrite a potentially large number of tests for the exercise; I want to run the test generator on the canonical data and be done with it. Therefore, I place high inherent value on getting the canonical data right.

Strawman: if we assume that all track maintainers have the same perspective that I do, then we should favor the "test the units" perspective for canonical data over the "test the whole". This is because it is easier for "test the whole" track maintainers to remove tests which are in their view extraneous, than for "test the unit" track maintainers to insert new tests.

It feels to me like much of the divergence of opinions comes from disagreement over whether exercises are in fact libraries or applications. In this matter, I am biased by my language: every exercise in the Rust track is explicitly a library; there would be visible changes in the exercise design if they were to be applications. It is clear that there may exist tracks for which libraries are more difficult to implement than applications. Bash springs to mind, though I haven't participated in that track and do not assert that it is definitely an example of such a track. The point I am making here is that the natural answer to the question "are exercises libraries or applications" may vary based on the nature of the track in question.

I agree with @SleeplessByte that we need to have this discussion, and that we need to resolve it, because the two perspectives seem irreconcilable.

yawpitch commented 5 years ago

@yawpitch It seems that I mostly have given another phrasing / elaboration on the "testing as a whole", correct?

Yes, you're strongly on that side of the debate.

Given the "ignored considerations" of both methods: what I've been longing for, for a loooong time is at least have the intent of the exercise, or what subjects they try to convey listed. That would probably solve some of the issues we're having, regardless of the testing method!

I too have that longing, however I'm not sure we can state, reasonably well, what those intents are for even a majority of tracks. For most non-trivial exercises there are multiple possible intents, especially if considered from a functional vs imperative vs procedural paradigm. Your intent for a given exercise might very well not mesh at all with say Lisp or Bash or VimScript's take on the exact same exercise, and "just don't implement it" is a poor response because the exercise may very well have very valuable things to bring to that other perspective, but those are made difficult by "your" track's intent having been baked, hard, into the canonical data instead of into your track's test suite.

I don't know if we really can. I think that units as in private functions (so implementation) are inherently not capable of co-existing with units as public api (so integration), because once you add a test for a single implementation, you break the "integration-only rule" (whilst the reverse isn't true).

On that point I think you've got a more narrow idea of what unit tests are than I do. Not every private helper function needs a unit test described in the canonical data, just the discrete units of work in the exercise. Looking at resistor-color-trio I see an exercise whose main work falls into two units: calculating the value (the important part, from a library perspective) and producing the label (the important part from an app perspective, the fluff from a library perspective).

Also you haven't yet established that "integration-only" constitutes a rule, so we can't break it. If properties are units of public work -- leaving plenty of room for implementation details -- then there's a lot of overlap between our perspectives. If we decide, a priori, that a unit test means testing every single specific implementation detail of the one true implementation then of course your perspective looks justified ... but that's a straw man we on the other side are not speaking about.

coriolinus commented 5 years ago

I think we all agree with this assertion:

Any solution to this problem must:

  • transform each color to its corresponding numerical value
  • transform the numerical values into a resistor value in ohms (i.e. 33000)
  • transform the resistor value in ohms to a humanized string (i.e. "33 kiloohms")

I think we would all agree with this assertion:

The point of disagreement is about what constitutes the simplest possible implementation of the canonical data. @SleeplessByte asserts the following:

less public API is better

I would restate this as: "the API should be factored into minimal units".

Anecdotally, I have had the misfortune in my professional life to use libraries which, by analogy, provided only the method which accepted a list of color names and emitted a string label. They caused me extra work, because I needed to do work with the integer; the design forced me to parse the integer from the string, which was wasted effort. The implementation must have had internal code which produced the integer, but it was not made public.

Again, this is analogy, not a literal truth, but I have had this experience. I am certainly not the only one among us who has.
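
To make the analogy concrete, a minimal sketch of the extra work a label-only API forces on a caller (all names here are hypothetical):

    import re

    # Hypothetical label-only API: the caller needs the integer back,
    # so they must re-parse it out of the formatted string.
    SCALE = {"ohms": 1, "kiloohms": 1_000, "megaohms": 1_000_000}

    def ohms_from_label(label: str) -> int:
        number, unit = re.fullmatch(r"(\d+) (\w+)", label).groups()
        return int(number) * SCALE[unit]

    assert ohms_from_label("51 kiloohms") == 51000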

I desire to engage in pedagogy which reduces the possibility of future library authors ever writing badly factored libraries like that, to reduce the chance that I ever have to deal with those badly factored libraries again. My company has spent literally thousands of dollars of developer-hours working backwards from badly factored libraries which, due to unique capabilities, are business-critical. The best way I know to reduce the probability of badly factored libraries in the future is to teach people to write well-factored libraries from the start.

yawpitch commented 5 years ago

I find myself falling strongly into the "test the units" camp, which probably surprises nobody.

Indeed, not surprised. :-)

I find "you don't have to implement this exercise in your track" a hostile answer, condescending, intended to close off discussion instead of choosing the better of two irreconcilable options.

That is how I feel when I see that response as well. I try to resist that feeling, as I assume no one is trying to be hostile, but it does tend to end a conversation about the canonical data form prematurely.

I want to run the test generator on the canonical data and be done with it. Therefore, I place high inherent value on getting the canonical data right.

Same here, because I'm trying to make the Python test generator and that is not easy Python, especially when the canonical data is not right.

In this matter, I am biased by my language: every exercise in the Rust track is explicitly a library; there would be visible changes in the exercise design if they were to be applications.

The Python track is similarly constructed; everything is a library, and that's a design I feel strongly we should continue.

Bash springs to mind, though I haven't participated in that track and do not assert that it is definitely an example of such a track. The point I am making here is that the natural answer to the question "are exercises libraries or applications" may vary based on the nature of the track in question.

Veteran of the Bash track here; every file there is an application, and almost by definition has to be. They're pretty well constructed exercises though, and where there's an exercise with multiple properties the application has a CLI interface that passes which of the properties to run, so you'd expect things like ./resistor-color-trio value orange orange orange or ./resistor-color-trio label orange orange orange. It's quite natural.
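
For illustration, a rough Python analogue of that dispatch convention (hypothetical; the Bash track's actual scripts differ):

    import sys

    COLORS = ["black", "brown", "red", "orange", "yellow",
              "green", "blue", "violet", "grey", "white"]

    def value(bands):
        tens, ones, exponent = (COLORS.index(c) for c in bands)
        return (tens * 10 + ones) * 10 ** exponent

    def label(bands):
        ohms, power = value(bands), 0
        while ohms >= 1000 and ohms % 1000 == 0:
            ohms, power = ohms // 1000, power + 1
        return f"{ohms} {['ohms', 'kiloohms', 'megaohms'][power]}"

    if __name__ == "__main__":
        # The first argument selects the property, the rest are the bands,
        # e.g. `python3 resistor_color_trio.py label orange orange orange`.
        prop, *bands = sys.argv[1:]
        print({"value": value, "label": label}[prop](bands))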

I agree with @SleeplessByte that we need to have this discussion, and that we need to resolve it, because the two perspectives seem irreconcilable.

They're not irreconcilable, we just need to better air, and understand, the needs each side is trying to address. This is made a more complex endeavour by the many different languages and paradigms. I keep feeling like these long drawn out conversations are important to sussing out that commonality, which is why I don't react well to attempts to prematurely shut them down.

yawpitch commented 5 years ago

My company has spent literally thousands of dollars of developer-hours working backwards from badly factored libraries which, due to unique capabilities, are business-critical. The best way I know to reduce the probability of badly factored libraries in the future is to teach people to write well-factored libraries from the start.

In my case it's many millions of dollars. I have a feeling watching a Breaking Bad-level pallet of cash being essentially set alight -- because someone thought there was no reason not to translate an image into some essential metadata in a database via a fifteen-foot-long single line of heavily obfuscated Perl -- will put anyone on the test-the-units side.

Look guys it's nearly midnight, I'm off for the night. Please continue and I hope others chime in, as I really do think this could be helpful in better defining what should be in the canonical data, which hopefully will give us a better idea what should be left to the tracks.

SleeplessByte commented 5 years ago

Any solution to this problem must:

  • transform each color to its corresponding numerical value
  • transform the numerical values into a resistor value in ohms (i.e. 33000)
  • transform the resistor value in ohms to a humanized string (i.e. "33 kiloohms")

I don't agree. Yes, this is probably how I'd like it to be implemented, but MUST states there is no other way. I don't even think SHOULD is correct. For example: in Prolog I'd rather implement this with a matrix than an algorithm. Or a combination. Even if all bands could have 9 colours, it would "only" be 531,441 entries. I can see solutions like that.

Other solutions might:

* transform 2 bands to a numerical value
* transform the third band to a multiplier
* transform into a total (number)
* transform into a string

Or if the language permits:

* Create an object that represents ohms
* transform the two bands into such an object
* multiply it with the numerical value of the multiplier of the third band
* transform into a string


re: @coriolinus

To me, and others, on the unit test side this exercise naturally has two public functions to export.

This sounds like one of the core issues, actually. I had not considered it explicitly yet.

I find "you don't have to implement this exercise in your track" a hostile answer, condescending, intended to close off discussion instead of choosing the better of two irreconcilable options.

It's meant to move forward; so please don't assume the intention. If you assume hostility, then the battle is already lost, because you'll likely read everything in a hostile manner. I simply meant that "if a dataset doesn't work with the mantra of the language, don't implement it". Maud even suggested designing a version especially for you, where these concerns are separated! (resistor-color-four for example would be a prime candidate).

[...] reasonable thing to include for new users [...]

Yes, I agree with you in that context. As I stated, the exercise is meant to be placed somewhere on level 2 or 3 in the Track Anatomy project. I apologise for this not being clearer. I don't know if this can be solved in the short term. I'm not the authority on that project, but I chat with Maud on a weekly basis, which gives me insider knowledge. That said, I don't quite understand why it's criticised or put under scrutiny.

However, I do not want to manually rewrite a potentially large number of tests for the exercise; I want to run the test generator on the canonical data and be done with it. Therefore, I place high inherent value on getting the canonical data right.

This I understand wholeheartedly. Almost no-one likes work that might be easy to avoid. But that also means that you'll bring this "I want unit-tests" bias to the canonical data discussions. I think that's something we shouldn't forget. It's not wrong, I think, but it's also not unimportant.

It feels to me like much of the divergence of opinions comes from disagreement over whether exercises are in fact libraries or applications.

Yes, I think you're absolutely right here. This is one of the core issues. And I tend to agree with your assertion: "it probably depends"! I don't know how to solve for that specifically 😓


re: @yawpitch

I too have that longing, however I'm not sure we can state, reasonably well, what those intents are for even a majority of tracks.

Strong disagree. Just because the intent doesn't translate to all tracks doesn't mean an exercise wasn't designed according to a specific set of intentions/goals/track anatomy placement. I think it will actually help decide if a track should have an exercise or not. My personal future would be having more exercises that specialise on various subjects and having them implemented in fewer tracks! (Not a necessity, but I think that would be our future).

For most non-trivial exercises there are multiple possible intents, especially if considered from a functional vs imperative vs procedural paradigm.

Ah yes, this is also a pretty important one: I would like to see specialised exercises for currying, functions as first class citizens, declarative code, etc. These would most likely not even be possible to implement in all tracks!

made difficult by "your" track's intent having been baked, hard, into the canonical data instead of into your track's test suite.

To be fair: most of the comments as made by you and Peter in the other thread mostly work for your tracks and not necessarily for "ours". re: description that takes ohms as a parameter. I don't think this is a good argument in either of our camps. I think we all want things to work out better for "our" tracks and I think this is the game: find something that is the least resistance in the most number of tracks 🚀

"just don't implement it" is a poor response because the exercise may very well have very valuable things to bring to that other perspective,

That's not what I said. I said that a track doesn't need to implement it. See my other response about this above.

On that point I think you've got a more narrow idea of what unit tests are than I do. [...] calculating the value (the important part, from a library perspective) and producing the label (the important part from an app perspective, the fluff from a library perspective).

Let's call it different instead of "narrow"? A unit of work here is also something that isn't really objectively defined. But let's take your approach and see this exercise as a library. As Erik and Maud described it, there is no value property. That's something you and Peter came up with in the discussion. That's of course perfectly fine, but in the original exercise, there was no such thing.

I can come up with solutions that involve no single integer, or solutions that don't generate an intermediary value before they have ohms. There are languages, like Julia, where ohms are their own defined type, and there are languages that have great ways of describing units, all of which might or might not involve numbers at all.

If we add the notion of value, then we might as well export it. I agree with that. If it's exported, it should be tested. We agree on that. To me you could define these as units of work:

The original idea of the exercise was to only export the latter. You're opting to export the bottom two. Someone else might come in and say "yeah but those first two units are also units of work". The point there is that if the exercise is "generate a label", then all these other things are implementation details.

Note: I'm not disagreeing with testing, or agreeing with it. I'm just explaining that what you're saying is still cutting the work up on an arbitrary boundary you feel comfortable with.

Also you haven't yet established that "integration-only" constitutes a rule, so we can't break it. If properties are units of public work -- leaving plenty of room for implementation details -- then there's a lot of overlap between our perspectives. If we decide, a priori, that a unit test means testing every single specific implementation detail of the one true implementation then of course your perspective looks justified ... but that's a straw man we on the other side are not speaking about.

I don't need to establish it. In CONTRIBUTING.md it states specifically that: "We try to keep the generic descriptions generic--we avoid implementation-specific examples, and try not to be too prescriptive about suggesting how a problem might be solved.". This mostly translates to normalisation of the canonical data (I think?). In the past, so there is precedent, we've always removed/rejected tests that tested specific implementation. For example: we have removed tests in list-ops that tested if the list was of type array or similar, because that was testing implementation. We have rejected tests in robot-name that tested that Random was being used correctly (and instead test that all the names are present).

Properties are therefore units of public work. We try to have fewer properties because it keeps it more generic, but that's not a rule. That's just what I say 😉.


re: @coriolinus

Again, this is analogy, not a literal truth, but I have had this experience. I am certainly not the only one among us who has.

You and me both. I feel your pain. This has happened often to me!

I desire to engage in pedagogy which reduces the possibility of future library authors ever writing badly factored libraries like that, to reduce the chance that I ever have to deal with those badly factored libraries again.

I admire this :heart: thank you. And I mean that.

The best way I know to reduce the probability of badly factored libraries in the future is to teach people to write well-factored libraries from the start.

Absolutely agree. But I don't know if the canonical data, in this case, is the place for that!

petertseng commented 5 years ago

I tried to sketch some misguided schemes for the canonical-data that admit multiple philosophies, but I didn't find them satisfactory. I'll post them and mark my post as off-topic, as it is useful to log failed attempts so that others know what has already been tried.

Attempt 1: The canonical data expresses the inputs/outputs in the form that is easiest to translate to all other philosophies.

Attempt 2: The canonical data contains the union of all data needed to translate to all philosophies.

petertseng commented 5 years ago

Attempt 1: The canonical data expresses the inputs/outputs in the form that is easiest to translate to all other philosophies.

Attempt 1, as applied to resistors: It is easier to transform an integer to a string than to transform a string into an integer. As such, canonical data contains tests for Array[String] -> Integer and Integer -> String. Tracks that wish to test those two functions separately use the data as-is. Tracks that wish to test an Array[String] -> String function instead do one of two things:

  1. Stitch the data together internally (possible if every output for the Array[String] -> Integer is an input for the Integer -> String function)
  2. Ignore all data for the Integer -> String function. Instead, use an implementation of the Integer -> String function to transform all the expected Integer outputs of the Array[String] -> Integer function into Strings, thus getting an Array[String] -> String function.
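
A sketch of how option 1's stitching might look, using hypothetical per-function case lists:

    # Hypothetical canonical cases for the two separate functions.
    value_cases = [{"colors": ["green", "brown", "orange"], "ohms": 51000}]
    label_cases = [{"ohms": 51000, "label": "51 kiloohms"}]

    # Stitch: every expected integer output must also appear as a
    # label input, or the combined case cannot be constructed.
    label_for = {c["ohms"]: c["label"] for c in label_cases}
    combined = [{"colors": c["colors"], "label": label_for[c["ohms"]]}
                for c in value_cases]

    assert combined == [{"colors": ["green", "brown", "orange"],
                         "label": "51 kiloohms"}]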

I didn't find this solution satisfactory.

  1. Inevitably, it will be easier to use one philosophy over the other; for at least one philosophy it is the case that the canonical data must be transformed before becoming the test suite. Even if this were not intrinsically bad, it leads to the below problem.
  2. Each implementing track that chooses to use a philosophy that requires transformation seems to be obligated to implement the transformation in a track-specific manner. There is less sharing, and another layer of error is introduced. The translation may be non-trivial and therefore requires review and maintenance and assurance that it is correct. For the specific example, what if I accidentally gave kilo a multiplier of 10000 instead of 1000?

Attempt 2: The canonical data contains the union of all data needed to translate to all philosophies.

Attempt 2, as applied to resistors: The data would look something like:

    {
      "description": "Green and brown and orange",
      "property": "resistance",
      "input": {
        "colors": ["green", "brown", "orange"]
      },
      "expected": {
        "ohms": 51000,
        "formatted": "51 kiloohms"
      }
    },

Tracks that wish to test Array[String] -> Integer and Integer -> String translate this as two tests: one that checks that resistance(colors) = ohms and one that checks that format(ohms) = formatted. Tracks that wish to test Array[String] -> String translate this as one test that ignores ohms and just tests that resistance(colors) = formatted.
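
To make the two translations concrete, a sketch against that case, with stand-in implementations since the real ones are the student's work:

    def resistance(colors):
        # Stand-in implementation, only to make the sketch runnable.
        digits = ["black", "brown", "red", "orange", "yellow",
                  "green", "blue", "violet", "grey", "white"]
        tens, ones, exponent = (digits.index(c) for c in colors)
        return (tens * 10 + ones) * 10 ** exponent

    def fmt(ohms):
        units, power = ["ohms", "kiloohms", "megaohms"], 0
        while ohms >= 1000 and ohms % 1000 == 0:
            ohms, power = ohms // 1000, power + 1
        return f"{ohms} {units[power]}"

    case = {"input": {"colors": ["green", "brown", "orange"]},
            "expected": {"ohms": 51000, "formatted": "51 kiloohms"}}

    # "Test the units": two asserts, one per function.
    assert resistance(case["input"]["colors"]) == case["expected"]["ohms"]
    assert fmt(case["expected"]["ohms"]) == case["expected"]["formatted"]

    # "Test the whole": one assert, the ohms field is ignored.
    assert fmt(resistance(case["input"]["colors"])) == case["expected"]["formatted"]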

I didn't find this solution satisfactory.

  1. For all philosophies it is the case that the canonical data must be transformed before becoming the test suite. Even if this were not intrinsically bad, it leads to the below problem.
  2. (Same problem as for attempt 1) Each implementing track seems to be obligated to implement the transformation in a track-specific manner. There is less sharing, and another layer of error is introduced. The translation may be non-trivial and therefore requires review and maintenance and assurance that it is correct.

coriolinus commented 5 years ago

Any solution to this problem must: ...

I don't agree. Yes this is probably how I'd like it to be implemented, but MUST states there is no other way.

Granted, it's possible to factor the internal implementation details differently, but nobody's proposing we test those. The intent here is to test the essential unit of work, which in the example case is to translate a list of colors into an integer.

I don't know very much Prolog, but I do not believe that you think students should type out half a million lines of bare pattern matching to solve this exercise. If there is some metaprogramming involved, couldn't that be trivially adapted to writing a function which converts lists of colors into an integer?

A unit of work here is also something that isn't really objectively defined. But let's take your approach and see this exercise as a library. As Erik and Maud described it, there is no value property. That's something you and Peter came up with in the discussion.

Agree that a unit of work is not objectively defined. Agree that in the original description there was no value property.

However, when viewing the exercise as a library, the first instinctive thought was "this is not the best factorization". Breaking out and differentiating value from label had no obvious downside. When I first suggested it, I imagined it would be received as an obvious improvement with no opposition. Who would oppose better factorizing a library, particularly when the change was this simple?

The best way I know to reduce the probability of badly factored libraries in the future is to teach people to write well-factored libraries from the start.

Absolutely agree. But I don't know if the canonical data, in this case, is the place for that!

Do we agree that this exercise is meant to be placed early enough in the track that students are not likely to find the ideal factorization on their own? If yes, then we're back at an earlier discussion point: if an exercise is an application, it doesn't matter what its internal factorization is. If it's a library, we all agree that good factorization is important.

Let's try this argument: the canonical data for exercises should assume that exercises are libraries and encourage good factorization, because it's like Pascal's Wager: requiring good factorization can never make student code worse, but has several positive benefits, ranging from saving mentor time to reducing the chance that the student will write a badly factored library in the future.

ErikSchierboom commented 5 years ago

Unsurprisingly, I'm firmly in the "test the whole" camp. @SleeplessByte's arguments largely align with mine, but I'll restate some of my arguments and address some of the considerations of this approach.

Let's start by noting that the canonical data are a means to an end, namely to help the students have the best possible experience using Exercism. A next logical step is to check what the goal of Exercism itself is. We can find this at the values page:

Mission Statement

To enable anyone to achieve fluency in any programming language for free, in order to give opportunity to all and improve the quality of software development worldwide.

The key part here is fluency (emphasis mine). You can find a slightly more detailed description of fluency and its goals in the goal-of-exercism document. There too, the emphasis is on teaching fluency, so I think we can agree that our ultimate goal is to teach fluency to a student.

As canonical data is there to support the student/website, its ultimate goal should thus also be to help teach fluency to the student. There are of course many other valid considerations when designing canonical data, but to me they are all trumped by the fluency consideration. This does not mean that we should not look at the other considerations (we should!), it's just that IMHO they should always be valued less than fluency. As an example, consider the ease with which canonical data can automatically be converted to a test suite. We should of course strive to make things easy for maintainers and test generators, but it should not be the main point of focus.

Regarding the application vs. library approach, personally I feel that the canonical data should be defined using the bare minimal API. In other words, I'd like students to have a very minimal implementation surface. This means giving the students the least amount of methods to implement. Yes, this can mean that students don't use separate methods with a single responsibility, which is then something an (automated) mentor can comment on.

I understand the desire to "lead" the students towards a more correct solution, but this might not be the best learning experience, as it restricts the number of possible solutions and makes the student have to think less. Learning from mistakes is a very powerful tool! Of course, there are mentoring considerations too, as we don't want to overwhelm mentors. For this, the track anatomy project can help, as the maintainer can then structure the exercises so that transitions are logical.

By the way, there have been several exercises that used to have several properties (and thus forced a particular solution on a student), but for which the additional properties were removed in favor of a single property. If needed, I can do some digging to find these, but the consensus there was that we shouldn't be testing for implementation details, hence the removal of the additional properties.

Now on to the considerations that were listed for the "test the whole" camp:

The author of the canonical data's inherent bias; can anyone know the didactic value of an exercise to a plurality of tracks?

This sounds almost like an existential question 😃 Of course, there is an inherent bias of an author, but that applies just as well to the library approach. Who decides what makes for a correct separation of concerns in the library approach? The author, of course. What may be a good design in one language might not be a good design in another language. This is why it sometimes makes sense not to implement an exercise in a track. As an example, the flatten-array exercise does not make for a great exercise in F#, but it still is quite useful for many other language tracks. An exercise is often designed to teach one or two very specific concepts. More often than not, the exercise can teach those concepts in several languages, although there might be cases where it cannot, which I don't mind personally.

There may be no prior art: a track may forego any or all of the prior parts of the series, so assumptions about what the student has been exposed to may be false.

This is a valid point. The exercise series approach that we are using for the resistor-color exercises is an experiment, to see if this type of approach is valid. Come to think of it, it might actually be a good idea to mention this series-like approach in the description of the canonical data. Note that this is a point very specific to the resistor-color series. For the overall track, the track anatomy project (which at some point should be applied to each track) is designed to solve these "does the student have enough prior experience to successfully solve this exercise" questions.

It doesn't model TDD or good engineering practice for the students -- hands the responsibility and burden for explaining this off to the mentors instead of just modelling it up front.

I understand this point, but Exercism isn't about teaching TDD or good engineering practice; it's about teaching fluency in a language, which is something else entirely. That is not to say that we should teach bad engineering practices, of course, but our primary goal should always be to design exercises such that they form the optimal learning experience for students: teaching them fluency. Regarding TDD, I have the same feeling. We should attempt to have the test suite be structured for the optimal learning experience, and doing proper TDD (if we could agree on that definition :)) is a distant secondary goal.

I hope this provides some insight into why my opinion is what it is.

coriolinus commented 5 years ago

In my opinion, true fluency in a language involves more than just mastery of its syntactic constructs; it necessarily encompasses some of the idioms and best practices which professionals employ in that language. We want to teach students not just how to solve problems, but to solve them well. Ideally, a student finishes a track with both an appreciation for elegant code, and some ability to produce it. If they don't have that, how can they really be called fluent?

ErikSchierboom commented 5 years ago

In my opinion, true fluency in a language involves more than just mastery of its syntactic constructs; it necessarily encompasses some of the idioms and best practices which professionals employ in that language.

I don't disagree, we just differ in the way we think that should be achieved.

petertseng commented 5 years ago

This comment boils down to "it depends on the context", but this is an unfortunate recommendation to make because that is going to lead into questions about what a given context is or how we should understand a given context. Unfortunately, this is the best I have to offer.


Given how strongly I advocated for https://github.com/exercism/problem-specifications/issues/192 (how strongly? I personally went and sent every track the PR!), you might have guessed that I advocate for testing the whole.

Given that I wrote https://github.com/exercism/problem-specifications/pull/1224 and in that file, I included more tests than just "do the two parties arrive at the same secret key after having performed key exchange"... I also test the derivation of a public key from a private key and that Alice can calculate a secret key using Alice's private key and Bob's public key. So you might have guessed that I advocate for testing the parts.
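
For context, a toy sketch of those two intermediate operations (tiny illustrative numbers, nothing like the exercise's real data, and not remotely secure):

    # Toy Diffie-Hellman sketch.
    P, G = 23, 5  # shared prime and generator

    def public_key(private: int) -> int:
        return pow(G, private, P)

    def secret(their_public: int, my_private: int) -> int:
        return pow(their_public, my_private, P)

    alice_private, bob_private = 6, 15
    # Alice computes the secret from her private key and Bob's public
    # key; both parties must arrive at the same value.
    assert secret(public_key(bob_private), alice_private) == \
           secret(public_key(alice_private), bob_private)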

What happened?

Looking back at https://github.com/exercism/problem-specifications/issues/192:

This means that for my Haskell and Clojure submissions for this problem I was "forced" to write a helper function that my actual submission doesn't use to find the largest product.

The two intermediate functions (return all digits in the string, return all substrings consisting of N digits) were not used in solving the problem (finding the largest product made from substrings of length N), as described in https://github.com/exercism/problem-specifications/blob/master/exercises/largest-series-product/description.md. And since the whole exercise is based on a Project Euler problem, there is no context that makes it useful for the intermediate functions to exist. Therefore, to test them was to constrain the implementation unfairly.
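
For contrast, a sketch of a largest-series-product solution in which those intermediates stay internal (the function name is hypothetical):

    from math import prod

    def largest_product(digits: str, span: int) -> int:
        # The two intermediates -- parsing the digits and windowing
        # them -- remain implementation details, not public API.
        numbers = [int(d) for d in digits]
        return max(prod(numbers[i:i + span])
                   for i in range(len(numbers) - span + 1))

    assert largest_product("63915", 3) == 162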

With Diffie-Hellman, since the context is a well-known concept with a real-world application rather than an isolated Project Euler problem, we know that any reasonable piece of code that we would employ to perform Diffie-Hellman key exchange absolutely must support both of the intermediate operations.

In these two cases, a different decision was made, and the reason the decision is different is because of context.


A critical observer will actually realise that the author of this comment just found two different decisions and attempted to retcon them by simply making up a unifying principle. But this approach has a disadvantage in that it may cause mental gymnastics in order to actually make the principle apply; it assumed that the past decisions were correct, which is often a very unwise assumption to make.

An approach not susceptible to that disadvantage would have been to first decide on a principle based on values and beliefs held, then revisit past decisions with an eye for whether they need to be reversed according to the principle.

yawpitch commented 5 years ago

I don't agree. Yes, this is probably how I'd like it to be implemented, but MUST states there is no other way. I don't even think SHOULD is correct. For example: in Prolog I'd rather implement this with a matrix than an algorithm. Or a combination. Even if all bands could have 9 colours, it would "only" be 531,441 entries. I can see solutions like that.

I can see solutions like that too ... anyone can write a lookup table of all possible strings for all possible inputs. I doubt anyone would be obtuse or masochistic enough to do so, but more power to them. Anyone who generates such a lookup table will, I believe, need to do all three of those steps, though I'm happy to see a working end-to-end solution that never uses a numerical representation or any form of mathematics.

In my view the canonical data does not describe a student's expected solution -- it describes the minimal expected components of the exercise's public API, and the lowest-common-denominator representation of the signature and outputs of each component ... thus it describes the maintainer's work in terms of the folder structure, dummy files, test suite, and example solution -- including decisions about types used -- that need to be implemented to make that exercise available to students. The tests which are a derived product of the canonical data describe the student's work, and any PR for them will exist in the track's repo as a matter of course, not this one.

Other solutions might:

* transform 2 bands to a numerical value
* transform the third band to a multiplier
* transform into a total (number)
* transform into a string

Or if the language permits:

* Create an object that represents ohms
* transform the two bands into such an object
* multiply it with the numerical value of the multiplier of the third band
* transform into a string

Those are both essentially restatements of my three steps... I never indicated that "ohms" was a specific breed of integer, or in fact that they were themselves numerical primitives. The raw, common-denominator inputs as described in the canonical data should and probably must be transformed into an intermediate representation of a track-appropriate form, which must then be transformed into the final representation. I don't care about the specifics of those work units, and I stand by the must until I see a sensible solution that, again, doesn't involve any mathematics (including mental ones) or numerical values in its creation.

The initiating problem that we reasonably had with the exercise as originally described was that it forces a specific app interface on what several languages would, again reasonably, want to implement as a library. As far as I'm concerned the solution of "33 kiloohms" as the final label was reasonably sufficient, because it's also reasonably easy for a maintainer to parse into actionable data, but it's far from ideal. What hasn't been established, at all, is why it is somehow worse to have the canonical data hold the actionable data as primitives that can be transformed by any maintainer into a useful (possibly string) target for the students to meet than it is to foist the un-actionable version on all.

But I think this is where the fundamental disconnect lies ... I cannot, at the moment, see why there's any argument against the canonical data saying, essentially, we're going to need to see 33 of something alongside the unit "kiloohms", and then having each track decide what that label should actually look like in their tests. How does this generalizing principle not solve everyone's problems? Because then we're back to describing a single-property API, and the track can merrily decide to make that an app or a library. If such arguments exist, please make them.

It's meant to move forward; so please don't assume the intention. If you assume hostility, then the battle is already lost, because you'll likely read everything in a hostile manner.

First, "meant to move forward", if successful, is, by definition, "closing off conversation". Please try to refrain from guessing how someone is, or is not, "likely to read everything". That sort of predictive mind-reading gets us nowhere. @coriolinus described an emotional response to "you don't have to implement in your track"; I stated that I too have a similar initial response to that phrasing. Our drawing attention to those emotional responses does not open them up to critique, questions of motivation, or debate; it's pointing out that said phrasing has become loaded and unhelpful. The constructive response is to find better phrasing, so let's do so, because we are most definitely not alone in finding that phrasing at the very least unnecessarily dismissive of what we see as valid concerns.

I simply meant that "if a dataset doesn't work with the mantra of the language, don't implement it".

Good, that is helpful. And we're saying the dataset could be useful to our camp if your camp were willing to normalise it in the canonical data and de-normalise it in your tracks. The other way around it is not useful to our camp, which will result in make-work and duplication of effort for all.

As I stated, the exercise is meant to be placed somewhere on level 2 or 3 in the Track Anatomy project. I apologise for this not being clearer. I don't know if this can be solved in the short term. I'm not the authority on that project, but I chat with Maud on a weekly basis, which gives me insider knowledge. That said, I don't quite understand why it's criticised or put under scrutiny.

I'm also working on Track Anatomy and also chatting with Maud; appealing to any form of authority here doesn't get us anywhere unless and until it leads to an ultimately accepted proposal to encode the Track Anatomy-specific Level, Datatypes, Progression, etc. into the canonical data schema ... which may very well be a sensible extension to make to that schema. As may be notation of "series" status, etc. But until that has happened, and more than a minority of tracks have been through (probably several) rounds of Track Anatomy, the positioning in terms of TA Level and Series is, essentially, moot to any language considering whether or not any given canonical exercise is worth the effort of implementing in that language. I have no idea if Rust is in -- or even interested in -- TA, but a Rust maintainer has as much right as any other to try to point out that canonical data appears to be being made unnecessarily exclusive by design. That critique / scrutiny is valuable.

Just because the intent doesn't translate to all tracks doesn't mean an exercise wasn't designed according to a specific set of intentions/goals/track anatomy placement.

It also doesn't mean that said designers of the canonical form understood the specific set of intentions/goals/track anatomy placement of the track implementation. All I'm saying is that the track implementation is where the rubber meets the road on intention ... the "intent" of grains is arbitrary until you use it in the context of a language.

Ah yes, this is also a pretty important one: I would like to see specialised exercises for currying, functions as first class citizens, declarative code, etc. These would most likely not even be possible to implement in all tracks!

No one ever said the canonical data was for all tracks; our point is only that it's maintained in a form that is accessible, at least, to all tracks within the same generally applicable paradigm. What is the point of canonical data for an exercise that can only be implemented in Haskell? I can think of none. I'm fine with an exercise that materially applies only to declarative languages, or only to functional ones, but that sort of paradigmatic filter in no obvious way applies here. The entire reason we're arguing with you is because these resistor color exercises are potentially useful to a very wide range of tracks, and yet there seems to be a foregone conclusion that the canonical data is the correct place to encode a fixed notion of both intent and outcome. That conclusion is certainly a matter of the present debate.

To be fair: most of the comments as made by you and Peter in the other thread mostly work for your tracks and not necessarily for "ours". re: description that takes ohms as a parameter. I don't think this is a good argument in either of our camps. I think we all want things to work out better for "our" tracks and I think this is the game: find something that is the least resistance in the most number of tracks 🚀

Those comments were made to point out that the canonical data form of this exercise is being pushed through in a form that limits its utility ... of course we used our tracks as examples. That in no way mitigates our point. We could make the same arguments from other tracks and that would still not mitigate our point. Neither I nor @coriolinus is arguing from the position of "we want this better for us"; we're arguing from a point of "why are you making it worse for us for no obviously good reason?". We used examples to try to show how we'd tend to implement it in our tracks, as an attempt to find the least-resistance compromise.

As that compromise I'd prefer a single property in the canonical data that returned (number, units) because that's, IMHO, most translatable by a maintainer into the widest number of final representations. The split into "value" and "label/description" was only there to accommodate the insistence on a string as being necessary to the canonical version. An insistence I questioned several times and never got a well-reasoned answer for. Remove that necessity and we completely remove the need for a second property because the canonical exercise is now testing the sole piece of work intrinsic to learning the ohm value of a resistor.
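
For illustration, such a compromise case might look something like this (the shape and field names are purely hypothetical):

    case = {
        "description": "Orange and orange and orange",
        "property": "resistance",
        "input": {"colors": ["orange", "orange", "orange"]},
        "expected": {"value": 33, "unit": "kiloohms"},
    }

    # A "test the whole" track renders a label from the pair; a
    # "test the units" track can consume the pair directly.
    expected = case["expected"]
    assert f"{expected['value']} {expected['unit']}" == "33 kiloohms"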

As Erik and Maud described it, there is no value property. That's something you and Peter came up with in the discussion. That's of course perfectly fine, but in the original exercise, there was no such thing.

There is no "original exercise" ... there's a series of commits which, currently and after several rounds, would be an acceptable, if in my mind still very short-sighted, compromise. The first commits -- which included descriptive wording that would never change and which, let's face it, in no way whatsoever invoked or involved additional skill on the part of the student beyond the shorter label form, but which did include non-standard, English-vernacular phrasing of SI units that would unnecessarily hamper an ESL student -- were flawed, in my mind fatally, which is precisely why they got pushback. Once the label string becomes semi-actionable my objections to the single-property approach in that exercise largely fall away.

But that, again, is my key point ... in my view the canonical data should describe the minimum public API required to generate actionable data for either a library or an app approach. If we privilege the canonical-as-app view fully, "test the whole" wins the day and resistor-color-trio has one property, but that screws over everyone else. If we privilege the canonical-as-library view fully, "test the units" wins the day and resistor-color-trio has three properties, but that screws over everyone else. If we privilege canonical-as-middle-ground, then what is the problem with taking either approach at the track level?

I'm just explaining that what you're saying is still cutting the work up on an arbitrary boundary you feel comfortable with.

Sure. That arbitrary point is just meant to be more widely useful than your arbitrary point, and I still believe would achieve that goal, at least for those who fall more on my side. A compromise point would be better; the point of these debates is to find that compromise.

This mostly translates to normalisation of the canonical data (I think?). In the past, so there is precedent, we've always removed/rejected tests that tested specific implementation.

As I hope is clear from the above I am strongly arguing for normalisation of the canonical data, but again the properties we suggested do not test the implementation, they just expand the public API to maximize utility as a library. Any reference to testing implementation is a misapprehension, and honestly I wish we'd stop spending time on that tangent.

Properties are therefore units of public work. We try to have fewer properties because it keeps it more generic, but that's not a rule.

I'd say we try to have the minimal number of properties in the canonical data to make the exercise able to be implemented in a generic fashion. But again, from my perspective, the canonical data doesn't describe the student's solution; it's meant to be compiled into that description.

yawpitch commented 5 years ago

A critical observer will actually realise that the author of this comment just found two different decisions and attempted to retcon them by simply making up a unifying principle. But this approach has a disadvantage in that it may cause mental gymnastics in order to actually make the principle apply; it assumed that the past decisions were correct, which is often a very unwise assumption to make.

Well done for noticing and calling attention to that disadvantage.

An approach not susceptible to that disadvantage would have been to first decide on a principle based on values and beliefs held, then revisit past decisions with an eye for whether they need to be reversed according to the principle.

Sounds good ... now, how do we state the values and beliefs held by the obviously interested and passionate observers above in a concrete and actionable form?

To be upfront, my guiding belief is that the Description + Canonical Data form, essentially, the "source code" intended to be compiled onto as many Track "platforms" as is reasonably possible and didactically useful. I do not believe its purpose is to directly enable the student, but instead to enable the maintainer to enable the student by giving them an exercise to do.

My values are a little harder to parse here: I value not wasting my time as a maintainer or a mentor, and I'll be honest, I feel like I've done quite a bit of that in the last 24 hours if the arguments I made still carry no water. I proudly value competency over fluency, generally, but in the specific meaning applied on this site that really just means that I want students to leave their core exercises with a higher degree of fluency than they currently have. This is why I am striving to improve the core track we have. It is also why I am arguing against the de-normalization of canonical data that privileges only the "app-first" and "test-the-whole" perspective, because that de-normalization makes it harder for me to deliver high fluency to students without wasting more of my time than I can afford.

I'd be happy to know what foundation others are standing on for this one in a less "back and forth" manner.

SaschaMann commented 5 years ago

I was writing a longer comment, but @yawpitch described what I also wanted to say better and in more detail than I could, so I'll just emphasize one point that I find rather important and leave out the rest.

Your intent for a given exercise might very well not mesh at all with say Lisp or Bash or VimScript's take on the exact same exercise, and "just don't implement it" is a poor response because the exercise may very well have very valuable things to bring to that other perspective, but those are made difficult by "your" track's intent having been baked, hard, into the canonical data instead of into your track's test suite.

The entire reason we're arguing with you is because these resistor color exercises are potentially useful to a very wide range of tracks, and yet there seems to be a foregone conclusion that the canonical data is the correct place to encode a fixed notion of both intent and outcome.

I strongly agree with this.

> I simply meant that "if a dataset doesn't work with the mantra of the language, don't implement it". Maud even suggested designing a version especially for you, where these concerns are separated! (resistor-color-four for example would be a prime candidate).

This argument can also be turned around. If an exercise is designed to fill a specific gap on one track, it shouldn't live in the shared problem-specs but only on the track that needs it.

To stick to the exercise in question, adding the tests for value allows other tracks to use the exact same exercise -- with the same description, the same icon, etc. -- in a way that works better for them, because the test generator (whether it's a script or a human) doesn't have to faff around with transforming the strings into something usable first, or with creating many test cases from scratch. Leaving out tests is easier than deriving them from other tests or creating them manually.
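Purely to illustrate (using the same hypothetical shape as the fragment earlier in the thread), the extra value case a generator could consume as-is might look like this; a track that only wants the integration-style test can simply filter such cases out by their `property` field:

```json
{
  "description": "orange, orange and black",
  "property": "value",
  "input": {
    "colors": ["orange", "orange", "black"]
  },
  "expected": 33
}
```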


As I read this issue, there seems to be one rather big disagreement about whether the canonical data should be a base that track maintainers can adjust and build on, vs. a "complete" specification of the tests that may or may not work across more than a handful of tracks. My observation might be wrong, though.

ErikSchierboom commented 5 years ago

At this point, I can honestly say that I'm in over my head here. There is just so much text that it is hard to get to the core of the problem any more, I feel. Note that this is merely a statement of my feelings; I respect everybody taking the time to give their opinion. By now, I can almost imagine ourselves in two trenches, loudly yelling at each other from a safe distance, trying to convince each other. /cue comedy tune ...

Maybe what we need at this point is to have the core Exercism team look at this and then make a decision on how we should view the canonical data? I have a feeling that neither side will be able to sway the other anytime soon.

@kytrinyx @iHiD (when he gets back from holiday): perhaps you would be willing to weigh in with your opinions?

SaschaMann commented 5 years ago

> By now, I can almost imagine ourselves in two trenches, loudly yelling at each other from a safe distance, trying to convince each other. /cue comedy tune ...

We'll just have to wait for Christmas peace then ;)

SleeplessByte commented 5 years ago

I'm glad @yawpitch and @coriolinus agree with each other - it's kinda redundant to just keep repeating each other. But once you start to imply that someone knows something better, or that a way ís better, this is no longer a debate but merely showing off who has the largest 🍆.

And @yawpitch, apparently you really felt the need to go ahead and dissect the paragraphs I put down and, idk, teach me a lesson about how I should speak? You could have just said "that's not how I interpreted it the first time but now I understand" instead of saying that somehow I'm not constructive but you are?

F✴️ck this tone policing and good luck with this.

(Let's not forget that snide comment in the other thread you edited out).

yawpitch commented 5 years ago

@ErikSchierboom, thanks for this, and I am sorry for all the text, but believe me, I see real value in getting the debate down in writing so that the core Exercism team can make a useful and measured ruling / compromise.

> Let's start by noting that the canonical data are a means to an end, namely to help the students have the best possible experience using Exercism.

Honest question, but is that meaning defined anywhere? I personally would say that the canonical data is a means to an end -- namely to help maintainers implement exercises so that mentors can help students have the best possible experience using Exercism -- but I'm happy to admit I might be wrong there.

We are all agreed on fluency as the goal; the only obvious controversy there is over what constitutes fluency in any given language. In service of that ultimate goal, though, the canonical data seems to have a very important penultimate goal, which is to help the maintainers provide the fundamental units of teaching that fluency.

> We should of course strive to make things easy for maintainers and test generators, but it should not be the main point of focus.

Certainly not the main focus of the whole project, but can it not be a primary focus of this repo? Because it seems to me that would end up delivering a better and more motivated experience to the students.

yawpitch commented 5 years ago

@ErikSchierboom, just because I only now caught this particular line clearly.

> By now, I can almost imagine ourselves in two trenches, loudly yelling at each other from a safe distance, trying to convince each other.

I, for one, haven't been yelling. I've been trying to parse and understand a point of view that I don't feel I can either entirely agree with, and which seems to run counter to values I became a mentor, and then a maintainer, to help spread. I'm hoping to find an acceptable middle ground or, failing that, some ability to see that the "feel" and "seems" in the prior sentence are illusory. I'm not much for hurling grenades.

ErikSchierboom commented 5 years ago

@yawpitch Thanks for the concise response, my brain can process that a lot better :)

> Honest question, but is that meaning defined anywhere? I personally would say that the canonical data is a means to an end -- namely to help maintainers implement exercises so that mentors can help students have the best possible experience using Exercism -- but I'm happy to admit I might be wrong there.

Aha, I understand. That is definitely a valid approach, and I don't think the two views have to conflict, as your interpretation ultimately leads to the same goal, just through an intermediate step :)

> I, for one, haven't been yelling

Oh, sorry about that, I didn't actually mean yelling. It was just an (apparently failed) attempt at some humor. My mistake.

> In service of that ultimate goal, though, the canonical data seems to have a very important penultimate goal, which is to help the maintainers provide the fundamental units of teaching that fluency.

This is true, although the end goal once again is to help teach fluency.

> Certainly not the main focus of the whole project, but can it not be a primary focus of this repo? Because it seems to me that would end up delivering a better and more motivated experience to the students.

This I feel is a bit subjective, as it is hard to prove (the same goes for my opinion, by the way). Right? Not trying to be snide, just checking if I'm missing anything.

yawpitch commented 5 years ago

> Certainly not the main focus of the whole project, but can it not be a primary focus of this repo? Because it seems to me that would end up delivering a better and more motivated experience to the students.

> This I feel is a bit subjective, as it is hard to prove (the same goes for my opinion, by the way). Right?

Yes, you're right, that is a bit of a subjectively loaded statement. Perhaps I should have said "could end up delivering" instead. I just want both sides to consider the possibility that seeing this repo as "maintainer-facing" -- and therefore crafting the canonical data to better serve the needs of delivering an exercise -- might lead to less conflict than seeing it as "student-facing" and crafting the canonical data to better serve the needs of a student delivering a solution. I think that would take the stress off of "library" / "app" and "test units" / "test whole" by letting those be track concerns.

Even just resolving, once and for all, what the fundamental product this repo supports truly is would be helpful. If in my mind that product is the maintainer's exercise and in your mind it's the student's solution, then we end up at loggerheads almost by definition, even though the ultimate goal of both is to aid the student.

kytrinyx commented 5 years ago

@ErikSchierboom I will discuss it with Jeremy when he gets back.

ErikSchierboom commented 5 years ago

Great to hear @kytrinyx!

yawpitch commented 5 years ago

Note to all: the PR that initiated all of this, #1551, was just closed with what I'd call an excellent compromise resolution that allows both the "test the units" and the "test the whole" camps to carry on ... it's not ideal for either camp, but it strikes a good balance, and I hope it sets a precedent for the next time this (inevitably) comes up. What it doesn't do is really resolve the overall purpose of canonical data; in my mind there are three key points still outstanding, the final one added after further thought on the matter:

  1. Does it describe unit tests ("library") or integration tests ("app")?
  2. Who is the primary consumer? Specifically, does the canonical data serve the maintainer in making an exercise or the student in making a solution?
  3. What is the threshold for inclusion? How many tracks should an exercise obviously apply to before it is promoted to canonical?

@kytrinyx @iHiD would love to have those three included in your discussion.

iHiD commented 5 years ago

Meh. I'm sad to come to this and see that good people have been having uncomfortable discussions due to a lack of direction from @kytrinyx and myself on this. I'm sorry that y'all ended up having to try and argue your way through something that we should have provided clearer guidance on, and that this led to tensions flaring. We'll be better at that in the future.

I've just opened #1560, which outlines some thoughts on this. I think the reality is that both "sides" in these debates are probably entirely correct in the substance of their arguments, and that the core issue is that the current schema doesn't allow both opinions to coexist. I have some thoughts about having multiple "scaffolds" that different tracks can use to build their test suites from (think one scaffold for "library-style" languages and one for "api-style" languages, and maybe another for something else), which I think will avoid many of the problems discussed. But there are other, more fundamental questions (such as the ones that @yawpitch raises) that we need to answer via some product work, and by taking learnings from the Track Anatomy work, before we can continue.
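Purely as a speculative sketch, not a committed design: one way such scaffolds could be expressed is as named groupings of the existing case properties, so each track picks the grouping that fits its idiom:

```json
{
  "exercise": "resistor-color-trio",
  "comments": ["Speculative sketch only; no such schema exists yet."],
  "scaffolds": {
    "library-style": { "properties": ["value", "label"] },
    "api-style": { "properties": ["label"] }
  }
}
```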

My plan is that the product team will find some time over the next few weeks to really think this through and to say to y'all: "This is what we want an Exercism exercise to be, these are its rules, and this is where maintainers are given freedom to follow the idioms of their languages (e.g. library vs api)". We can then discuss how we can best change this repo to be compatible with that.

I hope everyone can leave this debate not feeling too bruised. I love how much you all care about Exercism and about getting this stuff right, and I hope that everyone knows that despite the tensions that have arisen, we're all just volunteering our time to try and make this as awesome as can be. I'm really grateful for all of you.

sshine commented 5 years ago

I wanted to post some thoughts on #1560, but thought that it wasn't an appropriate issue in which to start a discussion. After this I realized that this discussion exists and is what caused the lockdown, and I'm pretty happy that I didn't take part in it and pretty sad that grasping its entirety is overwhelming. I can say that only excellent people with excellent points have taken part in it, and it seems to have concluded with three points.

I've tried to TL;DR my points:

bencoman commented 5 years ago

I didn't get involved in the discussion since I wasn't clear on my thoughts and was sitting on the fence between two considerations.

One is considering that, as a mostly object-oriented programmer, when I come to try a functional-language track I personally would "like to be led" through a more concrete FP implementation. It would be as much about adapting my style to "think functional" as about learning the language itself. And I see benefit in doing the same for people coming to an OO language like Smalltalk from an FP background.

The other is presuming that it would be hard/confusing to express both FP and OO styles in one specification.

My current thought is that a way forward would be to have an api-style canonical-data.json which is mandatory for tracks to implement, and optional fp-data.json and oo-data.json files that help lead participants to a particular solution. Tracks could decide which optional data they wanted to implement. Further, if participants could choose whether they wanted to use tests from the optional data or go it alone, it might cater to a wider spectrum of abilities. I have noticed that even though Exercism originally seemed aimed at competent programmers learning a new language, it also attracts those new to programming who need more support from mentors -- which more fine-grained TDD-implementation data might help alleviate.
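Purely to sketch the shape of that idea (every field here is invented), an optional fp-data.json or oo-data.json overlay might add finer-grained cases on top of the mandatory api-style file:

```json
{
  "extends": "canonical-data.json",
  "comments": ["Invented structure, sketching the optional-overlay idea."],
  "cases": [
    {
      "description": "a single color maps to its digit",
      "property": "colorCode",
      "input": { "color": "orange" },
      "expected": 3
    }
  ]
}
```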

iHiD commented 5 years ago

Thanks both. I think the fundamental problem is that the purpose of tracks is not well enough defined. We're designing a large project focussed around fixing this, and on properly designing tracks so that their purposes and goals within Exercism are clearer. As part of that, we'll be putting effort into working out how things like this repo can work better. Until we finalise and announce that project, I'd ask for your patience for a little longer :)