Reevaluating cross-track consistency

exercism / discussions

For discussing things like future features, roadmap, priorities, and other things that are not directly action-oriented (yet).

37 stars 5 forks source link

Reevaluating cross-track consistency #158

Closed kytrinyx closed 7 years ago

kytrinyx commented 7 years ago

There have been a lot of discussions over the years about how to keep all the language tracks consistent with the problem specifications, and with each other. I have a sneaking suspicion that a choice that I made back in the summer of 2013 has caused us more work and more pain than we need.

I'm going to pitch a slightly different approach that I think would provide some significant improvements to the process for maintainers, and an improvement in the user experience for people who use Exercism.

What if the problem specification was a starting point, rather than a component of exercises?

In other words, if you're going to port the "clock" exercise to a new language track, you would

use the canonical-data.json as a starting point for a test suite
copy the description.md from the specification (x-common), make that the start of the README

You would then tweak the README however you wish in order to best fit the language track in question.

There would be no expectation that clock in one language would be equivalent to clock in another, nor would we necessarily expect that all languages would keep track of changes to an exercise and follow suit. Tracks could, of course, update a test suite, and probably will. We could even make it easier to know when/whether there are any interesting or important changes you might want to know about—but renaming a problem specification would not automatically impact any tracks at all, nor would adding tests, or changing the scope of a problem. Track maintainers could make updates to their tracks that reflect the change, or not.

The directory that contains the implementation would be exactly what is delivered to the user. Especially if we suggest that all tracks use a .meta directory as proposed by the Ruby track maintainers to keep the example solution and any other metadata that we might want to have there.

This would mean that an exercise that is custom to a language track is no different from any other exercise—you just implement it, README and all.

It would also mean that we can make sure that each README reads as well as possible, without having to think about how the various pieces are stitched together.

This would mean that we no longer generate READMEs based on content that lives in five different places. We can get rid of HINTS and EXERCISE_README_INSERT.

This would mean duplication of some data—to some extent—but I think that in this case the avoidance of duplication was a case of premature optimization. I thought these things were the same, but really, they're not. They just look similar, and so I was fooled (as I often am in code, when I refactor too hard and end up with code that is not dry, but chafe).

Related issues that we should take into account in this:

https://github.com/exercism/discussions/issues/136

@IanWhitney @petertseng @rbasso @ErikSchierboom @Insti @kotp @parkerl @iHiD @behrtam @stkent @NobbZ - you have all been pretty heavily involved in various parts of evolving both the specifications and exercises. What do you think about this?

@exercism/track-maintainers this affects all of you. Thoughts?

catb0t commented 7 years ago

Exercism currently acts (to me, at least) as much a tool for learning a new language from scratch, as a programming chrestomathy site.

Like a Rosetta stone, on Exercism one can rely on consistent problem specs between language tracks, even though this is apparently an unintentional and emergent property rather than a design decision.

I also don't think inconsistency across tracks would be bad, or that not being a rosetta stone would negatively affect Exercism's value as a learning / hacking tool.

I don't have that much involvement or clout with this high-level side of the project but that's just my impression and I don't think I'm alone in it, either.

catb0t commented 7 years ago

Now that I've briefly taken off my project-contributor-hat to make a more nebulous and general comment, let me put it back on to ask:

Isn't this going to result in even more duplication of effort across the tracks and the Exercism project as a whole (including the non-track-specific repos and contributors like x-common) instead of minimising that?

A problem the FLOSS community tends to have (though often it's more between projects than internally) is quite a lot of duplication of effort and Not invented here syndrome and though on its own it never leads to the fall of whole projects or communities, it can waste the free time of people who could spend their energy better collaborating on things that haven't been done yet somewhere else in the project.

To me the obvious solution is to consolidate that effort in some sort of shared repository of exercise data that everyone can hack on and reap benefits from, uh, perhaps we could call it x-common?

Now, seriously, it seems like x-common even now has a lot of duplicated work as a result of its structure. In my two-cent opinion, it might be worth more to improve that consolidation of effort, than to blow it into a million little fragmented pieces.

stevejb71 commented 7 years ago

As a user, I've implemented exercises in multiple tracks and not really cared about consistency between them. Where I've seen inconsistencies between readme and tests I've ignored the readme (and sometimes raised an issue).

I also don't care much about versioning: if the exercise changes and my solution is outdated, I've usually moved on by then anyway.

That's just one sample of course ☺️

As a track maintainer though, the canonical data does provide a useful repository of test cases. I've largely kept my track in sync but allowed differences where I thought it made sense.

petertseng commented 7 years ago

but renaming a problem specification would not automatically impact any tracks at all

The below is not exactly what you are talking about, but related. As someone who is interested in seeing the various ways to solve a problem (and has submitted solutions to some problems in multiple languages in order to further this goal), it is useful to know that exercise E in some language L1 corresponds to one with the same name in language L2. I don't need them to be exactly the same, but as long as they invite comparable solutions, that is good.

This doesn't necessarily mean that the above criterion ("invite comparable solutions") will be made false if tracks were free to name their exercises whatever they pleased. Just pointing out a slightly higher risk.

nor would we necessarily expect that all languages would keep track of changes to an exercise and follow suit

Seems like this trivially solves https://github.com/exercism/discussions/issues/2, then. One might even argue it solves https://github.com/exercism/x-common/issues/524 but...

We could even make it easier to know when/whether there are any interesting or important changes you might want to know about

I infer from this that https://github.com/exercism/x-common/issues/524 would still be a concern.

This would mean duplication of some data—to some extent—but I think that in this case the avoidance of duplication was a case of premature optimization. I thought these things were the same, but really, they're not.

Could I get a clarification on the things being referred by "these things"? I wasn't sure.

If you're going to port the "clock" exercise to a new language track, you would [...] copy the description.md from the specification (x-common), make that the start of the README You would then tweak the README however you wish in order to best fit the language track in question.

I hope that most of the time, the tweak required is no tweak at all, if the problem specification is supposed to be language-independent.
- If we could hear of an example of a track that really needed to tweak, that would help motivate.
I imagine performing a copy will make it difficult to determine whether my track's README is up to date with any changes that have been made to description.md since I performed the copy. This could get really tricky with tracks with dozens of exercises. How to deal with this?

This would mean that an exercise that is custom to a language track is no different from any other exercise—you just implement it, README and all.

I wonder whether this will cause contributors to tend to implement exercises as single-track exercises when instead they could have gone in x-common. If that does happen, it seems that the tracks would then share less and learn less from each other.

This would mean that we no longer generate READMEs based on content that lives in five different places. We can get rid of HINTS and EXERCISE_README_INSERT.

If I assume that every track does indeed have some content that should go in every exercise's README, it seems a shame to have to duplicate the contents of EXERCISE_README_INSERT into every single README (consider the tracks that have dozens of exercises). And if that content has to change, now we have to change it in many places instead of just one. I think that would make my life worse.

It would be interesting if there were something like a README template that can be used to generate the README.

{{include x-common/exercises/clock/description.md}}

{{include xmytrack/docs/EXERCISE_README_INSERT.md}}

Anything that was formerly in HINTS.md would instead be written inline inside the template.

And I suppose any track that doesn't want to use the default description can just... not include it, and write their own.

But then we have to have a templating engine in trackler or x-api, so this solution leaves something to be desired.

kytrinyx commented 7 years ago

To me the obvious solution is to consolidate that effort in some sort of shared repository of exercise data that everyone can hack on and reap benefits from

I am still advocating for consolidating efforts and keeping problem specifications. I'm suggesting, however, that these act as a starting point that you copy/clone, and that from there you diverge (or not).

I think that it's absolutely a good idea to have canonical data and language-independent descriptions. This makes it much easier to make exercises for a track.

I think it's less of a good idea to have the README of an exercise in a track change independently of the maintainers' explicit actions.

I hope that most of the time, the tweak required is no tweak at all, if the problem specification is supposed to be language-independent.

Yes, I think this is true.

I imagine performing a copy will make it difficult to determine whether my track's README is up to date with any changes that have been made to description.md since I performed the copy.

True. I don't know whether this is a problem, though. Or, if it is, then perhaps a track might template their exercise READMEs and regenerate them using a script if that suits them.

it seems a shame to have to duplicate the contents of EXERCISE_README_INSERT into every single README

Yes, that's a good point.

There are several things that I think would be an improvement.

Not have to generate the README on the fly, and
Not have to generate the README based on content in multiple repositories (right now, 3 repositories, 6 files).
Loosen the coupling between the problem specification (x-common) and the implementations of those problems.
The ability to hand-craft a README if it makes sense to do so (early on in the track it might make more sense to pay particular attention to hand-holding / guidance / instructions).

I thought these things were the same, but really, they're not. Could I get a clarification on the things being referred by "these things"?

Yeah, sorry. I thought that the description for each exercise would always be the same, but they do not always represent the correct instructions for an exercise (as we have discovered when trying to define the process for keeping exercises in sync with the READMEs). We discover a change we'd like to make, and update the README, and then a bunch of tracks now have inconsistencies between the description and the test file. Usually it's not a big deal; sometimes it's very confusing.

I saw a talk recently by someone from Khan Academy who was dealing with their localization efforts. For a long time they had a problem that is exactly the same shape as our problem, except more extreme. The core strings that the translations were based on, were the English version of a course. They wouldn't publish, say, the Korean version until the translation was complete. Then, inevitably, someone would tweak the English version, which meant that there were untranslated strings for the Korean version. The site would then display all the translated Korean... with the untranslated phrases in English.

Their solution (I'm simplifying) was to pin the translation to a given version of the core strings. Then if they're out of date, they translate any missing bits, and repin to a newer version.

I don't think that we need to go to the lengths of pinning the generated README to some older SHA1 in x-common or something. But I do think that we should not generate on the fly.

NobbZ commented 7 years ago

“Fork” and diverge is an interesting concept for specifying the exercises.I do think it could work, but I really think we should always communicate some "baselines" which should remain consistent throughout the tracks.

I'm remembering back the time when bob was in the first couple of exercises in nearly every track. It annoyed me to hell, that nearly every track had a different whitespace policy (Is "Foo? " a question or isn't it), also priority of checks was different, on some tracks "FOO?"was treated as shout, on others as question.

It's my fear, that polyglot students who will see exercises multiple times in multiple languages will be confused as they were a couple of years ago, before we introduced common data. So the exact process should be defined carefully to provide maximum flexibility while minimizing discomfort of students.

rbasso commented 7 years ago

@kytrinyx wrote:

There are several things that I think would be an improvement.

Not have to generate the README on the fly, and

Not have to generate the README based on content in multiple repositories (right now, 3 repositories, 6 files).

Loosen the coupling between the problem specification (x-common) and the implementations of those problems.

The ability to hand-craft a README if it makes sense to do so (early on in the track it might make more sense to pay particular attention to hand-holding / guidance / instructions).

Seems reasonable.

The core problem

I think that, while trying to factor regularities among languages, Exercism may have went too far. The shared description.md is just a good example.

We did spend a lot of time trying to write good, language-independent exercise specifications - and it seems that we kind of succeeded - but we also lost some flexibility in the way.

The canonical-data.json is also a little problematic, because each language has a more idiomatic way of representing its data structures in JSON. Sometimes it feels impossible to be language-agnostic, and strict compliance may result in exercises that feel unidiomatic in some languages.

Also, some tests that make sense in some languages may be absurd in others:

Weakly-typed languages need to test conditions that cannot even be expressed in more strongly-typed ones.
Property-based, automated tests are impossible in most languages, but are standard in others.

I think our current structure is overgeneralized, and we should probably take a step back to get flexibility and modularity.

What x-common should be?

@kytrinyx

What if the problem specification was a starting point, rather than a component of exercises?

Makes sense.

Trying to fit all tracks in the same structure, we removed things that only make sense in some languages, and we also avoided adding others. This is great for a shared pool of exercise specifications, but each language lost something in the process of converging to x-common, so it makes sense to give more freedom to the tracks.

Separation of concerns

@kytrinyx

The directory that contains the implementation would be exactly what is delivered to the user.

This will probably create an additional step in each track, possibly making maintenance inconvenient, but it seems a good design to separate the generation of static content from the track's packing. I'm just not sure about how inconvenient this will be.

@petertseng wrote:

It would be interesting if there were something like a README template that can be used to generate the README.

That would be great, but I'm just not sure if this should be a general solution or if it would be better to do that in a previous step in each track. Seems very similar to the problem of hosting in GitHub the sources and the site built with static site generators, like Hakyll and Jekyll, but I don't know much about what would be the recommended practice.

stkent commented 7 years ago

This seems like a reasonable approach. Keeping versioned canonical data makes sense as that's where we clarify how ambiguous cases should be handled, and where we can add new test cases that cover previously-uncovered cases. Both those uses have greatly benefitted the xjava exercises. It might be nice to add versioning to the canonical READMEs too, just so tracks can easily detect upstream changes and pull any improvements into their local copies.

Regarding what @rbasso pointed out regarding canonical data being too rigid: I was recently toying with the idea of writing generators for xjava, and I was thinking about how to make generated implementations idiomatic (e.g. how to split inputs to a problem between constructors and method parameters). I'd decided that the only way to leverage canonical data while producing close-to-idiomatic tests would be to maintain a per-track per-exercise set of metadata that would exist to aid interpretation of the canonical data...

masters3d commented 7 years ago

I like pinning an exercise implementation to an specific commit or version. Reminds me of the way Yarn works for JS packagers. It is kind of weird that all tracks' implementation get essentially invalidated every time there is a change to the description. I am not sure what kind of tooling would be needed to make this happen but I love the idea of each track being a complete snapshot that is able to run even after updates have been done to the "master" description. Maybe we can create a JS package that contains all the documentation for an specific exercise then each implementation can specify which version of the documentation package they are going to use and only update it when they are ready to move versions. Perhaps all these exercises can then just live inside one repo as git submodules. The advantage of using a package manager is that there is a way to update the documentation that is not manual but we can pin easily.

petertseng commented 7 years ago

I think it's less of a good idea to have the README of an exercise in a track change independently of the maintainers' explicit actions.

It will be a bit of a shame that explicit action will be required. It is currently the case that we can follow x-common updates for free. If the problem became a starting point instead, then if we make an x-common change that we find desirable to propagate to all tracks, every single implementing track will need to take action.

(This assumes that such desirable changes are somewhat frequent. Are they anticipated to be? Or shall we expect arbitrary divergence?)

For those maintainers who would have kept their tracks up to date regardless, the free updates streamline their processes without causing undue confusion for students (out-of-date READMEs). Losing free updates in this way will cause increased effort to stay up to date with x-common changes but fail to receive the benefit of reducing confusion. I acknowledge that this bias is present in myself.

Or, if it is, then perhaps a track might template their exercise READMEs and regenerate them using a script if that suits them.

Ah yes. I was doubtful of my original proposal of including a template engine into trackler since that still implies generating READMEs on the fly.

You've convinced me that there is a way forward that doesn't involve that: Each track may, if desired, have a template and generator run by the maintainers. It generates READMEs that trackler just serves as-is. That would be just like how test generators work.

However, in contrast to test generators which every track will probably write in a different way, it's very possible many tracks will do README generation the same way. Maybe tracks will wish to share code for that. Would be a shame to duplicate efforts, anyway.

It doesn't save us from complexity (it will be in the template and generator), but it will at least make it almost effortless to keep up to date, since it will just be running the generator on everything. Sometimes it's necessary to take on complexity in one place to save from it in another (in this case, the complexity of generating a README in trackler rather than ahead-of-time).

To me it seems this proposal would mean a lot more work is needed to keep READMEs up to date, and I personally would not derive much benefit from it. I explicitly had to say "personally" (as opposed to it being a general principle) because I already know how the READMEs are currently generated and I don't really have any particular changes in mind that I would make to the READMEs, and neither of those are general principles. So I suppose I should defer to those who do benefit from it!

kytrinyx commented 7 years ago

Ah yes. I was doubtful of my original proposal of including a template engine into trackler since that still implies generating READMEs on the fly.

I think we could have both.

A shared tool (that all tracks may use, but don't have to), and a template in .meta/ that follows a shared convention. Then maintainers could periodically run the tool, which would show them the changes, so they can evaluate them and decide if they want them.

ErikSchierboom commented 7 years ago

I think the benefits of having self-contained exercises are greater than the disadvantages, so I'm in favor.

Having some sort of shared tool that allows maintainers to quickly check for changes would be great.

kytrinyx commented 7 years ago

To summarize the arguments above, the biggest concerns are about duplication of effort, and the amount of effort required to stay consistent with the shared problem specifications. This is reflected both in the question how will we know if something needs to be updated? as well as the work involved in making the updates, which currently happen seamlessly and automatically.

There is a smaller concern about whether or not exercises should provide a sort of Rosetta stone, providing a point of comparison across languages. The consensus here seems to be that this is less important, with one exception:

It annoyed me to hell, that nearly every track had a different whitespace policy [for the bob exercise]

I think that when there are particularly annoying inconsistencies, we should open a discussion and suggest a consistent approach.

The problem specifications are a useful repository of exercises, and the canonical data provides a useful repository of test cases. The goal of Exercism is to provide an introduction to the programming language, consistent with its norms and idioms. Some languages will not find these problem specifications and test cases useful, others will.

There is a concern about whether or not exercises will be implemented directly into the tracks and not submitted as seed problems to the common pool of specifications.

This is an important concern, I think, and we should create tooling that makes it easy to see what exercises exist only in the tracks and not in the common pool, as this will likely give us opportunities to grow the common pool.

I'd like to suggest that in order to make the move to self-contained exercises, we create common tooling that all tracks may use, with a simple templating system so that we can have a template (within .meta), which defines both custom text and inclusions from other places. The tool would then regenerate the README, which could be eyeballed by the maintainers before committing.

I think this is the best balance possible between self-contained exercises, control over the content, and minimizing the amount of effort needed to update the READMEs.

I will open a separate discussion for defining the spec for this.

kytrinyx commented 7 years ago

I've opened the issue about the file format here: https://github.com/exercism/discussions/issues/163