Closed: kytrinyx closed this issue 7 years ago.
I think it's a great idea. I think it would be worthwhile to think about a way to make it non-optional, though. If you don't, you run the risk of it ending up like comments in code. If you expose it to users, you have to get it right; you could even use it to tell users that their skill level for a certain topic is at a certain level. Of course, the flip side there is that you then have to manage how users perceive that.
Conceptually, I think it's also an interesting question whether you should actually enforce a path, or let users follow a path of their own, with possible alternative routes (as Duolingo does, for instance). For the first few exercises I think you want people to follow a specific order, but after that it matters less.
But all this, if at all interesting, does not need to be in the first version, of course :) (Also, a disclaimer: I haven't been on exercism for quite a long time.)
I for one think this is a great idea! As a track maintainer, I often struggle with the relative ordering of exercises. Having a consistent way to classify and categorize exercises would greatly help with that. I would suggest that for each exercise, we add the following information:
As @choiaa suggested in this issue, we might even be able to automatically update the order of the exercises based on the topics and difficulty.
In the discussion started by @pminten, he suggests also categorizing the exercises as either:
Although I see what he's trying to achieve, I think the same can be accomplished using the suggested difficulty classification.
He also suggested that we specify the topics in the exercise data in the x-common repository, but I think that would not be very convenient, and the same exercise might have completely different topics in different languages.
I am also in favor of @markijbema's suggestion that this information should be required in the new format. His point about having a default path or being able to choose a path is also a good one, but I think that's a different discussion.
Overall, I think it's a great idea and I'll gladly help.
I think it's a great idea, too, but I don't yet know how we'd solve the issue with different exercises being different levels of difficulty in different languages.
For example, the `anagram` exercise is on the easier side in Elixir or Ruby, but apparently in Rust it's very difficult. At the very least, maybe when we list the exercises in `config.json`, we could also list a "relative difficulty" for each exercise as it's implemented in a given language? That way, when new exercises are added, the person adding an exercise will have at least a general idea of where to slot it in. Maybe something like this?

```json
{
  "problems": [
    {
      "_difficulty": 1,
      "name": "hello-world"
    },
    {
      "_difficulty": 3,
      "name": "anagram"
    },
    {
      "_difficulty": 10,
      "name": "forth"
    }
  ]
}
```
Maybe, to get better measures of difficulty, we could ask users after they submit an iteration how hard they felt the problem was in a given language, and expose that data somehow for maintainers to use? That way they'd have more concrete information on which to base the ordering of exercises.
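A minimal sketch of how such feedback could be aggregated per track, assuming hypothetical feedback records of the form (track, exercise slug, rating on a 1 to 10 scale). Nothing here reflects an existing Exercism API; it only shows that the same exercise gets a separate average in each language.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feedback record: (track, exercise slug, rating from 1 to 10).
# The record shape and the scale are assumptions for illustration only.
Feedback = tuple[str, str, int]


def average_difficulty(feedback: list[Feedback]) -> dict[tuple[str, str], float]:
    """Average the perceived difficulty per (track, slug) pair."""
    ratings: dict[tuple[str, str], list[int]] = defaultdict(list)
    for track, slug, rating in feedback:
        ratings[(track, slug)].append(rating)
    return {key: round(mean(values), 1) for key, values in ratings.items()}


if __name__ == "__main__":
    sample = [
        ("ruby", "anagram", 2),
        ("ruby", "anagram", 3),
        ("rust", "anagram", 7),
        ("rust", "anagram", 8),
    ]
    print(average_difficulty(sample))
```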
I don't have time to dive deep, but I am envisioning a solution that is similar to operator precedence. Perhaps we can group the problems and then just worry about the ordering of the groups (or "concepts"), not about the order of each individual problem.
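A minimal sketch of the operator-precedence idea, assuming each exercise is assigned to one hypothetical concept group and the groups are ranked by a numeric level. Only the group ranking decides the order; the group names and levels are made up.

```python
# Hypothetical concept groups, ranked like operator-precedence levels:
# a lower level is taught earlier. Group names and levels are assumptions.
CONCEPT_LEVELS = {"basics": 0, "strings": 1, "collections": 2, "parsing": 3}

# Each exercise belongs to exactly one group; the mapping order is the
# order in which maintainers happened to list the exercises.
EXERCISE_GROUPS = {
    "forth": "parsing",
    "hello-world": "basics",
    "anagram": "strings",
    "word-count": "collections",
    "bob": "strings",
}


def order_by_group(groups: dict[str, str], levels: dict[str, int]) -> list[str]:
    """Order exercises by the level of their group only.

    sorted() is stable, so exercises in the same group keep their listed
    order; maintainers only need to agree on how the groups are ranked.
    """
    return sorted(groups, key=lambda slug: levels[groups[slug]])


if __name__ == "__main__":
    print(order_by_group(EXERCISE_GROUPS, CONCEPT_LEVELS))
```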
> I think it's a great idea, too, but I don't yet know how we'd solve the issue with different exercises being different levels of difficulty in different languages.

With the suggested approach, each language track's classification would be independent of other languages. So `anagram` in Ruby could be classified as easy, and in Rust it could be difficult.
And I think that is the only way it will ever make sense. I'm really excited about this possible change!
Is there a list of (generic) programming concepts that could be used and weighted? The list would be shared (mostly common) between tracks, but the weight given to each concept would be specific to each track.
@kotp I don't know if there is such a list, but we could also try to gather it manually. Here is a quick attempt, in which I have tried to group related concepts:
Basic concepts
Data types
Problem areas
This is by no means a definitive list; it's just something I compiled myself. Maybe we can use it as a starting point for our discussion?
Definitely like the list. I had started one yesterday, but did not get far.
The general programming concepts are never so general as they appear. :smile:
I think it's nice to have a list of suggested categories, unless the tracks are forced to fit into them.
Maybe it would be more interesting to leave the topics completely open and see what would emerge from the tracks...
I think the topics will need to differ per track as well; at the very least, you don't want to restrict them the same way for all languages.
@rbasso I don't think we should leave the topics completely open, that will probably lead to a lot of unwanted duplication where similar concepts are named differently in different tracks. We would then have to do cleanup later.
@markijbema The topics will definitely differ per track, but as I said, this list can be something of a starting list. It is not meant to be comprehensive, just helpful :)
You convinced me, @ErikSchierboom. :smile:
> I don't think we should leave the topics completely open, that will probably lead to a lot of unwanted duplication where similar concepts are named differently in different tracks. We would then have to do cleanup later.

What scale should we use to grade the difficulty? 1 to 10? 1 to 3? Or text: easy, average, hard?
I think it is easier to consider difficulties on a ten-point scale, such as 1 to 10 or even 0 to 9. Something that is built in and primitive in the language may be 0 or 1, depending on its conceptual difficulty. It also makes it simple to add up the grades to compute a weight for an individual exercise.
The report could say easy, average, or hard, derived from the numbers. There could also be an additional weight that doesn't come directly from the categories but from the responses users report in their feedback; over time, that can help fine-tune where an exercise is perceived to belong.
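To make the arithmetic concrete, here is a minimal sketch assuming each track grades concepts on a 0 to 9 scale and an exercise's weight is simply the sum of its concepts' grades, plus an optional feedback correction. The concept lists, the grades, and the easy/average/hard cut-offs are all made-up illustrations.

```python
# Hypothetical per-track grades (0 = built in / trivial, 9 = hardest) for
# the concepts an exercise relies on. Concept names, grades, and the
# cut-off values in label() are illustrative assumptions.
RUBY_GRADES = {"strings": 1, "iterators": 2}
RUST_GRADES = {"strings": 4, "iterators": 5, "lifetimes": 8}


def exercise_weight(concepts: list[str], grades: dict[str, int],
                    feedback_adjustment: float = 0.0) -> float:
    """Add up the track's grade for each concept, plus an optional
    correction derived from user feedback."""
    return sum(grades[concept] for concept in concepts) + feedback_adjustment


def label(weight: float) -> str:
    """Derive the coarse easy/average/hard report from the numeric weight."""
    if weight < 5:
        return "easy"
    if weight < 12:
        return "average"
    return "hard"


if __name__ == "__main__":
    ruby_anagram = exercise_weight(["strings", "iterators"], RUBY_GRADES)
    rust_anagram = exercise_weight(["strings", "iterators", "lifetimes"], RUST_GRADES)
    print("ruby anagram:", ruby_anagram, label(ruby_anagram))
    print("rust anagram:", rust_anagram, label(rust_anagram))
```

Keeping the numbers and deriving the label only at report time means the cut-offs can be tuned later without regrading every concept.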
@kytrinyx What is the next step? I think we should decide upon a new format in which the exercises can be described in the `config.json` file. Maybe something like this:

```json
{
  "problems": [
    {
      "name": "hello-world",
      "difficulty": 1,
      "topics": [
        "control-flow (if-statements)",
        "optional values",
        "text formatting"
      ]
    },
    {
      "difficulty": 3,
      "name": "anagram",
      "topics": [
        "strings",
        "filtering"
      ]
    },
    {
      "difficulty": 10,
      "name": "forth",
      "topics": [
        "parsing",
        "transforming",
        "stacks"
      ]
    }
  ]
}
```
Yep, I think that's the next step.
I'd like to use a new JSON key so that we can leave the old format in place during the migration period (we don't want to leave it in place forever, but for a few weeks to give people time to make the change).
I'm thinking `exercises` makes sense.
Also, how about `slug` instead of `name`? In the problems API we make a distinction between the slug (identifier) of the exercise and the name, which is an Englishified version of the slug (separate words, with the first letter of each word capitalized).
I like the difficulty 1-10, which means that we can fine-tune things over time.
Let's change it to slug then. I'll start working on mapping topics to exercises for the F# track.
This all turned out great! I'll work a bit on handling this for Elixir, too.
Excellent! We'll need to write up something that can be submitted to all the tracks. If nobody writes up a suggestion here, I'll tackle that this weekend.
Here's the issue text that I intend to submit to all the tracks. Would you please review this for clarity and correctness?
Subject: Update config.json to match new specification
For the past three years, the ordering of exercises has been done based
on gut feelings and wild guesses. As a result, the progression of the
exercises has been somewhat haphazard.
In the past few months maintainers of several tracks have invested a
great deal of time in analyzing what concepts various exercises require,
and then reordering the tracks as a result of that analysis.
It would be useful to bake this data into the track configuration so
that we can adjust it over time as we learn more about each exercise.
To this end, we've decided to add a new key _exercises_ in the
config.json file, and deprecate the _problems_ key.
See exercism/discussions#60 for details about this decision.
Note that we will **not** be removing the _problems_ key at this time,
as this would break the website and a number of tools.
The process for deprecating the old _problems_ array will be:
* Update all of the track configs to contain the new _exercises_ key,
with whatever data we have.
* Simultaneously change the website and tools to support both formats.
* Once all of the tracks have added the _exercises_ key, remove support
for the old key in the site and tools.
* Remove the old key from all of the track configs.
In the new format, each exercise is a JSON object with three properties:
* _slug_: the identifier of the exercise
* _difficulty_: a number from 1 to 10 where 1 is the easiest and 10 is
the most difficult
* _topics_: an array of strings describing topics relevant to the exercise. We maintain
a list of common topics at https://github.com/exercism/x-common/blob/master/TOPICS.txt. Do not feel like you need to restrict yourself to this list;
it's only there so that we don't end up with 20 variations on the same topic. Each
language is different, and there will likely be topics specific to each language that will
not make it onto the list.
The _difficulty_ rating can be a very rough estimate.
The _topics_ array can be empty if this analysis has not yet been done.
Example:

```json
{
  "exercises": [
    {
      "slug": "hello-world",
      "difficulty": 1,
      "topics": [
        "control-flow (if-statements)",
        "optional values",
        "text formatting"
      ]
    },
    {
      "difficulty": 3,
      "slug": "anagram",
      "topics": [
        "strings",
        "filtering"
      ]
    },
    {
      "difficulty": 10,
      "slug": "forth",
      "topics": [
        "parsing",
        "transforming",
        "stacks"
      ]
    }
  ]
}
```
It may be worth making the change in several passes:
1. Add the _exercises_ key with the array of objects, where _difficulty_
is 1 and _topics_ is empty.
2. Update the difficulty settings to reflect a more accurate guess.
3. Add topics (perhaps one-by-one, in separate pull requests, in order
to have useful discussions about each exercise).
Note: Edited for readability inline here, line lengths. - KOTP
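Pass 1 of the migration above is mechanical enough to script. A minimal sketch, assuming the old `problems` key is a plain list of slug strings; the function name `add_exercises_key` is hypothetical and not part of any Exercism tool. The old key is left in place, as the process requires.

```python
import json


def add_exercises_key(config: dict) -> dict:
    """Pass 1: derive an "exercises" array from the existing "problems"
    list, with difficulty 1 and empty topics for every exercise.

    Assumes "problems" is a list of slug strings. The "problems" key is
    kept so the website and tools keep working during the migration.
    """
    updated = dict(config)
    updated["exercises"] = [
        {"slug": slug, "difficulty": 1, "topics": []}
        for slug in config.get("problems", [])
    ]
    return updated


if __name__ == "__main__":
    old = {"slug": "ruby", "problems": ["hello-world", "anagram", "forth"]}
    print(json.dumps(add_exercises_key(old), indent=2))
```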
Excellent write-up. 👍
I'm just slightly confused by topics simply being "an array of strings that describe topics that the exercise covers". Shouldn't this rather be a reference to centrally maintained topics (in order to avoid calling the same thing by different names)? The reference can of course still be textual, but it would be nice to have it checked ;-)
@chezwicker makes a good point. Maybe we should put the list of topics I gathered in one of the previous posts into a separate file in the x-common repository, named `topics.json` or something like that? Then we could also iteratively improve the list of topics through discussions and PRs.
@chezwicker: I think topics will vary by language. Rust, for example, will have Lifetimes and other languages won't. Or several languages will have variations of Maybe/Some but under different names.
I think centralization of these terms might be a premature abstraction.
@IanWhitney you're of course right that some topics will vary, but I believe more topics will be similar across languages. And using different words for the same concept has a lot of potential to lead to confusion. I would assume many people who know one language would use the platform to learn others; for them, I think it might be helpful to recognize concepts.
Of course, you're also right that some languages will call the same concept by different names, so maybe aliases would be helpful. Perhaps a central `topics.js` and, optionally, one file per track "renaming" concepts by defining aliases?
Or maybe that's just overengineering now. I'm merely pointing out that it could be nice having some consistency across tracks.
@IanWhitney @chezwicker Yes, topics will vary by language, so IMHO you should feel free to replace `Option` with `Maybe` when that applies to your language track. There will also be subjects that are exclusive to a language (e.g. Active Patterns in F#), which should thus not exist in the "master" list of topics. However, many topics will also be the same. I think it thus makes sense to use one "default" set of topics to choose from, which can then be modified or added to in the specific language tracks.
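A minimal sketch of the "default set, modified or added to per track" idea, assuming a track can rename shared topics through an alias map and add its own topics. The data structures, topic names, and function name are hypothetical.

```python
# Hypothetical shared ("default") topic list; names are illustrative.
DEFAULT_TOPICS = {"Optional values", "Strings", "Filtering", "Parsing"}


def track_topics(default: set[str], aliases: dict[str, str],
                 extra: set[str]) -> set[str]:
    """Build a track's topic list from the shared default list.

    `aliases` renames shared topics to the track's own vocabulary, and
    `extra` adds topics that only make sense in that track.
    """
    renamed = {aliases.get(topic, topic) for topic in default}
    return renamed | extra


if __name__ == "__main__":
    fsharp = track_topics(
        DEFAULT_TOPICS,
        aliases={"Optional values": "Option"},  # F# calls it Option
        extra={"Active patterns"},              # F#-only topic
    )
    print(sorted(fsharp))
```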
It sounds like we're aiming for:
In the suggested text, I wrote:
* _topics_: an array of strings that describe topics that the exercise
covers
How about changing this to the following?
* _topics_: an array of strings describing topics relevant to the exercise. We maintain
a list of common topics at $URL. Do not feel like you need to restrict yourself to this list;
it's only there so that we don't end up with 20 variations on the same topic. Each language
is different, and there will likely be topics specific to each language that will not make it
onto the list.
Works for me!
OK, I've updated the text.
I'll also add an empty topics list in x-common. Shall we make it plain text, with one topic per line? JSON feels like a bit of overkill, since this is just going to be for human consumption.
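Reading such a plain-text file is trivial, and it would still allow the kind of checking suggested earlier without forbidding track-specific topics. A minimal sketch with hypothetical function names; it reports topics missing from the shared list rather than rejecting them.

```python
def parse_topics(text: str) -> list[str]:
    """Read a plain-text topics list: one topic per line, ignoring
    blank lines and surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def unknown_topics(used: list[str], known: list[str]) -> list[str]:
    """Topics a track uses that are not on the shared list. These are
    reported, not rejected, since tracks may add their own topics."""
    known_set = set(known)
    return [topic for topic in used if topic not in known_set]


if __name__ == "__main__":
    # Inline stand-in for the contents of the shared topics file.
    shared = parse_topics("Strings\nFiltering\n\nParsing\n")
    print(unknown_topics(["Strings", "Lifetimes"], shared))
```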
@kytrinyx Ah yes, that's far easier :)
Sounds good!
Would someone mind taking a look at the suggested starter file? https://github.com/exercism/x-common/pull/337
I used @ErikSchierboom's list farther up in this thread.
I'm sure it can be improved upon, but as a starting point, it looks fine, I think.
Yeah, that's what I was thinking. My first thought was to make it empty, but then I remembered that you'd started a list.
@kytrinyx One small question: how would the order in the new situation work? Is it still the order in which the exercises are listed, or is that list first sorted by difficulty? E.g., consider the following data:

```json
{
  "exercises": [
    {
      "slug": "hello-world",
      "difficulty": 1,
      "topics": [
        "control-flow (if-statements)",
        "optional values",
        "text formatting"
      ]
    },
    {
      "difficulty": 3,
      "slug": "anagram",
      "topics": [
        "strings",
        "filtering"
      ]
    },
    {
      "difficulty": 1,
      "slug": "binary",
      "topics": [
        "parsing",
        "transforming"
      ]
    }
  ]
}
```
Is the exercise order either:

1. hello-world
2. anagram
3. binary

or:

1. hello-world
2. binary
3. anagram
I would opt for the second choice.
I am definitely against an implicit ordering by difficulty. There might be some educational intention behind handing out a more difficult task before an easier one.
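The two candidate interpretations of the example above can be contrasted in a short sketch (the helper names are hypothetical). A stable sort keeps ties in their listed order, but any deliberate placement of a harder exercise before an easier one is lost.

```python
# The three exercises from the example above, in the order they are listed.
exercises = [
    {"slug": "hello-world", "difficulty": 1},
    {"slug": "anagram", "difficulty": 3},
    {"slug": "binary", "difficulty": 1},
]


def listed_order(items: list[dict]) -> list[str]:
    """Option 1: the array order is the track order; difficulty is metadata."""
    return [item["slug"] for item in items]


def difficulty_order(items: list[dict]) -> list[str]:
    """Option 2: stable sort by difficulty; ties keep their listed order."""
    return [item["slug"] for item in sorted(items, key=lambda item: item["difficulty"])]


print(listed_order(exercises))      # ['hello-world', 'anagram', 'binary']
print(difficulty_order(exercises))  # ['hello-world', 'binary', 'anagram']
```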
@NobbZ good point. So we would just use the order in which the exercises are listed as the order of them being fetched?
In @kytrinyx's PR, the topics are copied as-is from my list. Those topics have "normal" casing, such as "Optional values". Should we leave it as it is or use lower-casing?
> So we would just use the order in which the exercises are listed as the order of them being fetched?

I think so.
> Should we leave it as it is or use lower-casing?

I think normal casing is fine, unless it's easier to be consistent with lower case.
Let's keep it normal casing then. That way, we could also display the information on the website if needed.
I hit rate limits. Investigating.
https://developer.github.com/v3#abuse-rate-limits
Apparently I need to make the script wait for a bit between each call.
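A minimal sketch of that fix, assuming a hypothetical `submit` callback that opens one issue per track repository. The one-second default pause is an assumption; check GitHub's rate-limit guidance linked above for the actual recommendation.

```python
import time
from typing import Callable


def submit_all(repos: list[str], submit: Callable[[str], None],
               delay_seconds: float = 1.0) -> None:
    """Call `submit` once per repository, pausing between calls so the
    script does not fire a burst of requests.

    `submit` is a stand-in for whatever function actually opens the
    issue; the default delay is an assumed value, not an official one.
    """
    for index, repo in enumerate(repos):
        if index > 0:
            time.sleep(delay_seconds)
        submit(repo)


if __name__ == "__main__":
    submit_all(["xrust", "xfsharp", "xelixir"],
               lambda repo: print("opening issue in", repo),
               delay_seconds=0.1)
```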
> We could have a new key, `exercises`, which contained an array of objects with the problem slug and topics. x-api could be changed to look preferentially at the new key, and fall back to the old one if it's missing.

> Simultaneously change the website and tools to support both formats.

Has this been done yet? And if not, I think it would be wise to say here when it has been done. (I looked at x-api and didn't see it, but I may have missed it.)
And why do I care? Simply because I want to know whether "Add the exercises key with the array of objects, where difficulty is 1 and topics is empty." can be done simultaneously with "Remove the problems key". Otherwise there will be two problem orderings (one defined by `problems`, one defined by `exercises`) and any changes to problem ordering would have to be done in both places.
> Simultaneously change the website and tools to support both formats. Has this been done yet?

No not yet. I've opened this issue: https://github.com/exercism/x-api/issues/134
> I want to know whether "Add the exercises key with the array of objects, where difficulty is 1 and topics is empty."

Yeah, that's a good point.
Also, we will need to update configlet: https://github.com/exercism/configlet/issues/7
Is there / should there be any effect on the 'foregone' or 'deprecated' sections of the config file? Will those remain just a list of exercises?
> I want to know whether "Add the exercises key with the array of objects, where difficulty is 1 and topics is empty." can be done simultaneously with "Remove the problems key". Otherwise there will be two problem orderings (one defined by `problems`, one defined by `exercises`) and any changes to problem ordering would have to be done in both places.

That's done by https://github.com/exercism/x-api/pull/137. I am glad that we can now deduplicate.
> Is there / should there be any effect on 'foregone', or 'deprecated' sections of the config file?

@verdammelt no, I think it's legit to be able to deprecate / forego individual exercises on a track level.
Quick question: is the `difficulty` value required to be an integer, or can we use decimals, e.g. `4.7`?
[For context, the Java track had several maintainers chime in with difficulty estimates, then took the average to compute an overall difficulty curve. This naturally led to some non-integer scores. Rounding is no problem if necessary, but it feels silly to throw away the "extra fidelity" if we don't have to!]
Ah, well, I could give the pragmatic answer of "nothing on the website or API shows the difficulty value. Therefore, it can be whatever you want". The Rust track got lazy and we thus only use 1, 4, 7, and 10.
But I guess that's no substitute for the real answer, which will depend on the eventual intended use.
@petertseng ah, good to know!
From manual inspection of the tracks that have both switched to the new structure and assigned non-trivial difficulties (i.e. not all `1`s throughout), integer-only appears to be preferred and would seem to be the safer bet at this time, so I'll go with that!
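Since the thread leans towards integer-only difficulties from 1 to 10, a check for one entry of the new format could look like the sketch below. It is a hypothetical stand-alone function, not configlet's actual lint logic, and the error messages are made up.

```python
def validate_exercise(entry: dict) -> list[str]:
    """Check one entry of the "exercises" array: a string slug, an
    integer difficulty from 1 to 10, and a (possibly empty) list of
    string topics. An empty result means the entry looks valid.
    """
    errors = []
    if not isinstance(entry.get("slug"), str):
        errors.append("slug must be a string")
    difficulty = entry.get("difficulty")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(difficulty, int) or isinstance(difficulty, bool):
        errors.append("difficulty must be an integer")
    elif not 1 <= difficulty <= 10:
        errors.append("difficulty must be between 1 and 10")
    topics = entry.get("topics")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        errors.append("topics must be a list of strings")
    return errors


if __name__ == "__main__":
    print(validate_exercise({"slug": "anagram", "difficulty": 4.7, "topics": []}))
```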
Examples:
For the past three years, the ordering of exercises has been done based on gut feelings and wild guesses.
Over time this has proven to work OK-ish, but it's not great. There are easy exercises that get placed too far back, exercises that are too similar to one another, and exercises that are too difficult that end up early in the track, and we mostly don't notice until someone says out loud that they're struggling with something.
@IanWhitney did a thorough analysis of the Rust track in https://github.com/exercism/xrust/issues/127, which resulted in reordering the exercises. He wrote about the experience in an essay called "Exercism Shouldn't Make You Cry".
We're also talking about similar issues in F# (https://github.com/exercism/xfsharp/issues/133 - @ErikSchierboom) and Elixir (https://github.com/exercism/xelixir/issues/190 - @devonestes), and I'm sure the topic has been mentioned elsewhere and I've missed it.
Back in the day, Peter Minten talked about how we might classify exercises more systematically, in https://github.com/exercism/x-common/issues/63 and https://github.com/exercism/x-common/issues/72.
I think that if we have language-specific classifications and topics, we should do it in the language-specific repository (keeping all the language-specific stuff together).
What if we did this in `config.json`? We could have a new key, `exercises`, which contained an array of objects with the problem slug and topics. x-api could be changed to look preferentially at the new key, and fall back to the old one if it's missing. Then we could migrate all the tracks without having to do everything all at once. The topics would be optional, but having them in the actual codebase would probably help crowdsource this data.
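The fallback described here could look roughly like the sketch below. It is written in Python rather than being x-api's actual code, and it assumes the old `problems` key is a plain list of slug strings.

```python
import json


def exercise_slugs(config: dict) -> list[str]:
    """Return the track's exercise slugs in order, preferring the new
    "exercises" key and falling back to the old "problems" list.

    Assumes "problems" holds plain slug strings; a sketch of the
    proposed behaviour only.
    """
    if "exercises" in config:
        return [exercise["slug"] for exercise in config["exercises"]]
    return list(config.get("problems", []))


if __name__ == "__main__":
    migrated = json.loads('{"exercises": [{"slug": "hello-world", "difficulty": 1, "topics": []}]}')
    legacy = json.loads('{"problems": ["hello-world", "anagram"]}')
    print(exercise_slugs(migrated))  # ['hello-world']
    print(exercise_slugs(legacy))    # ['hello-world', 'anagram']
```

Because the new key wins whenever it is present, each track can switch over independently while untouched tracks keep working.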
@exercism/track-maintainers Thoughts?