exercism / problem-specifications

Shared metadata for exercism exercises.
MIT License

Pythagorean Triplets and other exercises require additional knowledge #1902

Open mohmad-null opened 2 years ago

mohmad-null commented 2 years ago

There are a considerable number of exercises that appear to require some degree of domain knowledge / expertise to solve, and don't do anything to fill in the gaps. Often tracks have labelled these as "easy"!

The Exercism About page states:

"Exercism exists to help as many people as possible attain fluency in any programming language they want. Programming skills can be source of great fun and a route to social mobility and we want to help people regardless of their backgrounds or their motivations. "

Exercises with problem descriptions that require domain expertise/knowledge do the exact opposite of that, creating frustration for those who lack a background in that area.

Examples:

I can see why these exercises may be appealing to those with that knowledge, but to those of us whose expertise lies elsewhere, they shouldn't be mixed in with the general pool of exercises where folks may stumble upon them, see they're "easy" and then start doubting their competence when they don't even understand the question (it will feed imposter syndrome and related issues). I'd instead suggest putting them in some other pool where folks who want to do this sort of thing can hunt them out. Or simplify them to the level that domain expertise is not required.

siebenschlaefer commented 2 years ago

Did you try to research the problem, e.g. by asking your favorite search engine for "generate pythagorean triples"?

mohmad-null commented 2 years ago

Did you try to research the problem, e.g. by asking your favorite search engine for "generate pythagorean triples"?

Thank you for the suggestion, but this circumvents the point: It shouldn't be necessary! This is a website for learning programming, not weird maths formulae that I'll never need to use again.

(And that's ignoring that many of the responses are written using notation and language that requires some proficiency with maths - looking at you wikipedia)

siebenschlaefer commented 2 years ago

Quoting from the exercise "difference-of-squares":

You are not expected to discover an efficient solution to this yourself from first principles; research is allowed, indeed, encouraged. Finding the best algorithm for the problem is a key skill in software engineering.

I think the same applies to a lot of other exercises, too, and it is also true in practice.

My personal opinion: If the domain-specific knowledge can be acquired easily I don't see it as a problem. For this exercise googling "generating pythagorean triples" will lead you to the dedicated page on Wikipedia. I think this exercise is fine as it is.

SaschaMann commented 2 years ago

Thank you for the suggestion, but this circumvents the point: It shouldn't be necessary! This is a website for learning programming, not weird maths formulae that I'll never need to use again.

Practice exercises sometimes require additional knowledge. Some require knowledge of how to write a parser, knowledge about strings, floating-point arithmetic, etc. All of those are topics that other people "will never need to use again". It heavily depends on your background and what you want to learn programming for. Someone who uses Exercism to improve their programming skills so they can write better code for their research might not need to do any of the string exercises, yet there are dozens of them.

You don't have to do any of them, so if you don't want to solve the more mathsy exercises, you can simply skip them. But giving away the solution will take away exercises from people who do enjoy exercises like this.


That said, finding a solution that runs fast enough is possible with only 7th grade maths and no knowledge of advanced notation; see, for instance, the 2nd example here: https://github.com/exercism/website-copy/blob/main/tracks/julia/exercises/pythagorean-triplet/mentoring.md (the code is in Julia but the explanation should be understandable without it)

mohmad-null commented 2 years ago

"generating pythagorean triples" will lead you to the dedicated page on Wikipedia.

I linked to it in my last message as an example of how opaque results can be (Wikipedia is usually the worst and this example doesn't fail to fail). Evidently you are able to parse that, but I can assure you it's a wall of unhelpful gibberish to me and, I would be more than happy to bet, to most other people on the planet too.

https://github.com/exercism/website-copy/blob/main/tracks/julia/exercises/pythagorean-triplet/mentoring.md

The naive cubic-time solution is the one I was coming up with myself, except there's an implicit test to ensure that you don't actually use that solution (at least with Python: n=30,000, hence 27 trillion comparisons). So the tests have arbitrarily decided that the user must have a certain level of maths proficiency; the naive solution is unacceptable.
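For readers following along, here is a minimal sketch of what a naive cubic-time approach might look like, together with the arithmetic behind the 27 trillion figure. The function name `triplets_with_sum_naive` is hypothetical and this is not code posted in the thread; it only assumes the exercise's stated conditions (a < b < c, a + b + c = N, a² + b² = c²).

```python
def triplets_with_sum_naive(n):
    """Hypothetical brute force: try every combination of a, b and c below n."""
    found = []
    for a in range(1, n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                # Keep only combinations that meet both conditions of the exercise.
                if a + b + c == n and a * a + b * b == c * c:
                    found.append((a, b, c))
    return found


print(triplets_with_sum_naive(12))  # [(3, 4, 5)]

# Three nested loops of roughly 30,000 steps each give the figure quoted above.
print(f"{30_000 ** 3:,}")  # 27,000,000,000,000
```

The loops above skip some combinations because b starts after a and c starts after b, but the running time still grows with the cube of n, which is why a test with n = 30,000 does not finish in a reasonable time.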

Practice exercises sometimes require additional knowledge.

Indeed, but where is that limit? I put it to you that they should all be self-contained as much as possible.

I suspect the reason that maths problems seem to fall through the cross-domain "acceptability" gap is that many folks who code also happen to be highly proficient at maths. You need to remember that most people don't share that capability.

junedev commented 2 years ago

@mohmad-null It's a bit difficult to discuss in two issues at once. Since the discussion started here, could you integrate the other examples into this issue please and close the other one (you can also adjust the title here). Then we can continue the discussion here. Also please remember the Code of Conduct and make sure to be kind and polite. (Applies to everyone of course.)

Now to the topic. I think @mohmad-null raises some valid points.

b) also applies for other "domain knowledge". Is it really not necessary to understand the details to solve the exercise? And if more info is needed to solve the exercise, is that info really that easy to obtain?

SaschaMann commented 2 years ago

Indeed, but where is that limit? I put it to you that they should all be self-contained as much as possible.

I think we need to differentiate between explaining the solution and the terms mentioned in the problem description.

There are a few things in this exercise's description that aren't explained in it and not linked anywhere:

A Pythagorean triplet is a set of three natural numbers, {a, b, c}, for which, [...] Given an input integer N, find all Pythagorean triplets for which a + b + c = N.

Adding explanations or links to explanations to those concepts seems reasonable to me. However, actually giving away the solution in the description makes this exercise uninteresting to anyone who likes to figure out things on their own or with some research and turns this into an algorithm-implementation exercise for everyone. I might have misunderstood you but as far as I understand, you're suggesting the description should contain an explanation for a better-than-cubic-time solution, which imo falls under the latter.

There's always the option to submit the slow solution and request mentoring, too.


(Wikipedia is usually the worst and this example doesn't fail to fail)

I agree. I think links to Wikipedia should be avoided for explanations of prerequisite concepts; there are usually sites that explain them without as much lingo.

mohmad-null commented 2 years ago

Searching for the information has a number of problems:

1) It's adding a further barrier to the problem, impeding the problem solving process.
2) The barrier is unnecessary. Unless you're trying to teach people how to solve problems in a specific domain, there's no reason for a problem to be domain specific.
3) It requires a certain education & education level to be able to integrate whatever the technical information is into your mental models.
4) Search results are susceptible to the filter bubble. What you find isn't what I'll find, even for the exact same term.
5) It ignores global inequalities in education access.
6) It ignores global inequalities in computer/internet access (searching requires more time/bandwidth, which can often be highly constrained).
7) It means the exercises can't be done offline.

For all of these reasons I'd again like to suggest all problems should be self-contained. There's absolutely no reason they shouldn't be.

"But it's n-grade maths" exemplifies it well, in particular:
a) Anyone who hasn't reached that grade can't participate.
b) It assumes all countries teach at the same levels.
c) There is a vast difference between maths proficiency levels, both within countries and globally. I point you to Figure 3 here (8th grade maths globally): https://nces.ed.gov/programs/coe/indicator/cnt

However, actually giving away the solution in the description makes this exercise uninteresting to anyone who likes to figure out things on their own

If a person likes solving maths problems, more power to them. However, I'd like to suggest there are probably a lot of websites out there filled with tons of interesting maths problems for them to solve. Same for quite a few other domains, albeit to a lesser degree than maths. Exercism on the other hand is a generalist site designed to teach people programming.

I understand, you're suggesting the description should contain an explanation for a better-than-cubic-time solution .... There's always the option to submit the slow solution and request mentoring, too.

For Pythagorean Triplets, yes. The alternative is to remove the tests that make the maths-naive solution unworkable. It's impossible to submit the naive solution: it won't complete because n=30,000 in one of the Python Tests.

SaschaMann commented 2 years ago

If a person likes solving maths problems, more power to them. However, I'd like to suggest there are probably a lot of websites out there filled with tons of interesting maths problems for them to solve. Same for quite a few other domains, albeit to a lesser degree than maths. Exercism on the other hand is a generalist site designed to teach people programming.

Programming doesn't stand on its own. It's always applied to some field or subject. There's no universal background or field one can assume. I'm repeating myself, but if you say that this is an issue with maths exercises, the same must be applied to all exercises. You can take a programming course and strings will never even be mentioned, yet it's still programming and you learn how to program in it. We don't give away the solution to bob or other string exercises in the description. Nor do we explain what strings are.

Nobody is forcing you to solve all practice exercises in a track. You can skip them, only do the ones that interest you etc. Why should we exclude those people from Exercism and send them to other sites? The Python track has 129 exercises, the vast majority of which don't require any kind of maths or arithmetic at all. The barrier and exclusion you are outlining would only apply if this were somehow a gatekeeper exercise, but it isn't. For exercises that do block progression (Learning/Concept exercises), there are far stricter rules on prerequisite/background knowledge. You can become fluent in Python by doing the Learning exercises and various practice exercises without ever even looking at pythagorean-triplets.

As a side note, it's not a site designed to teach people programming. It's a site designed to teach people fluency in a programming language. It's pedantic but actually makes a difference because it allows us to assume certain knowledge. Having programming knowledge is required already, it's not a site for learning from scratch. For example, you will find that concept exercises that introduce loops in a track generally won't cover what a loop actually is, they will cover how a loop works in that particular language.

The problem that a naive brute-force approach doesn't solve a particular programming problem isn't restricted to mathsy problems either. I don't know if there is one in problem-specs but it's certainly possible to design exercises that aren't mathsy at all that will have this problem.

It's impossible to submit the naive solution: it won't complete because n=30,000 in one of the Python Tests.

You can submit solutions that don't pass the tests via the CLI to request mentoring. You can also look up community solutions without having solved the exercise by directly going to the link. I don't agree with the design choice to hide these options, but that's been discussed in other issues.

siebenschlaefer commented 2 years ago

@mohmad-null How would this exercise look if it were self-contained? Are you thinking of something like this:

A Pythagorean triplet is a set of three natural numbers, {a, b, c}, for which,

a² + b² = c²

and such that,

a < b < c

For example,

3² + 4² = 9 + 16 = 25 = 5².

Given an input integer N, find all Pythagorean triplets for which a + b + c = N.

For example, with N = 1000, there is exactly one Pythagorean triplet for which a + b + c = 1000: {200, 375, 425}.

One possible algorithm that generates these Pythagorean triplets works like this

let a ← 3
while a < n ÷ 3
do
    let b ← (n² - 2n × a) / (2 × (n - a))
    if a < b and ⌊b⌋ = b
    then
        c ← n - a - b
        append(result, (a, b, c))
    endif
    a ← a + 1
done
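The formula for b in the pseudocode follows from substituting c = n - a - b into a² + b² = c², which simplifies to b = (n² - 2na) / (2(n - a)). As an illustration only, the pseudocode could be transcribed into Python roughly as follows; the function name `triplets_with_sum` is a hypothetical choice, and integer `divmod` stands in for the ⌊b⌋ = b check.

```python
def triplets_with_sum(n):
    """Hypothetical transcription of the pseudocode above.

    Returns every (a, b, c) with a < b < c, a + b + c == n and a² + b² == c².
    """
    result = []
    # For integers, a < n / 3 is the same as a < ceil(n / 3) == (n + 2) // 3.
    for a in range(3, (n + 2) // 3):
        # b = (n² - 2na) / (2(n - a)); a zero remainder means b is a whole number.
        b, remainder = divmod(n * n - 2 * n * a, 2 * (n - a))
        if remainder == 0 and a < b:
            result.append((a, b, n - a - b))
    return result


print(triplets_with_sum(1000))  # [(200, 375, 425)]
```

Because there is a single loop over a, the work grows linearly with n, so an input such as n = 30,000 needs fewer than 10,000 iterations.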

And in general: Do you have a solution for this problem that you perceive?

mohmad-null commented 2 years ago

Programming doesn't stand on its own. It's always applied to some field or subject.

Indeed yes, but then the question again becomes: Why are there so many maths-centric problems? Where are the problems for NLP, GIS, data analysis, visualisation (ok, that'd be hard to test), networking, Geology, Paleobotany, taxonomy, music theory (1), material science, philately,... ad infinitum?

If you head down this road then you open the door to all programming problems. Issue there is, it becomes harder for everyone who doesn't know about the 99% of fields that aren't theirs to find exercises that are suitable for them. How are you going to find suitable problems when there are 200, 400, 700, 1000? Intermingled across innumerable domains? I already struggle with Python.

Nobody is forcing you to solve all practice exercises in a track.

There are a few problems with this reasoning, largely revolving around the fact that it doesn't mesh with how many humans actually work.


As to @siebenschlaefer's questions:

My answer remains:

Anything else and Exercism is failing at its stated goal of inclusion of people "regardless of their background or motivations".

If you want domain specific problems, cool, but put them somewhere slightly different on the site (another tab?) to make it clear there's a distinction. This will address the problems raised above. I know I'd be happy to solve problems in certain domains (not maths! ;-) ), and a section of the site filled with them would be great, but not intermingled with everything else.

SaschaMann commented 2 years ago

I now have some of these permanently "in progress" and no way to remove them; I expect other users do too.

I think you should raise this as a suggestion on exercism/exercism. Having some kind of hide button might be nice regardless of the outcome of this discussion.

BethanyG commented 2 years ago

For all of these reasons I'd again like to suggest all problems should be self-contained. There's absolutely no reason they shouldn't be.

@mohmad-null -- Could you provide some examples of practice exercises (not the concept exercises) that currently meet these criteria? Self-contained could be interpreted as only practicing a given programming language syntax, as opposed to gaining practice in language use by solving a problem.

While I don't think that's your intent, it would be really helpful to me to see what you consider exemplar exercises. As I think about the 117-odd practice exercises on the Python track, very few of them feel self-contained to me.

I'd also love to see any programming problems from other sites or resources you can point to. As we create more concept and practice exercises, it would be good to have examples.

iHiD commented 2 years ago

@mohmad-null Thanks for raising your concerns! We don't learn how others are experiencing our exercises unless people tell us, so it's essential for us to get feedback like this. So thanks for taking the time 🙂

And thanks to everyone else for digging into the questions raised.

I think there's a few different threads here, but probably two main ones:

  1. Maths stuff is painful for people without much maths knowledge (such as myself).
  2. Exercises that require other (non-maths) domain knowledge need to have that knowledge clearly explained.

For (1) I agree entirely and have wanted to "fix" this for a long time. The challenge is that maths is very common for some programmers and very uncommon for other programmers. In a language like Python, which is used for data science (maths is common) and all sorts of general purpose programming (maths generally uncommon), these exercises are brilliant for some people and a nightmare for others. People who are into maths think it's really easy and obvious. I, however (despite having a CS degree), now have no memory of what a "natural number" is and therefore find that exercise immediately alienating. We've previously discussed tagging maths-based exercises with a "maths" flag, which shows up as a sort of "If you like maths, you're gonna love this. If not, you might want to skip it" sort of heading (or something), and maybe the ability to disable those exercises in your settings. I think that potentially achieves the best of both worlds. I don't want to remove maths exercises, as I think they are useful in languages that are commonly learnt because people are using them for math-y things, but I agree that they're really alienating to lots of people.

For (2) I don't think there are many other exercises that require domain-knowledge that isn't generally well explained. I've edited ones in the past where I found it to be too hard to parse (e.g. Hamming) and would love to find time to do this with a wider set. But I do also agree with the idea that it's nice to have exercises on topics that you might need to go and learn something about, as long as they sort of feel like more "projects" or "bonus" exercises, than something fundamental. Again, it feels like the key is making this clear to students and ensuring that the instructions link to places that explain things easily from first principles (e.g. not Wikipedia).

On a more general note, I agree with what @BethanyG nicely summarised, that it's good to practice problem solving in a programming language, as well as practicing syntax. It's probably important for tracks to consider how they're structured to ensure that we learn syntax then get to practice things, and maybe the ordering of practice exercises (or us adding further sub-categorisations) might help with that. I know re Python that Bethany puts a lot of thought into that structuring, so maybe it's more of a website-level thing to fix.


You ask why there are so many maths ones. Because maths exercises are easier to write (as they're normally existing problems like Pythag triplets that are commonly used) and because Exercism was seeded with lots of exercises like these. Newer exercises (e.g. the Concept Exercises) are very rarely mathsy. In fact, I'd suggest that nearly all "original" exercises on Exercism are not-mathsy, but ones that were taken from elsewhere early on are mathsy.


@mohmad-null Finally, I think your points in the first half of this post are a fantastic reminder to us of why this subject is important. It's something everyone here cares about, but it's sometimes hard for us to put ourselves in the mindsets of others with extremely different experiences, to understand what is easy/hard/exciting/offputting for others. So thank you for raising those 🙂

mohmad-null commented 2 years ago

Could you provide some examples of practice exercises (not the concept exercises) that currently meet these criteria? Self-contained could be interpreted as only practicing a given programming language syntax, as opposed to gaining practice in language use by solving a problem.

Sure. I'd suggest that all of the below are fairly well self-contained, at least by my definition.

Basic Ones
Domain Specific ones

Some are already both self-contained and domain specific, although some of them don't have the best explanations/descriptions.

Borderline

So as you can see, they're definitely possible.

Of course, that's just my (subjective) list. What'd be really interesting is stats for how many people start / complete any given exercise. A quick search suggests there are none :-( .