Closed: samumantha closed this 3 years ago
There is no existing Carpentries lesson on machine learning yet, but I believe that is likely to change this year. For my part, when people have asked me in the past for a good introduction to machine learning (especially with Python), I have pointed them towards the scikit-learn tutorials.
I found this: https://machinelearningcarpentry.github.io/machine-learning-novice/ It seems to have been on ice since 2019, although there was some talk about moving it to the incubator. (Has that happened?)
Both from 2019. I guess we need a text of our own then, right?
We might want to develop an ML lesson again in the future (maybe starting with the one mentioned above), but as long as it is not there, I agree we should refer to some other existing resources. Note that from the lesson development plan, we expect learners to be familiar with:
Although the scikit-learn tutorials are useful, I'm not sure if they explain these basic concepts very well (but I haven't gone through all tutorials so maybe I missed it). Maybe it's useful to refer to specific sections of scikit-learn.
Some other, possibly relevant resources:
@dafnevk indeed, there are tons of materials out there. That is nice and hard at the same time, as I'd love to borrow material from them (if the LICENSE permits), including citations. But these kinds of citations are hard to track, I think.
I think we should focus on a survey-style document that allows instructors to ask learners what they already know. For example, we should create a catalogue of questions like:
You are given a dataset from experiments that you want to use for machine learning (13 columns with 25000 rows). One column is particularly useful and is encoded as real numbers in a range from `-15` to `12`. You would like to normalize this data so that it fits into the range of real numbers between `0` and `1`. How would you do this?
- I've performed such an operation multiple times. If you give me an editor, I can code this up in a minute.
- I've done similar things in the past, I can copy & paste this over.
- I am not sure what to do. I'd consult a colleague or google.
- I don't know what to do.
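For reference, the operation this sample question asks about is commonly called min-max scaling. The following is only a minimal sketch of one way to do it, not a model answer for the survey. The function name `min_max_scale` and the randomly generated stand-in column are assumptions made for illustration; the range `-15` to `12` and the 25000 rows come from the question itself.

```python
import random


def min_max_scale(values, lo=None, hi=None):
    """Linearly rescale values so that lo maps to 0 and hi maps to 1.

    If lo/hi are not given, the observed minimum and maximum are used.
    (Hypothetical helper, written only for this illustration.)
    """
    values = list(values)
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("cannot rescale a constant column")
    return [(v - lo) / (hi - lo) for v in values]


# Stand-in for the interesting column: 25000 made-up values in [-15, 12].
rng = random.Random(0)
column = [rng.uniform(-15, 12) for _ in range(25000)]

# Scale against the known range from the question rather than the observed one.
scaled = min_max_scale(column, lo=-15, hi=12)
```

Passing the known bounds explicitly keeps the mapping identical across data splits; relying on the observed minimum and maximum is the other common choice. In practice many people would reach for scikit-learn's `MinMaxScaler` instead of a hand-written helper.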
As you can see, these kinds of questions have no right or wrong answer and hence are (hopefully) designed so that learners do not feel intimidated. At the same time, we learn where our learners stand. Feedback welcome.
I'd sit down, write a list of such questions, and send a PR to `_extras`. Would that be OK?
That could indeed be a good way to specify in more detail what the prerequisite knowledge is, and to test whether learners actually have that knowledge. It is especially useful if you're organizing a workshop with this lesson and want to make sure only people with the right background attend.
I think it would still be good to include some references (no need to include the actual material, I'd say) to places where learners can gain (some of) the prerequisite knowledge if they don't 'pass' the test.
Then we agree 100%. I think a section on what to read/learn before attending this would be extremely valuable.
I sent a first PR on the learner survey. https://github.com/carpentries-incubator/deep-learning-intro/pull/59
Yes, it was mentioned at carpentries-incubator/proposals#19 and machinelearningcarpentry/machine-learning-novice#9
I created the lesson mentioned here and am in the process of trying to restart its development and move it to the incubator. I think it could make a good prerequisite for this lesson, as it covers basic shallow neural networks. More material on test/train splitting, data cleaning, over/underfitting and metrics is already on the to-do list, but these topics are not currently covered in any great detail.
I submitted possible pre-workshop assessment questions in PR #59. If anyone wants to review, feel free to do so.
Hei,
To find good starting points for the introduction (I am working on this with @annefou), it would be good to know not only what the prerequisites for this course are, but also which course (machine learning or otherwise) learners are advised to take before this one. Is there any information on this somewhere? Is there a Carpentries machine learning course?
And additionally: would it make sense to have a short recap of important topics from 'the basics' (maths, ML, ...?) somewhere, e.g. as an optional episode?
Snowy greetings from Finland, Samantha