alan-turing-institute / rds-course

Materials for Turing's Research Data Science course
https://alan-turing-institute.github.io/rds-course/

Module 1 #47

Closed: gmingas closed this 2 years ago

gmingas commented 3 years ago

This PR adds content for Module 1 (four lessons plus a hands-on session).


triangle-man commented 2 years ago

Overall structure

My broad comment is that the lesson is long and discursive. I think all the content is here, but the module could do with being beaten into a more explicit structure. At present it reads as a long-form paper, which may not work in a synchronous training session (unless you are asking participants to read it beforehand, or individually?). Even if you do want people to read it offline, I think adding structure to the lessons would help significantly.

I would suggest:

  1. Try to write down, as clearly and concretely as you can, the 3 or 4 messages that you want to convey in each lesson: perhaps one sentence per message, each of which is a statement.

    For example, Good: "Data is never complete and objective; the process of collecting it always involves a person making decisions." Bad: "Properties of datasets"

  2. Organise the lesson around those statements.

  3. Try to make each section heading in the lesson a statement (possibly one of the messages). (This is not always possible.)

Style

The lessons can make quite bold assertions that would make old-timers like me balk. E.g.,

Having established that the key essential aspect of data science is the availability of (large amounts of) new data, the other fundamental component is constituted by a broad and multifaced [sic., perhaps 'multifaceted'?] ensemble of practices, methodologies and tools that, combined together, can lead to obtain "new insights" from a given dataset.

Is the availability of large amounts of new data the essential aspect of data science? Certainly it is true that most machine learning breakthroughs have relied on large training datasets, but I'm not sure they are necessary for data science, broadly construed.

Actually, perhaps it's worth moving the hierarchy-of-needs graphic to the front of the lesson and structuring the lesson around it? Not sure.

One stylistic comment: we use the construction "If we consider X, ..." a lot, but we just mean "Consider X", or "Take X, for example." or "For example, X is ...". To my mind, the construction with "If" demands a "then", which never turns up!