datacarpentry / openrefine-socialsci

OpenRefine for Social Science Data
https://datacarpentry.org/openrefine-socialsci/

Explain when (not) to use OpenRefine #103

Open bencomp opened 2 years ago

bencomp commented 2 years ago

I have taught the OpenRefine lesson a few times, most recently today. Even though I always try to explain when you could choose OpenRefine for a problem, and how it compares to spreadsheets and to writing a script, students keep asking for more explanation and comparisons. In our workshop, the OpenRefine lesson sits between Data organisation in spreadsheets and Introduction to R, and that is also how I tried to frame OpenRefine: it shows your data like a spreadsheet application, but it has powers like a programming environment.

Since I keep struggling to explain it well, even after years of experience with OR, we should probably improve the lesson materials.

Helpers suggested that it might help to refer back, later in the lesson, to the framing from the introduction that situates OR between spreadsheets and programming. The introduction episode should provide more context first, though.

bencomp commented 2 years ago

I do realise now that this has been mentioned in part in #79 and also relates to #56 and #38.

bencomp commented 2 years ago

I think we should look at the Library Carpentry lesson on OpenRefine for clearer use cases in the introduction episode: splitting data elements into different columns, normalising date formats and maybe matching/enhancing. These use cases would replace the Motivations section, which (I feel) is currently written for potential instructors.

Let's replace the Features and Getting help sections with "How is OR different from spreadsheet applications?" and "When would you write a script instead of using OR?".

Spreadsheets

Scripts

bencomp commented 1 year ago

From #37:

bencomp commented 1 year ago

Perhaps it's also useful to distinguish OR from using SQL with a relational database. SQL also allows selecting rows and creating derived columns. The cross function lets you join data from different projects, much like JOIN in SQL. (cross is not currently part of the lesson, but I have used it myself.)
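To make the JOIN comparison concrete, here is a rough plain-Python sketch of a lookup-style join. It is only an illustration of the idea, not how OpenRefine implements cross. The helper names (cross_lookup, add_column_from) and the sample data are all hypothetical.

```python
# Illustrative only: a plain-Python stand-in for a lookup-style join.
# This is NOT OpenRefine code; all names and data below are hypothetical.

def cross_lookup(value, other_rows, key_column):
    """Return every row in other_rows whose key_column equals value."""
    return [row for row in other_rows if row.get(key_column) == value]


def add_column_from(rows, other_rows, key_column, source_column, new_column):
    """Copy source_column from the first matching row in other_rows.

    Rows without a match get None, similar to a LEFT JOIN in SQL.
    """
    enriched = []
    for row in rows:
        matches = cross_lookup(row.get(key_column), other_rows, key_column)
        value = matches[0][source_column] if matches else None
        enriched.append({**row, new_column: value})
    return enriched


# Two small "projects": interviews, and a lookup table of villages.
interviews = [
    {"id": 1, "village": "A"},
    {"id": 2, "village": "C"},  # no entry for "C" in the lookup table
    {"id": 3, "village": "B"},
]
villages = [
    {"village": "A", "district": "North"},
    {"village": "B", "district": "South"},
]

result = add_column_from(interviews, villages, "village", "district", "district")
for row in result:
    print(row)
```

Each interview keeps its own columns and gains a district value where the village matches. The unmatched row is kept with an empty value rather than dropped, which is the main practical difference from an inner JOIN.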

bencomp commented 3 weeks ago

Remember to remove the mention of this issue from the Instructor note in the Introduction section when this issue is resolved. See #183.