UofTCoders / grad-course

comments and feedbacks #9

Open SaraMati opened 5 years ago

SaraMati commented 5 years ago

Thoughts from Shreejoy Tripathy February 23, 2019:

Here are some quick comments about the course syllabus you've outlined. Overall, I really like the idea of the course and would be happy to be involved, including as a co-instructor. Do you have an idea for the course title? Perhaps something like "An introduction to data science"? In general, I'm even more convinced that as many scientists as possible should take a course like this.

I really like the headings under Programming, including basic Python, data wrangling, and tidying. One thing I would add is how to properly use and format spreadsheets.

For quantitative methods, I would need to see these spelled out more in terms of topics and lectures. I don't have a good sense of how many lectures you would devote to statistics, but I think it's very important. Probably at least 3-4. These would include topics like statistical philosophy, what a random variable is, distributions, etc. Also two-group comparisons (t-tests and non-parametric tests), regression, multivariate regression, possibly lasso, feature selection, and comparing models using ANOVA and AIC/BIC. I quite like this syllabus, and think it would be a good guideline for the statistics material: https://stat540-ubc.github.io/subpages/lectures.html

I think you should think very hard about who your prototypical student is. I personally think this course can be really great and effective with absolutely no time devoted to time series analysis. Obviously time series analysis is important, but I personally would prioritize the basics (so maybe 1 lecture only on time series). Similarly, [having] a class on basic data tidying, some plotting, and t-tests, and possibly ... with git/github ...

I absolutely LOVE the idea of the project. Essential for a course like this. In order to have the projects be effectively supervised/managed, it'll be important to get TAs (probably 1 TA for 3-4 projects) with regular update meetings.

Sean Hill, Feb 15 meeting

Popovic, Jan 31 meeting: Other than full support, the main comment was to make sure we describe how the course is sustainable after the first round of instructors (us).

SaraMati commented 5 years ago

My comments on Shreejoy's email: regarding the spreadsheet lesson: I feel this is going backwards! Unless he means having a lesson on how to properly store data in general. I had such content in mind under the tidy data title, along with how to clean up data, etc.

regarding quantitative methods: He is not saying anything different from what we already had in mind. In general, I think we should include stats because our goal is for the students to be able to produce a scientific report, and I think it is important to stress that a proper scientific conclusion should be based on proper stats. To be able to cover them all in the live participatory coding sessions, I think we can provide notes and resources about the concepts and focus on teaching the coding and applying it to examples. We should keep in mind that we are not a stats course or a machine learning course, so we are not there to teach the concepts, but how not to use them in a sloppy way: "how and where to apply which". Keeping our target population in mind helps: I'm doing this course for my first-year-grad me. We can think of students in CPIN (Collaborative Program in Neuroscience), who come from different engineering fields, physiology, engineering science, psychology, etc. There are good statistics courses in all those departments. And in the end, one course can't teach all the methods that they may need in the course projects. So the assumption is that they have either heard the concepts before or can read up on them on their own.

Well, collaborators such as postdocs with a physiology background can audit and won't need to do the project. I don't insist on having time series, but overall we may benefit from reusing material from the Rcourse.

regarding regular meetings with TAs: Yes, we mentioned this in the end-of-year meeting for the Rcourse this year.

joelostblom commented 5 years ago

I haven't seen the email, but maybe you can include what we taught during the Python workshops last summer? So a more focused version of the Data Carpentry spreadsheet section, covering good general data practices as you said, what spreadsheets are good for (e.g. data entry), and what their limits are (e.g. data analysis).