Context
In SLU06, we talk about cleaning categorical and numerical variables, but these concepts are only introduced in detail in SLU12.
I'm cc'ing @majkah0 and @danizao, since you were this year's instructors, to get your feedback.
Detailed Description
There is a key distinction we need to make between statistical data types (what the values represent) and implementation data types (how the values are actually stored). For example, a categorical variable (statistical type) can be stored as ints, strings, binary, etc. (implementation type).
For a better perspective, I'm also detailing what we currently have in each SLU regarding data types:
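To make the distinction concrete, here is a minimal sketch (not taken from any SLU) in which the same categorical variable is stored under three implementation types. The column names and the shirt-size data are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: one categorical variable (shirt size) stored in different ways.
df = pd.DataFrame({
    "size_as_str": ["S", "M", "L", "M"],  # stored as strings
    "size_as_int": [0, 1, 2, 1],          # stored as integer codes
})

# The pandas "category" dtype makes the statistical type explicit,
# including the order of the levels.
df["size_as_category"] = pd.Categorical(
    df["size_as_str"], categories=["S", "M", "L"], ordered=True
)

# The implementation types differ even though all three columns
# represent the same statistical variable.
print(df.dtypes)

# pandas will compute a mean of the integer codes without complaint,
# even though an "average size" has no clear meaning for a categorical.
mean_code = df["size_as_int"].mean()

# The ordered categorical supports comparisons that respect the level order.
largest = df["size_as_category"].max()
```

The point of the sketch is that the dtype alone does not tell you how a column should be treated; that decision comes from the statistical type.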
SLU01 - Pandas 101:
Series of int, float, object (implementation types, not statistical)
SLU06 - Dealing with Data Problems:
Data entry problems (categoricals + numerical)
Missing values (categoricals + numerical)
SLU12 - Feature Engineering:
Types of data in Pandas (numeric, datetime, string)
Advanced data types (category, ordinal)
Types of statistical data and how to handle
Numerical: discrete, continuous
Categorical: binary, ordinal, one-hot
Possible Implementation
Option 1: Since the types of statistical data are a fundamental concept, we can introduce them in SLU01.
Option 2: To avoid scope creep in SLU01, which is focused on pandas (the package), we can introduce the concepts in SLU06 instead.
My concern with option 1 is that it lacks practical context, although it exposes the concepts right away.
Option 2 can introduce the "statistical data types" concept and use it immediately to showcase how we can (and should) treat each type differently when preprocessing data. There is not much loss in delaying from SLU01 to SLU06, because the SLUs in between do not rely much on these concepts.
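As a rough illustration of what option 2 could showcase in SLU06, the sketch below imputes missing values differently depending on the statistical type. The dataset, the column names, and the choice of median and mode are assumptions made for the example, not the current SLU06 content.

```python
import pandas as pd

# Hypothetical dataset with missing values in both kinds of columns.
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],               # numerical
    "city": ["Lisbon", "Porto", None, "Lisbon"],   # categorical
})

# The statistical type of each column is declared by hand,
# since it cannot be inferred reliably from the dtype.
numerical_cols = ["age"]
categorical_cols = ["city"]

cleaned = df.copy()

# Numerical columns: fill with the median of the observed values.
for col in numerical_cols:
    cleaned[col] = cleaned[col].fillna(cleaned[col].median())

# Categorical columns: fill with the most frequent value (the mode).
for col in categorical_cols:
    cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

print(cleaned)
```

Other strategies (a dedicated "missing" level for categoricals, for instance) would fit the same structure; the median and mode are used here only because they are simple to explain.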