01-edu / public


AI piscine: TRAINING -- possible flaw in audit answers (Exercise 3) #2469

Closed · SomaSapien closed this issue 4 months ago

SomaSapien commented 4 months ago

Describe the bug

The audit answers give R² scores that deviate significantly from the values independently obtained by several students doing the task. The MSE and MAE values (and all the other metrics) match the audit answers, which suggests the underlying dataset is not the issue; perhaps the way the training and test sets are created has changed since the audit answers were last revised. Otherwise it may be some sort of platform / system / architecture dependency...?
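
If a platform / library-version dependency is suspected, one low-effort check is to record the environment next to the reported metrics. A minimal sketch, purely illustrative and not part of the exercise:

import sys
import numpy
import scipy
import sklearn

# Print the versions that most influence numerical results in this exercise.
print("Python      :", sys.version.split()[0])
print("numpy       :", numpy.__version__)
print("scipy       :", scipy.__version__)
print("scikit-learn:", sklearn.__version__)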

Users

Students following the AI specialization, grit:lab, Åland

Severity

(❗️minor)

Type

(🗂️ documentation)

To Reproduce

Steps to reproduce the behavior:

Jupyter Lab script:

# Imports (added for completeness; the original notebook assumes they have already been run)
from sklearn.datasets import fetch_california_housing
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Exercise 3: Regression
print("\nExercise 3\n")

# Fetch the dataset
housing = fetch_california_housing()
X, y = housing['data'], housing['target']

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, shuffle=True, random_state=13
)

# Define and configure the pipeline
pipeline = [
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
    ('lr', LinearRegression()),
]
pipe = Pipeline(pipeline)

# Fit the pipeline to the training data
pipe.fit(X_train, y_train)

# Question 1: Predictions on the train and test sets
print("\nQuestion 1")

# Predict on the train set and the test set
y_train_pred = pipe.predict(X_train)
y_test_pred = pipe.predict(X_test)

# Output the first 10 predicted values for both train and test sets
print("\n10 first values Train\n")
print(y_train_pred[:10])
print("\n10 first values Test\n")
print(y_test_pred[:10])

# Question 2: Compute R2, MSE, and MAE
print("\n\nQuestion 2\n")

# Compute R2, mean squared error, and mean absolute error on the train set
r2_train = r2_score(y_train, y_train_pred)
mse_train = mean_squared_error(y_train, y_train_pred)
mae_train = mean_absolute_error(y_train, y_train_pred)

# Compute R2, mean squared error, and mean absolute error on the test set
r2_test = r2_score(y_test, y_test_pred)
mse_test = mean_squared_error(y_test, y_test_pred)
mae_test = mean_absolute_error(y_test, y_test_pred)

# Print the results
print("r2 on the train set: ", r2_train)
print("MAE on the train set: ", mae_train)
print("MSE on the train set: ", mse_train)
print()
print("r2 on the test set: ", r2_test)
print("MAE on the test set: ", mae_test)
print("MSE on the test set: ", mse_test)

Workarounds

No workaround; if the student's results differ from the audit template, it is up to the auditor to judge pass / fail based on whether the correct approach has been followed.
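
As a possible aid for that judgment call, an auditor could compare the student's metric against the reference value within a small tolerance instead of requiring exact equality. A hypothetical helper (the function name, tolerance, and example numbers are my own, not anything from the audit):

def within_tolerance(student_value: float, reference_value: float,
                     tol: float = 0.05) -> bool:
    # Accept the student's metric if it lies within `tol` of the reference value.
    return abs(student_value - reference_value) <= tol

# Example with made-up numbers: a student's test-set r2 vs. a reference value.
print(within_tolerance(0.59, 0.61))  # True for the default tolerance of 0.05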

Expected behavior

The resulting r2_scores:

Attachments

N/A

Desktop (please complete the following information):

Smartphone (please complete the following information):

N/A

Additional context

N/A

nprimo commented 4 months ago

Thank you for the feedback @SomaSapien. I have just opened a PR to fix this issue.