01-edu / public


pandas audit #2460

Closed by jo-eman 4 months ago

jo-eman commented 4 months ago

pandas

For question 11 in exercise 2, the audit expects values that are not in the dataset:

from https://github.com/01-edu/public/tree/master/subjects/ai/pandas/audit

    2006-12-16    3.053475
    2006-12-17    2.354486
    2006-12-18    1.530435
    2006-12-19    1.157079
    2006-12-20    1.545658
                    ...
    2010-12-07    0.770538
    2010-12-08    0.367846
    2010-12-09    1.119508
    2010-12-10    1.097008
    2010-12-11    1.275571
    Name: Global_active_power, Length: 1433, dtype: float64

The dataset does not contain any data after November 2010: https://assets.01-edu.org/ai-branch/piscine-ai/household_power_consumption.txt
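One way to check a claim like this is to aggregate by day and test whether an expected date exists in the result. The sketch below is not the audit's reference solution. It assumes the file is `;`-separated with day-first `Date` values, `?` for missing readings, and that the expected output is a per-day mean of `Global_active_power`; none of this is stated in the thread. The three sample rows are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real file is ASSUMED to share this layout
# (';' separator, day-first dates, '?' for missing values).
SAMPLE = """Date;Time;Global_active_power
16/12/2006;17:24:00;4.0
16/12/2006;17:25:00;6.0
20/11/2010;21:02:00;1.0
"""

def daily_mean(source):
    """Return the per-day mean of Global_active_power, indexed by date."""
    df = pd.read_csv(source, sep=";", na_values=["?"])
    # An explicit format avoids any day/month guessing by the parser.
    df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
    return df.groupby("Date")["Global_active_power"].mean()

daily = daily_mean(io.StringIO(SAMPLE))
print("last date in data:", daily.index.max())
print("2010-12-11 present:", pd.Timestamp("2010-12-11") in daily.index)
```

To run it on the real dataset, pass the path of the downloaded file to `daily_mean` instead of the in-memory sample.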

nprimo commented 4 months ago

Hi @jo-eman, thank you for the feedback. I have checked the dataset, and it contains data that can lead to the expected answer. Note that the provided dataset may not be sorted by date.

Have you tried to print the dataset ordered by date, for example?
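A minimal sketch of that check, assuming the same `;`-separated layout with day-first `Date` values and `HH:MM:SS` times (an assumption, not confirmed here). The sample rows are hypothetical and deliberately out of order.

```python
import io
import pandas as pd

# Hypothetical, deliberately unordered rows in the ASSUMED file layout.
UNORDERED = """Date;Time;Global_active_power
17/12/2006;00:00:00;2.0
16/12/2006;17:24:00;4.0
16/12/2006;17:25:00;5.0
"""

def load_sorted(source):
    """Load the data and return it sorted chronologically."""
    df = pd.read_csv(source, sep=";", na_values=["?"])
    df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
    # Zero-padded HH:MM:SS strings sort correctly as plain text.
    return df.sort_values(["Date", "Time"]).reset_index(drop=True)

df_sorted = load_sorted(io.StringIO(UNORDERED))
print(df_sorted)
print("range:", df_sorted["Date"].min(), "to", df_sorted["Date"].max())
```

Printing the minimum and maximum date after sorting shows the true time span of the file regardless of row order.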

nprimo commented 4 months ago

Feel free to comment if you're still facing any issues linked to this topic :)

jo-eman commented 4 months ago

Hi @nprimo, thanks for getting back to me.

The dataset I am getting is 126 MB (132 960 755 bytes), and I still cannot find data for December 2010, which the audit asks for, anywhere in it. The first dates from 2006 in the audit example, however, are fine.

SHA-256 hash of the file: 4259C9D7ECE5DBEE9AB8D53682BAAC68D791C864F0F64A52B4043CB3B90894B7
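For anyone who wants to compare their own download against this hash, here is a small sketch using Python's standard `hashlib`. The function name `sha256_of_file` and the local file name in the usage note are illustrative choices, not part of the exercise.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large file is never loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()
```

Calling `sha256_of_file("household_power_consumption.txt")` on a local copy and comparing the result with the value above shows whether two people are looking at the same file.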

I have confirmed with other students that data for those dates does not exist in the dataset.

nprimo commented 4 months ago

Hi @jo-eman, thank you again for the input. After further investigation, I found further discrepancies in other questions of exercise 2. I have opened a PR to fix these issues.