Closed benjamin-feder closed 6 years ago
@benjamin-feder
I had a look at the course, and the fourth chapter in particular. You are dealing with a huge data set in the last chapter. I checked: bike is a data frame with 4,018,722 rows and 12 columns. That is over 48 million individual values.
Handling such a large amount of data in the cloud is not straightforward for DataCamp's servers. Students get about 800 MB of RAM to do their computations, and analyzing data like this is stretching it.
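For a rough sense of scale, here is a small R sketch that estimates the in-memory size of a data frame with these dimensions. It assumes every column is stored as an 8-byte double; that is an assumption made for illustration, not something checked against the actual bike data, and character or factor columns would change the figure.

```r
# Dimensions reported for the bike data frame
n_rows <- 4018722
n_cols <- 12

# Total number of cells in the data frame
n_cells <- n_rows * n_cols

# Assumption: every cell is an 8-byte double (not verified for bike)
bytes_per_cell <- 8
approx_mib <- n_cells * bytes_per_cell / 1024^2

cat("cells:", format(n_cells, big.mark = ","), "\n")
cat("approx. size (MiB):", round(approx_mib), "\n")
```

Under that assumption, a single copy of the data already takes up a large share of an 800 MB budget, and any intermediate copy made during a computation adds to it.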
I did a commit to your review-ben branch of the course. It does two things:
First, it sets runtime_config: spark in the course.yml. This gives students more RAM when they are taking exercises in this course.
Second, it disables the heaviest SCT functions in the exercise that you referenced. Doing the following:
ex() %>% check_function('summarize') %>% check_result() %>% check_equal()
sure is robust, but it reruns the summarize call in both the student environment and the solution environment. In this case, that means 2 extra, extremely computationally heavy summarize(group_by(bike, ...)) calls. Disabling that last step makes sure the SCT can run within the timeout again.
While it's fixed for now (you can submit the solution and it passes), the experience for students is not good: code simply takes too long to execute. Just giving students more resources is a very ad hoc way of solving this, and shouldn't be the answer. Rather, I suggest you work with a subset of the data (a random sample of 10%, for example).
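As a minimal sketch of what working with a 10% random sample could look like, the base R snippet below builds a small synthetic stand-in for bike, keeps a random tenth of its rows, and counts daily rides on the subset. The synthetic values, the logical encoding of weekday, and the names bike_small, daily and n_rides are all assumptions for illustration; only start_day and weekday come from the exercise text.

```r
# Stand-in for the real data: a small synthetic data frame.
# Only the column names start_day and weekday come from the exercise;
# the values and the logical weekday encoding are made up.
set.seed(42)  # make the random sample reproducible
n <- 10000
days <- seq(as.Date("2017-01-01"), by = "day", length.out = 100)
bike <- data.frame(start_day = sample(days, n, replace = TRUE))
# wday runs from 0 (Sunday) to 6 (Saturday); 1 to 5 are workweek days
bike$weekday <- as.POSIXlt(bike$start_day)$wday %in% 1:5

# Keep a random 10% of the rows
keep <- sample(nrow(bike), size = floor(0.1 * nrow(bike)))
bike_small <- bike[keep, , drop = FALSE]

# Daily ride counts on the subset, grouped by start_day and weekday
daily <- aggregate(
  data.frame(n_rides = rep(1L, nrow(bike_small))),
  by = list(start_day = bike_small$start_day, weekday = bike_small$weekday),
  FUN = sum
)

nrow(bike_small)
head(daily)
```

With dplyr loaded, sample_frac(bike, 0.1) would be a comparable one-liner for the sampling step. Saving the subset once and shipping that file in the course image, rather than sampling in every exercise, would presumably keep the pre-exercise code light as well.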
I'm going to close the issue here, but I believe you should take this up with either the instructor or @yashasroy, who seems to be responsible for this course.
Finally, some comments about your issue: it was a great first try, but I couldn't reproduce it, as the pre-exercise code refers to a data set (bike.RData) that is baked into the course image through requirements.r. I also didn't get a reference to the course on GitHub, on Teach, or on campus. I managed to find it okay, but try to provide as many links as you can in the future. Thanks!
For some reason, when I include my full SCT, the question times out, but when I only include parts, it's good to go. I don't know why it's doing that, and it's not giving any feedback that I can work with.
Make a summary plot of the number of daily rides with workweek / weekend days colored differently.
@instructions
Use group_by() and summarise(). Group by start_day and be sure to include the variable weekday as well.
@hint
@pre_exercise_code
@sample_code
@solution
@sct