Open arsbar24 opened 6 years ago
Your main README looks great, but there's no link to homework 8, 9, or 10 there. You may want to add those links to your list!
Your readme for HW10 is excellent! It gives a great summary of your work for this assignment, and a thorough progress report.
Cool choice of website to scrape!
```r
# Some example grades
x <- c("B", "A", "A+", "C-", "C+", "A+", NA)
# Desired ordering, best grade first; NA sorts last
Grade_order <- c("A+", "A", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", NA_character_)
x[order(match(x, Grade_order))]
#> [1] "A+" "A+" "A"  "B"  "C+" "C-" NA
```
where the `match` function returns, for each element of `x`, the position of its first match in `Grade_order`. Still, your method works just fine!
- You demonstrate a very good understanding of the web scraping functions in this script.
- Small thing: I think I would have rescaled the `number` parameter in the `frame_prof` function so that it starts at `1` and goes to a maximum of `7393`, then just add `7542` to the input within the function. Otherwise the function is great!
- You were able to generate a huge data frame of relevant data! Awesome!
- It's great that you used a `makefile` script to automate the construction of your data analysis.
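
The rescaling suggested for `frame_prof` above could look something like the sketch below. The names `prof_id`, `id_offset`, and `max_number` are hypothetical, and the real body of `frame_prof` is not reproduced here; the values `7542` and `7393` come from the comment above.

```r
# Hypothetical helper: map a 1-based index onto the site's professor IDs.
# Assumes, per the comment above, that valid inputs run from 1 to 7393
# and that the offset to the site's IDs is 7542.
prof_id <- function(number, id_offset = 7542L, max_number = 7393L) {
  stopifnot(is.numeric(number), number >= 1, number <= max_number)
  number + id_offset
}

prof_id(1)     # 7543
prof_id(7393)  # 14935
```

Inside `frame_prof`, the first step would then be something like `id <- prof_id(number)`, so callers never need to know the offset.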
### ScrapingBy.md
- Your first plot is really cool! I'm impressed by the data you extracted. You may want to use the `labs()` function to change the x-axis label though, and maybe add a title.
- This is truly an awesome set of data you extracted here! Great job.
Overall: Awesome assignment, really creative. Clearly a lot of work put into this and it shows!
Happy holidays to you too!
Hayden
Hi @arsbar24
Nice to see your repo and your work on scraping data from ratemyprofessor.com, a website I often visit. I will go over the work you did and offer some suggestions.
For the scraping step, your `grade2GPA` function uses if-else to convert letter grades to numerical grades, which is good.
For `frame_prof`, I would suggest not hard-coding the `number` value directly in your code, which would make the function easier to extend and maintain. The same applies to this part of the code base:
```r
for (i in 7543:(7543 + DataSize)) {
  df2 <- frame_prof(i)
  df <- rbind(df, df2)
}
```
Avoiding the hard-coded value would make the code easier to build on later.
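
As a rough sketch of what that could look like, the starting ID and the number of pages can be named once and reused. The names `first_id`, `data_size`, and `ids` are hypothetical, and `frame_prof` is replaced here by a stub so the example runs without network access.

```r
# Stand-in for the real frame_prof(), which scrapes one professor page.
# This stub only exists so the sketch runs offline.
frame_prof <- function(i) data.frame(id = i, stringsAsFactors = FALSE)

first_id  <- 7543  # first professor ID, taken from the loop above
data_size <- 5     # hypothetical number of pages to scrape

# Name the range once, then build all frames and bind them in one step
# instead of growing df with rbind() on every iteration.
ids <- seq(first_id, length.out = data_size)
df  <- do.call(rbind, lapply(ids, frame_prof))

nrow(df)  # 5
```

Note that `7543:(7543 + DataSize)` visits `DataSize + 1` IDs, whereas `seq(first_id, length.out = data_size)` visits exactly `data_size`; which of the two is intended is a choice for the author.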
For parsing the webpage, I would suggest a wrapper function to simplify `webpage %>% html_nodes("title") %>% html_text(trim = TRUE)`, as I have seen this pattern several times in your code.
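
A minimal sketch of such a wrapper, assuming the `rvest` package is what the script already uses. The function name `node_text` is hypothetical, and the inline HTML is only there so the example runs without a network connection.

```r
library(rvest)

# Hypothetical wrapper around the repeated pipeline: select nodes by CSS
# selector and return their trimmed text.
node_text <- function(page, css) {
  page %>%
    html_nodes(css) %>%
    html_text(trim = TRUE)
}

# Small inline document standing in for a scraped page.
webpage <- read_html("<html><head><title> Example page </title></head></html>")
node_text(webpage, "title")
```

Each repeated pipeline in the script could then shrink to a single call such as `node_text(webpage, "title")`.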
For `frame_prof`, I would also suggest adding more error handling around the network request, for example by checking the header of the HTML response, so that you can better understand any transmission errors.
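
One generic way to add that kind of protection is to wrap each call in `tryCatch`, sketched below. This does not inspect response headers; it only shows how a loop could keep running when one page fails. The names `safe_scrape` and `flaky` are hypothetical, and `flaky` is a stub standing in for `frame_prof`.

```r
# Hypothetical wrapper: call a scraping function for one ID and turn any
# error (for example a failed request) into a warning plus NULL.
safe_scrape <- function(scrape_fun, id) {
  tryCatch(
    scrape_fun(id),
    error = function(e) {
      warning(sprintf("ID %s failed: %s", id, conditionMessage(e)), call. = FALSE)
      NULL
    }
  )
}

# Stub that fails for one ID, standing in for frame_prof().
flaky <- function(id) {
  if (id == 2) stop("HTTP 404")
  data.frame(id = id)
}

results <- lapply(1:3, function(i) safe_scrape(flaky, i))
# Drop the failed IDs explicitly before binding the rest together.
df <- do.call(rbind, Filter(Negate(is.null), results))
nrow(df)  # 2
```

The warning message records which ID failed and why, which makes it easier to retry only the pages that went wrong.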
For the ScrapingBy.md file, I like the question you posed, the way you answered it, and the plot type you chose. I have no further suggestions here, as I think you are better than me at this part.
Overall, congrats on finishing the final homework, and enjoy the holiday!
Regards, Jason
Hi @arsbar24, here are some comments about your hw10:
- Task(s) selected: Scrape data
- Data stored as file ready for downstream analysis: Yes
- Basic Exploration: Yes
- Reflection: Yes
Your grade will be emailed to you at a later date.
Happy Holidays!