arsbar24 / STAT545-hw-barton-alistair

Alistair's Assignments for STAT 545A

hw10 ready for grading #10

Open arsbar24 opened 6 years ago

arsbar24 commented 6 years ago

Link

Happy Holidays!

HScheiber commented 6 years ago

Repository/README:

### dataframing.R Script

An alternative way to get the ordering you want:

Grade_order <- c("A+","A","B+","B","B-","C+","C","C-","D+","D","D-", NA_character_)

x[order(match(x, Grade_order))]

[1] "A+" "A+" "A" "B" "C+" "C-" NA


where the `match` function returns, for each element of `x`, the position of its first match in `Grade_order`. Still, your method works just fine!
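
As a runnable illustration of the `match`/`order` idea above, here is a minimal sketch. The vector `x` is a made-up sample, not taken from the scraped data; it was chosen so that it reproduces the output shown above.

```r
# Desired ordering, best grade first; NA_character_ puts missing grades last.
Grade_order <- c("A+", "A", "B+", "B", "B-", "C+", "C", "C-",
                 "D+", "D", "D-", NA_character_)

# Hypothetical sample of grades (an assumption, not the scraped data).
x <- c("B", "A+", "C-", NA, "A", "A+", "C+")

# match() gives, for each element of x, its position in Grade_order.
positions <- match(x, Grade_order)
positions
# [1]  4  1  8 12  2  1  6

# order() turns those positions into a permutation that sorts x.
sorted <- x[order(positions)]
sorted
# [1] "A+" "A+" "A"  "B"  "C+" "C-" NA
```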

- You demonstrate a very good understanding of the web scraping functions in this script.

- Small thing: I think I would have rescaled the `number` parameter in the `frame_prof` function so that it starts at `1` and goes up to a maximum of `7393`, and then just added `7542` to the input within the function. Otherwise the function is great!

- You were able to generate a huge data frame of relevant data! Awesome!

- It's great that you used a `makefile` script to automate the construction of your data analysis.
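
The rescaling suggested for `frame_prof` could look roughly like the sketch below. The `frame_prof` defined here is only a stand-in so that the example runs on its own; the wrapper name `frame_prof_rescaled` and the variables `id_offset` and `max_index` are hypothetical.

```r
# Hypothetical stand-in for the real frame_prof(), which scrapes one
# professor page; here it only records the id it was asked for.
frame_prof <- function(number) {
  data.frame(prof_id = number)
}

id_offset <- 7542  # offset mentioned in the review
max_index <- 7393  # largest rescaled index mentioned in the review

# Wrapper that accepts indices 1..max_index and shifts them internally.
frame_prof_rescaled <- function(index) {
  stopifnot(index >= 1, index <= max_index)
  frame_prof(index + id_offset)
}

frame_prof_rescaled(1)$prof_id
# [1] 7543
```

In the real script the offset could equally be added inside `frame_prof` itself, as the comment suggests; the wrapper just keeps the sketch self-contained.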

### ScrapingBy.md

- Your first plot is really cool! I'm impressed by the data you extracted. You may want to use the `labs()` function to change your x-axis label though, and maybe add a title.

- This is truly an awesome set of data you extracted here! Great job.
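
For the `labs()` suggestion, a minimal sketch follows. The data frame, its column names, and the label text are all invented for illustration; only the `labs()` call is the point.

```r
library(ggplot2)

# Hypothetical toy data; the real columns come from the scraped data frame.
ratings <- data.frame(
  difficulty = c(1.5, 2.8, 3.2, 4.1),
  quality    = c(4.6, 3.9, 3.1, 2.4)
)

# labs() sets the axis labels and the title in one place.
p <- ggplot(ratings, aes(x = difficulty, y = quality)) +
  geom_point() +
  labs(
    x = "Difficulty rating",
    y = "Quality rating",
    title = "Quality versus difficulty (toy data)"
  )
```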

Overall: Awesome assignment, really creative. Clearly a lot of work put into this and it shows!

Happy holidays to you too!

Hayden

arthursunbao commented 6 years ago

Hi @arsbar24

Nice to see your repo and your work scraping data from ratemyprofessor.com, a website I often visit. I will go over the work you did and offer some suggestions.

For the scraping process, you have the function `grade2GPA`, which uses if-else statements to map letter grades to numerical grades, which is good.

For the function `frame_prof`, I would suggest passing in the `number` variable rather than hard-coding the number directly into your code, which would make future maintenance and extension easier. The same applies to this part of the code base:

for (i in 7543:(7543 + DataSize)) { df2 <- frame_prof(i); df <- rbind(df, df2) }

Avoiding hard-coded values here would make the code easier to extend in future development.
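
One way to avoid the hard-coded start id is sketched below. The `frame_prof` here is a stand-in so the example runs without scraping, and `scrape_range`, `first_id`, and `n_pages` are hypothetical names.

```r
# Hypothetical stand-in for the real frame_prof().
frame_prof <- function(number) data.frame(prof_id = number)

# Collect `n_pages` consecutive pages starting at `first_id`.
# (The loop quoted above visits DataSize + 1 pages, since a:(a + n)
# contains n + 1 values.)
scrape_range <- function(first_id, n_pages, scraper = frame_prof) {
  ids <- seq(first_id, length.out = n_pages)
  frames <- lapply(ids, scraper)
  do.call(rbind, frames)  # bind once instead of growing df in a loop
}

df <- scrape_range(first_id = 7543, n_pages = 5)
nrow(df)
# [1] 5
```

Passing the scraper as an argument is a design choice for the sketch: it lets the range logic be tried out with a stand-in before pointing it at the real site.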

For parsing the webpage, I would suggest a wrapper function to simplify the pattern `webpage %>% html_nodes("title") %>% html_text(trim = TRUE)`, as I have seen it several times in your code.
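
A wrapper of the kind suggested here might look like the following sketch. The helper name `node_text` and the inline HTML document are assumptions made for the example; `read_html`, `html_nodes`, and `html_text` are the rvest functions already used in the quoted pattern.

```r
library(rvest)

# Hypothetical helper: trimmed text of every node matching a CSS selector.
node_text <- function(page, css) {
  page %>%
    html_nodes(css) %>%
    html_text(trim = TRUE)
}

# Small inline document so the example runs without a network connection.
page <- read_html(
  "<html><head><title> Example Professor </title></head>
   <body><p class='grade'> A+ </p></body></html>"
)

node_text(page, "title")
# [1] "Example Professor"
node_text(page, ".grade")
# [1] "A+"
```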

Also, for `frame_prof`, I would suggest adding more error handling around the network requests, for example by checking the HTTP response headers, so that you get a better understanding of any transmission errors.
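
As a simpler stand-in for inspecting response headers, the sketch below wraps each call in base R's `tryCatch`, so that one failed page does not abort the whole run. The failing `frame_prof` and the name `safe_frame_prof` are hypothetical and exist only to make the example runnable.

```r
# Hypothetical stand-in for the real frame_prof(): fails for some ids,
# the way a scraper might when a page is missing or a request times out.
frame_prof <- function(number) {
  if (number %% 2 == 0) stop("HTTP error 404 for id ", number)
  data.frame(prof_id = number)
}

# Return NULL (and report why) instead of stopping the whole scrape.
safe_frame_prof <- function(number) {
  tryCatch(
    frame_prof(number),
    error = function(e) {
      message("Skipping id ", number, ": ", conditionMessage(e))
      NULL
    }
  )
}

frames <- lapply(7543:7546, safe_frame_prof)
df <- do.call(rbind, frames)  # rbind() drops the NULL entries
df$prof_id
# [1] 7543 7545
```

To look at the actual response status, one option would be to fetch the page with `httr::GET()` and check `httr::status_code()` before parsing; that step is not shown here.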

For the ScrapingBy.md file, I like the question you posed, the way you answered it, and the plot types you chose. I have no further suggestions here, as I think you are better at this part than I am.

Overall, congrats on finishing the final homework, and enjoy the holidays!

Regards, Jason

derekcho commented 6 years ago

Hi @arsbar24, here are some comments about your hw10:

- Task(s) selected: Scrape data
- Data stored as file ready for downstream analysis: Yes
- Basic Exploration: Yes
- Reflection: Yes

Your grade will be emailed to you at a later date.