QiWang / Stat503


a summary of the report (Project 1) #1

Closed by yihui 13 years ago

yihui commented 13 years ago

Here is almost the final version:

https://github.com/downloads/yihui/activities/Stat503-Project1-Wang-Zhao-Rowcliffe-Xie.pdf

Please proof-read it carefully. Make sure there are no typos; give your suggestions on how to make it more appealing, convincing and logically consistent. The most important thing is how to improve it according to the project requirements and grading rubric.

A couple more things to clarify:

  1. On second thought, I did not include information gain and Weka; the root reason is still the discretization of salary. It would have been much better to try a regression tree instead of the classification tree, because there is just too much arbitrariness in discretization. On the other hand, I don't know how to present the results: numbers? graphics?...
  2. Yifan made a good attempt at using new data sources, but I did not include those results either, because I did not see anything particularly interesting in them; this is explained in the last section.
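
To make the discretization point in item 1 concrete, here is a minimal sketch contrasting the two kinds of tree. It is not the code used for the report: the data are simulated, the column names (`min_per_game`, `age`, `log_salary`) are hypothetical, and the tercile cut points are chosen only to show how arbitrary such a choice is. It assumes the `rpart` package, which ships with standard R installations.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(503)
n <- 300

# Simulated stand-in for the project data; none of these values are real.
sim <- data.frame(
  min_per_game = runif(n, 5, 40),               # hypothetical predictor
  age          = sample(20:38, n, replace = TRUE)
)
sim$log_salary <- 13 + 0.05 * sim$min_per_game + rnorm(n, sd = 0.5)

# Regression tree: models log salary directly, so no cut points are needed.
reg_tree <- rpart(log_salary ~ min_per_game + age, data = sim, method = "anova")

# Classification tree: the response must be discretized first.
# Terciles are used here, but any other set of breaks is equally defensible.
sim$salary_class <- cut(
  sim$log_salary,
  breaks = quantile(sim$log_salary, c(0, 1/3, 2/3, 1)),
  labels = c("low", "mid", "high"),
  include.lowest = TRUE
)
cls_tree <- rpart(salary_class ~ min_per_game + age, data = sim, method = "class")

print(reg_tree$method)
print(cls_tree$method)
```

Changing the `breaks` argument changes the classes and therefore the fitted classification tree, while the regression tree is unaffected; that is the arbitrariness referred to above.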

BTW, we told a lie in Figure 5 to some degree. The "truth" is:

library(ggplot2)
qplot(minutes / gp, log(salary), data = bb, facets = . ~ position,
  geom = c('point', 'smooth'))

Such a dark trick should be avoided in general. I hope you understand what the "dark trick" is here.

QiWang commented 13 years ago

Thanks a lot to Yihui for the wonderful job on this report. It taught us a lot about how to write a report in the future and how to use code to make the plots. Just a few thoughts:

  1. Cook asked us to follow the format used for case study 1 (tipping). Does that mean we should use sections like 1. description, 2. suggested analysis, ...?
  2. Should we say whether or not we deleted any outliers?
  3. The new variable mpg seems very important in section 4; why is mpg not used in the final model?
  4. Should we do more model diagnostics, like checking the VIF?
  5. The R-squared is 0.37 for the linear model. Should we suggest that some other model, such as a nonlinear one, could be used? Not this time, of course, but maybe we could mention it in the conclusion.
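
For the VIF check in item 4, a minimal sketch in base R follows. It uses simulated data with made-up predictor names (`x1`, `x2`, `x3`), not the report's variables, and computes each VIF from its definition, 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The helper `vif_by_hand` is hypothetical; the `vif()` function in the `car` package is a common alternative if that dependency is acceptable.

```r
set.seed(503)
n <- 200

# Simulated predictors; x2 is deliberately built to be correlated with x1.
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- sim$x1 + rnorm(n, sd = 0.3)
sim$x3 <- rnorm(n)
sim$y  <- 1 + sim$x1 + 0.5 * sim$x3 + rnorm(n)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared
# from regressing predictor j on all the other predictors.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(sim, c("x1", "x2", "x3"))
print(round(vifs, 2))
```

In this simulation `x1` and `x2` should show clearly inflated values while `x3` stays close to 1; what counts as "too large" is a judgment call, with cutoffs of 5 or 10 being common rules of thumb.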

It is really surprising and amazing that Yihui did this job on the report in such a short time. In my opinion, it is totally ready to be submitted, not only because we are past the due time but also because it is a wonderful report.

yihui commented 13 years ago
  1. that will require substantial reconstruction; I'm already exhausted, so the only possibility is one of you three can do it;
  2. probably; this needs another diagnostic plot for outliers, although I guess the outliers won't have too much influence
  3. oops! this is a serious mistake; I've changed the R code; thanks!
  4. this makes a lot of sense; I should have done it, but again, I can only rely on you
  5. linearity has been justified in the report after log-transformation; the R-squared is a useless indicator in my eyes
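
One possible shape for the outlier diagnostic mentioned in item 2 is sketched below, using Cook's distance from base R. The data are simulated with one planted outlier, the variable names are hypothetical, and the `4 / n` cutoff is only a common rule of thumb, not something taken from the report.

```r
set.seed(503)
n <- 100

# Simulated stand-in data with a log-scale response; not the report's data.
sim <- data.frame(x = runif(n, 5, 40))
sim$log_y <- 13 + 0.05 * sim$x + rnorm(n, sd = 0.5)
sim$log_y[1] <- sim$log_y[1] + 5   # plant one artificial outlier

fit <- lm(log_y ~ x, data = sim)

# Cook's distance measures how much each observation moves the fitted values.
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / n)       # rule-of-thumb cutoff (an assumption)
print(flagged)

# Refit without the flagged points to see how much the coefficients change.
fit_trim <- lm(log_y ~ x, data = sim[-flagged, ])
print(rbind(full = coef(fit), trimmed = coef(fit_trim)))

# For the graphical version, plot(fit, which = 4) draws Cook's distance
# by observation index.
```

If the two rows of coefficients are close, that supports the guess that the outliers do not have much influence; if they differ noticeably, the report should say how the outliers were handled.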

Thanks!

I need to know the opinions of the other two members.

yihui commented 13 years ago

I've submitted the report in WebCT after revising it following Qi's reminder about the omission of mpg.

BillRowcliffe commented 13 years ago

My understanding on Friday was that you would spend the next 2-3 hours polishing the report and then submit it, so I made no plans to edit this weekend. I have been in Iowa City visiting a friend who moved to Connecticut today. Please feel free to call me at (641) 832-8395 in the future if you have issues like this again. Thank you for submitting; I approved of the report as prepared before class Friday.

yihui commented 13 years ago

OK, I see. Thanks! We guessed you might be out of town.