ProjectMOSAIC / mosaicData

R package with Project MOSAIC datasets

clean-up/augment RailTrail data set #20

Open rpruim opened 7 years ago

rpruim commented 7 years ago

To do:

Also fix bad URL in documentation.

nicholasjhorton commented 7 years ago

I've added the report to the GitHub repo (see https://github.com/ProjectMOSAIC/mosaicData/blob/master/data-raw/PVPCcounts2005.pdf).

Note that I also merged beta and master to do this.

rpruim commented 7 years ago

I've added the new variable and recoded the old one:

glimpse(RailTrail)
## Observations: 90
## Variables: 11
## $ hightemp   <int> 83, 73, 74, 95, 44, 69, 66, 66, 80, 79, 78, 65, 41, 59, 50, 54, 97, 75, 63, ...
## $ lowtemp    <int> 50, 49, 52, 61, 52, 54, 39, 38, 55, 45, 55, 48, 49, 35, 35, 32, 71, 43, 35, ...
## $ avgtemp    <dbl> 66.5, 61.0, 63.0, 78.0, 48.0, 61.5, 52.5, 52.0, 67.5, 62.0, 66.5, 56.5, 45.0...
## $ spring     <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0...
## $ summer     <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1...
## $ fall       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0...
## $ cloudcover <dbl> 7.6, 6.3, 7.5, 2.6, 10.0, 6.6, 2.4, 0.0, 3.8, 4.1, 8.5, 7.2, 10.0, 7.7, 5.8,...
## $ precip     <dbl> 0.00, 0.29, 0.32, 0.00, 0.14, 0.02, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.03...
## $ volume     <int> 501, 419, 397, 385, 200, 375, 417, 629, 533, 547, 432, 418, 193, 331, 280, 3...
## $ weekday    <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, T...
## $ dayType    <chr> "weekday", "weekday", "weekday", "weekend", "weekday", "weekday", "weekday",...
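
A recode along these lines could be sketched as follows. This is only an illustration, not the code actually committed: the 0/1 integer coding of the old `weekday` column is an assumption, and `trail` is a small stand-in data frame rather than the real data.

```r
# Small stand-in for RailTrail; the original 0/1 coding of `weekday` is an assumption.
trail <- data.frame(
  volume  = c(501L, 385L, 629L),
  weekday = c(1L, 0L, 0L)
)

# Recode the old indicator as a logical ...
trail$weekday <- trail$weekday == 1L

# ... and derive the new character variable from it.
trail$dayType <- ifelse(trail$weekday, "weekday", "weekend")

str(trail)
```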

rpruim commented 7 years ago

I don't think the data in the PDF Nick posted matches the data in RailTrail. Perhaps Nick can take a look and see.

RailTrail.R in data-raw creates a data set that matches the PDF, but I don't immediately see how the two data sets line up. There is a lot of similar data, but it doesn't match perfectly, and if these really are based on the same data, I don't see how the RailTrail data is ordered.
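
One way to check how the two data sets line up regardless of row order is to match rows on the columns they share. The sketch below uses two made-up data frames (`from_pdf` and `in_pkg` are hypothetical names, and the third row of `in_pkg` is invented to show a mismatch); it is not the comparison that was actually run.

```r
# Hypothetical stand-ins for the PDF-derived data and the packaged RailTrail data.
from_pdf <- data.frame(hightemp = c(83L, 73L, 74L), volume = c(501L, 419L, 397L))
in_pkg   <- data.frame(hightemp = c(74L, 83L, 70L), volume = c(397L, 501L, 410L))

# Build one key per row from the shared columns so row order does not matter.
shared <- intersect(names(from_pdf), names(in_pkg))
row_key <- function(d) do.call(paste, c(d[shared], sep = "|"))

# For each packaged row, the index of the matching PDF row (NA if there is none).
match_idx <- match(row_key(in_pkg), row_key(from_pdf))

matched_rows   <- in_pkg[!is.na(match_idx), ]
unmatched_rows <- in_pkg[is.na(match_idx), ]
```

Rows left in `unmatched_rows` are the ones to check by hand against the PDF, and `match_idx` shows the ordering of the matched rows relative to the PDF.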

nicholasjhorton commented 7 years ago

Indeed: this is atrocious. I've gone through to try to reconcile the two forms and there are clearly a number of errors. I'm very embarrassed. This dataset stemmed from a student project in a class and I didn't check that they had correctly ingested it. Groan.

I'll need to take a closer look at this and get things reconciled.

rpruim commented 7 years ago

If you believe the PDF, I have already converted that into a data frame and we could simply do a replace. Some of the additional variables (spring, for example) could be computed from the dates.
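
As a rough sketch of computing the season indicators from dates: the dates below are made up, and the month boundaries (spring = March to May, summer = June to August, fall = September to November) are an assumption, since the thread does not say how the seasons were originally defined.

```r
# Made-up dates; the real ones would come from the PDF-derived data frame.
dates <- as.Date(c("2005-04-15", "2005-07-04", "2005-10-20"))

# Month number as an integer (1 = January, ..., 12 = December).
month <- as.integer(format(dates, "%m"))

# 0/1 indicators, using assumed month boundaries for each season.
spring <- as.integer(month %in% 3:5)
summer <- as.integer(month %in% 6:8)
fall   <- as.integer(month %in% 9:11)

data.frame(date = dates, spring, summer, fall)
```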

nicholasjhorton commented 7 years ago

My suggestion would be to leave this issue open for now and work on an updated dataset that could be released in the late spring. I'll take the lead on this (as it will involve some cleanup of our existing examples). Again, my apologies for letting this error creep in.

On Jan 10, 2017, at 7:53 PM, Randall Pruim notifications@github.com wrote:

If you believe the PDF, I have already converted that into a data frame and we could simply do a replace. Some of the additional variables (spring, for example) could be computed from the dates.

Nicholas Horton Professor of Statistics Department of Mathematics and Statistics, Amherst College PO Box 5000, AC #2239 Amherst, MA 01002-5000

rpruim commented 6 years ago

@nicholasjhorton, any updates on this data situation?