DS4PS / cpp-525-fall-2020

http://ds4ps.org/cpp-525-fall-2020/

Lab 01 - DataTS2 #3

Closed TVK36692 closed 3 years ago

TVK36692 commented 3 years ago

In following the lecture we get to a point where dataTS2 is introduced. I do not see where that is defined and what needs to be changed. Can someone point me in the right direction?

reg2 = lm(Y ~ T + D + P, data = dataTS2)

stargazer( reg2, 
           type = "html", 
           dep.var.labels = ("Wellbeing"),
           column.labels = ("Model results"),
           covariate.labels = c("Time", "Treatment", "Time Since Treatment"),
           omit.stat = "all", 
           digits = 2 )

JasonSills commented 3 years ago

Hi @TVK36692,

This is the name of the dataset you are creating with your new variables.

TVK36692 commented 3 years ago

I may need to wait for a solution, then. I don't see the step where there is a new dataset being created.

Dselby86 commented 3 years ago

Under the Data section of the lab, there is a part that introduces a vector of numbers:

passengers <- c(1328, 1407, .... )

The first step of the lab is to make three variables: Time, Treatment, and Time Since. Time is a count of the days. Treatment is a binary variable that is 0 for days 1:120 and 1 for days 121:365. Time Since is 0 through day 120 and counts up from 1 starting at day 121.

The lab instructions call this newly created dataset dataTS2 and the new variables T, D, and P. You should change T, D, and P to the names of the variables you make for your new dataset.

Sean-In-The-Library commented 3 years ago

I feel like I'm lost too. I tried to call rep() on passengers to create the time dummy variable like the Oklahoma example from the lecture but it's throwing an error:

oklahoma$time <- rep( 1 : nrow( oklahoma ))

(but with passengers)

TVK36692 commented 3 years ago

Oh I was able to get that part with the lab. Just not the jump in the lecture where it assumed dataTS2 existed. Sean, can you share your error?

Sean-In-The-Library commented 3 years ago

Sure, and thanks!

passengers$time <- rep( 1 : nrow( passengers))
Error in 1:nrow(passengers) : argument of length 0

I feel like I'm taking crazy pills. This should be the easiest part.... haha

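Dselby86 commented 3 years ago

rep(1, nrow(oklahoma))

will create a vector of 1s with length equal to the number of rows of oklahoma.

You put

rep(1:nrow(oklahoma))

You replaced the comma with a colon, which produces the sequence 1, 2, 3, ... up to the number of rows of oklahoma. With no second argument, rep() repeats that sequence once, which is the time counter you want. The error comes from nrow(): passengers is a plain vector, not a data frame, so nrow(passengers) returns NULL and 1:NULL fails with "argument of length 0". Use length(passengers) instead of nrow(passengers).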

If you don't think you are using a function right try adding a question mark before the function name. Like so:

?rep

This will show you a quick explanation of the function
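
For anyone hitting the same error, here is a minimal sketch of what seems to be going on, assuming passengers is a plain numeric vector as in the lab. The values below are made-up stand-ins, not the lab data, and time_a and time_b are just illustrative names.

```r
# Hypothetical stand-in for the lab's passengers vector (illustrative values only)
passengers <- c(1328, 1407, 1425, 1390, 1502)

# nrow() is meant for matrices and data frames; on a plain vector it returns NULL
is.null(nrow(passengers))        # TRUE

# 1:NULL is what raises "argument of length 0"; length() works on vectors
time_a <- 1:length(passengers)

# seq_along() builds the same counter and also behaves well on empty vectors
time_b <- seq_along(passengers)

identical(time_a, time_b)        # TRUE
```

Either counter can then be stored as its own Time variable and combined with passengers later.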

Sean-In-The-Library commented 3 years ago

You're awesome. Thank you for the (amazingly fast) help.

JasonSills commented 3 years ago

Hi all,

Thanks for posting the rep() function.

I missed the lecture and don't see it posted, so I had to get scrappy. I used an ifelse statement:

Time <- c(1:365)
Treatment <- ifelse(Time >=121,1,0)
TimeSince <- ifelse(Time <121,0,(1:365)-120)

Of course the flaw here is that the code isn't scalable, so it wouldn't work for, say, automated reporting. For a one-time ad hoc analysis, however, this builds the data as it is in the lab. Notice TimeSince looks odd with the -120. The subtraction is there because 1:365 counts from the first day of the study, not from the intervention. Subtracting 120 makes TimeSince return 1 on day 121, right where you want it. Again, scrappy, but seems to do the trick.

Dselby86 commented 3 years ago

Jason,

Good response. That is one way to tackle the problem. If you wanted to make your solution scalable, replace 121 and 365 with variables that you assign earlier in the code.

Intervention <- 121
StudyPeriod <- 365
Time <- 1:StudyPeriod
Treatment <- ifelse( Time >=Intervention, 1, 0 )
TimeSince <- ifelse( Time < Intervention, 0 , (1:StudyPeriod ) - Intervention +1 )

Once you have Time, Treatment, and TimeSince, it should be a small matter to cbind these vectors with the initial data vector, which will give you your dataset.
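
One way to package this idea is a small helper function. This is only a sketch: make_its_vars, n_days, and start are hypothetical names, not anything from the lab. It also assumes the cbind() route, which returns a matrix when given vectors and therefore needs a conversion to a data frame.

```r
# Hypothetical helper: builds the three variables for a study of n_days days
# in which the intervention begins on day `start`
make_its_vars <- function(n_days, start) {
  Time      <- 1:n_days
  Treatment <- ifelse(Time >= start, 1, 0)
  TimeSince <- ifelse(Time < start, 0, Time - start + 1)
  # cbind() on vectors returns a matrix, so convert it to a data frame
  as.data.frame(cbind(Time, Treatment, TimeSince))
}

its <- make_its_vars(n_days = 365, start = 121)

its[120:122, ]       # rows around the intervention date
sum(its$Treatment)   # 245 treated days (121 through 365)
max(its$TimeSince)   # 245
```

The passengers vector would be added as a fourth column in the same cbind() call.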

Niagara1000 commented 3 years ago

there was no lecture :(

Dselby86 commented 3 years ago

The lecture can be found here: https://ds4ps.org/pe4ps-textbook/docs/p-020-time-series.html

The lecture is asynchronous.

@Niagara1000 I apologize for the confusion. I didn't plan to do a review session the first week and should have communicated that clearly.

JasonSills commented 3 years ago

Hi @Dselby86

Thanks for posting the lecture. Question on cbind. In the data construction step I simply used:

riderdat <- data.frame(passengers, Time, Treatment, TimeSince)

No cbind, but everything seemed to work just fine. What would be the reason to use cbind, or are there multiple steps?

Thank you! I had a lot of fun with this lab.

Niagara1000 commented 3 years ago

Hi Professor @Dselby86 ,

No worries! Will we have a review session this week?

Thank you!

Dselby86 commented 3 years ago

@JasonSills

data.frame() looks like it works better because it saves the trouble of converting the data to a dataframe. There are a lot of ways to program something in R. For some reason I always used cbind(), but I learned a new way of doing it.

@Niagara1000

Yes, but I am traveling on Thursday so I'm going to host two over the weekend. I'll put an official announcement out tomorrow once everyone's work is in, and I've had a chance to grade it.

Niagara1000 commented 3 years ago

Hi Prof @Dselby86 ,

I used the code

passengers_TTT <- cbind.data.frame(passengers, Time, Treatment, TimeSince)
class(passengers_TTT)

and the class is data.frame. This method works, so is it alright to use it?

Dselby86 commented 3 years ago

I also am just now learning about the cbind.data.frame function. If the code produces a data frame where each column is a variable and each row is a unique observation, then you should be in good shape.

There are tons of ways to do something in R, some are more user friendly than others, some are faster than others. But with the amount of data we are working with in this class, it is very unlikely you will need to worry about optimizing your code. Just use whichever approach makes the most sense to you.
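
To make the comparison concrete, here is a small sketch of the three approaches mentioned in this thread. The passengers values are made-up stand-ins (the lab's vector has 365 entries), and m, d1, d2, and d3 are just illustrative names.

```r
# Made-up stand-in values, not the lab data
passengers <- c(1328, 1407, 1425, 1390)
Time       <- seq_along(passengers)

m  <- cbind(passengers, Time)               # plain cbind() on vectors: a matrix
d1 <- as.data.frame(m)                      # matrix converted to a data frame
d2 <- data.frame(passengers, Time)          # data frame built directly
d3 <- cbind.data.frame(passengers, Time)    # also a data frame

# Which of the four objects are data frames?
sapply(list(m, d1, d2, d3), is.data.frame)  # FALSE TRUE TRUE TRUE
```

lm() expects its data argument to be a data frame (or something similar), which is why the plain cbind() result needs the extra conversion step.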

Niagara1000 commented 3 years ago

Hi Prof @Dselby86 ,

Another question: I created the dataset that I mentioned above, but realized it needs to be named dataTS2, so I renamed it.

I am confused about the values of passengers and Y from the Data section of the lab. Are they both the same?

Dselby86 commented 3 years ago

Y is the dependent variable that you are attempting to predict with a linear regression. The idea is that the dependent variable is a function of the independent variables and, in the case of linear regression, has a linear relationship with those variables.

So to perform the linear regression, lm(), you need to substitute the name of the dependent variable for Y. Passengers is the dependent variable, because the regression is saying that the number of passengers is dependent on how much time has passed, the intervention, and how long the intervention has been implemented.
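
As a rough sketch of that substitution, the lecture's reg2 call might look like the following once Y, T, D, and P are replaced. The passengers values here are simulated purely so the example runs; they are NOT the lab data, and the numbers used to generate them are arbitrary assumptions.

```r
set.seed(1)  # reproducible made-up data, not the lab's passengers values

Time      <- 1:365
Treatment <- ifelse(Time >= 121, 1, 0)
TimeSince <- ifelse(Time < 121, 0, Time - 120)

# Simulated outcome with an arbitrary level shift and slope change after day 120
passengers <- 1300 + 0.5 * Time + 50 * Treatment + 1.2 * TimeSince +
  rnorm(365, sd = 20)

dataTS2 <- data.frame(passengers, Time, Treatment, TimeSince)

# Same model as reg2 in the lecture, with Y, T, D, P swapped for our names
reg2 <- lm(passengers ~ Time + Treatment + TimeSince, data = dataTS2)
coef(reg2)
```

With the real passengers vector in place of the simulated one, the same reg2 object can be passed to stargazer() as in the original post.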