Open rdisalv2 opened 1 year ago
Hi, thanks. Well, it's been a while, but I guess the idea is to find products where date (day, month and year) is the same and look at those instances only.
Thanks. Oh I see, date and (day, month, year) are two different dates. I thought they were the same date. (They differ from each other in 1.3% of cases.) The codebook from the dataverse materials for the paper says that they're the same, though:
date     float   %td      Date for offline data collection, in stata format
day      byte    %9.0g    Day for offline data collection
month    byte    %9.0g    Month for offline data collection
year     int     %9.0g    Year for offline data collection
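The comparison between date and the (day, month, year) triple can be reproduced with a short check. This is only a sketch, assuming the variables are named exactly as in the codebook lines above; rebuilt_date and date_matches are hypothetical helper names, not variables in the dataset.

```stata
* Rebuild a Stata daily date from the separate day/month/year variables.
* rebuilt_date and date_matches are hypothetical names used for illustration.
generate long rebuilt_date = mdy(month, day, year)
format rebuilt_date %td

* Flag observations where the stored date agrees with the rebuilt one.
* Left missing when either date is missing, so those rows stay visible in the tab.
generate byte date_matches = (date == rebuilt_date) if !missing(date, rebuilt_date)
tabulate date_matches, missing
```

The share of rows with date_matches equal to 0 should correspond to the mismatches mentioned above.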
BUT, the codebook from the dataverse also has this:
imputed byte %9.0g =0 if the online price was collected on the exact same day (otherwise it was collected within 7 days)
which seems promising, but its tab is weird:
. tab imputed, m

    imputed |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     22,414       49.53       49.53
          . |     22,839       50.47      100.00
------------+-----------------------------------
      Total |     45,253      100.00
I just checked the xlsx file from the dataverse replication. The . has to be 0, because that column is just a 1 or a blank in the xlsx. So I think keep if missing(imputed) would be the way to keep only same-day observations.
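A minimal sketch of that restriction follows. It rests on the assumption stated above, that a missing value of imputed stands for 0 (online price collected on the exact same day); the recode step is optional and only makes the assumption explicit.

```stata
* Assumption: missing imputed means 0, i.e. the online price was
* collected on the exact same day as the offline price.
replace imputed = 0 if missing(imputed)

* Confirm the recode before dropping anything.
tabulate imputed, missing

* Keep only the same-day observations.
keep if imputed == 0
```

Without the recode, keep if missing(imputed) selects the same rows.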
The data exercise on page 166 using the Billion Prices Project, question 3, asks to restrict the data to prices that are assessed on the same day. But the dataset used in the case study doesn't seem to have a variable for that, or a variable that permits constructing one.
It's a good question otherwise, and I'd love to use it.