s3alfisc closed this issue 1 year ago
The `NumberofLocations` variable is externally sourced information about the number of locations in the chain. Not all of those locations necessarily appear in the data, and the same location may appear multiple times because of multiple inspections, so a count of rows per business name shouldn't be expected to give the number of locations in the chain. You get the same result comparing against the `NumberofLocations` variable in the R version alone:
```r
library(causaldata)
library(dplyr)

data(restaurant_inspections)

# Count the number of rows (inspections) per business name
restaurant_inspections <- restaurant_inspections %>%
  group_by(business_name) %>%
  mutate(new_num_locations = n())

# Compare the row count against the externally sourced variable
all.equal(restaurant_inspections$new_num_locations, restaurant_inspections$NumberofLocations, check.attributes = FALSE)
# [1] "Mean relative difference: 0.6426152"
```
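For the Python side, the same kind of check can be sketched in pandas. This is only an illustration on a small made-up data frame, not the real `restaurant_inspections` data: the business names and values below are hypothetical, and only the column names `business_name` and `NumberofLocations` are taken from the R example above.

```python
import pandas as pd

# Hypothetical toy data, NOT the real restaurant_inspections dataset.
# NumberofLocations plays the role of the externally sourced chain size.
inspections = pd.DataFrame({
    "business_name": ["A", "A", "A", "B", "B", "C", "D"],
    "NumberofLocations": [5, 5, 5, 1, 1, 2, 1],
})

# Count rows per business name, mirroring group_by() + mutate(n()) in dplyr.
inspections["new_num_locations"] = (
    inspections.groupby("business_name")["business_name"].transform("size")
)

# Rows where the row count disagrees with the externally sourced value.
mismatch = inspections["new_num_locations"] != inspections["NumberofLocations"]
print(inspections)
print("Rows that disagree:", int(mismatch.sum()), "of", len(inspections))
```

In this toy example, chain A has fewer rows than locations (some locations never appear), chain B has more rows than locations (one location inspected twice), and only chain D happens to match, so a mismatch by itself does not point to a data-processing error.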
So thankfully I think this one is okay! Thanks for checking though.
Ah I see. Makes excellent sense. Thanks for the feedback! =)
Hi Nick,
I have started to test `pyfixest` on "real world datasets" and am replicating all code examples in "The Effect". I have noticed a small discrepancy between the Python and R versions, either in how the data is processed or in the source data itself, for one of the examples in chapter 13 on regression.
Here is a reproducible example:
Python:
R:
So either the source data set or the way the `NumberofLocations` variable is computed differs. I have installed the most up-to-date version of all packages.
Best, Alex