Open mjbroerman opened 2 years ago
Thanks so much for reporting this, @mjbroerman -- and yikes 😬 ! Not a very "Quick Start" after all..
Could you please tell me what you get for packageVersion("pointblank")? I know there have been some fabulous updates over there lately (some of which I even requested), and convo may be presuming an incorrect/outdated internal structure there.
Yes that would've been helpful to include in the first place. A bit new to this! 😅
packageVersion("pointblank") [1] ‘0.8.0’
Indeed, new to this space also and it's all quite exciting! So glad your package is a part of it!
Thanks so much, @mjbroerman !
This seems to be a case of convo relying on a pointblank pattern that isn't supported anymore. The core issue in the errors you point out is that the controlled vocabulary specifies checks for variables that do not match anything in the data.
So, while the example given doesn't work, if the controlled vocabulary contains only entries that have a match in the data, the function operates correctly. We can see this if I subset the first level of the controlled vocabulary to only include IND and AMT, which both have matches in the data.
library(convo)
filepath <- system.file("", "ex-convo.yml", package = "convo")
convo <- read_convo(filepath)
convo[[1]] <- convo[[1]][c("IND", "AMT")]
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = "../Desktop")
In the short term, if your validation checks only include stubs present in the dataset you are validating, you will not encounter this issue in practice. However, I am very glad to know about it because it's clearly a problem that needs to be fixed!
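To find which stubs are safe to keep without checking by hand, something like the following base R sketch may help. It does not call convo or pointblank at all: `stubs` is a hypothetical stand-in for the first level of a controlled vocabulary (a named list keyed by stub), and `cols` is the set of column names being validated.

```r
# Hypothetical stand-in for the first level of a controlled vocabulary.
# The values are placeholders; only the names (the stubs) matter here.
stubs <- list(
  ID  = "placeholder checks for ID",
  IND = "placeholder checks for IND",
  AMT = "placeholder checks for AMT"
)

# Column names of the dataset to be validated (taken from the example above).
cols <- c("IND_A", "AMT_B")

# For each stub, test whether at least one column name starts with it.
# Note: this comparison is case-sensitive.
has_match <- vapply(
  names(stubs),
  function(stub) any(startsWith(cols, stub)),
  logical(1)
)

# Keep only the stubs that match something in the data.
stubs_present <- stubs[has_match]
names(stubs_present)
```

With the example columns, `ID` is dropped because neither `IND_A` nor `AMT_B` begins with `ID`, which mirrors the manual subset to `IND` and `AMT` shown above.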
In the next comment, I'm going to leave myself some notes on initial research into why this is happening under the hood. Feel free to ignore the next comment because my goal is that users shouldn't have to think about that level. 🙂
Notes to self:
In pointblank 0.6.0 this works:
create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2) %>%
  interrogate()
Now in pointblank 0.8.0, we get an error when we attempt to check something that has no match in the data (the ID check in the example above).
Tentatively, it looks like the right way to fix this will be something like this:
create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("ID"), step_id = 1,
    active = ~. %>% {length(starts_with("ID", vars = colnames(.))) > 0}
  ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2) %>%
  interrogate()
However, I need to do more research to understand:
Why does the active param seem to cause errors with yaml_write?
When is active evaluated? Will this "round trip" correctly to and from YAML for portable use on a different dataset?
Reprex for self:
library(pointblank)
# no `active` doesn't interrogate ----
agent1 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)
res1 <- interrogate(agent1)
# `active` (FALSE function) does interrogate; doesn't write yaml ----
agent2 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("ID"), step_id = 1,
    active = ~. %>% {length(starts_with("ID", vars = colnames(.))) > 0}
  ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)
res2 <- interrogate(agent2)
yaml_write(agent2, filename = "pb-test-2.yml", path = tempdir())
# `active` (TRUE function) does interrogate; writes empty yaml ----
agent3 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("AMT"), step_id = 1,
    active = ~. %>% {length(starts_with("AMT", vars = colnames(.))) > 0}
  ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)
res3 <- interrogate(agent3)
yaml_write(agent3, filename = "pb-test-3.yml", path = tempdir())
readLines(file.path(tempdir(), "pb-test-3.yml"))
# `active` (value not function) with cols present does interrogate; writes yaml ----
agent4 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("AMT"), step_id = 1, active = FALSE) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)
res4 <- interrogate(agent4)
yaml_write(agent4, filename = "pb-test-4.yml", path = tempdir())
readLines(file.path(tempdir(), "pb-test-4.yml"))
Hey @mjbroerman ! This should be fixed now with the latest dev version of pointblank.
There's a secondary issue about filepath escaping depending on your OS. If you wish to keep using the version of convo you installed, try this:
library(convo)
filepath <- system.file("", "ex-convo.yml", package = "convo")
convo <- read_convo(filepath)
tmp <- gsub("\\\\", "/", tempdir())
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = tmp)
Otherwise, try reinstalling convo, and then the example should work again exactly as it appears in the guide.
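For anyone curious what the gsub() call in the workaround does, here is a small base R sketch on a made-up Windows-style path. `win_path` is a hypothetical example value, not something read from a real machine. On non-Windows systems tempdir() normally contains no backslashes, so the substitution should leave it unchanged.

```r
# Hypothetical Windows-style path; in R source each backslash is written as "\\".
win_path <- "C:\\Users\\me\\AppData\\Local\\Temp"

# Regex form, as in the workaround: the pattern "\\\\" matches one literal backslash.
fwd <- gsub("\\\\", "/", win_path)

# Equivalent fixed-string form, which avoids the double escaping.
fwd_fixed <- gsub("\\", "/", win_path, fixed = TRUE)

print(fwd)
print(identical(fwd, fwd_fixed))
```

Both forms replace every backslash with a forward slash, which R accepts as a path separator on Windows as well.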
Hi Emily,
Lovely package. I wanted to use it for an upcoming project, and while I was kicking the tires, I hit this
Likewise when interrogating:
I'd love to find out this was my misunderstanding, thanks!