emilyriederer / convo

R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/)
https://emilyriederer.github.io/convo/
Other
31 stars 1 forks source link

Quickstart guide example fails #4

Open mjbroerman opened 2 years ago

mjbroerman commented 2 years ago

Hi Emily,

Lovely package. I wanted to use it for an upcoming project, and while I was kicking the tires, I hit this

> filepath <- system.file("", "ex-convo.yml", package = "convo")
> convo <- read_convo(filepath)
> write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = tempdir())
 Error in if (step_list$column[[1]] == step_list$columns_expr) { : 
  missing value where TRUE/FALSE needed 
7.
get_column_text(step_list = step_list, expanded = expanded) 
6.
as_agent_yaml_list(agent = agent, expanded = expanded) 
5.
pointblank::yaml_write(., filename = "convo-validation.yml", 
    path = "/var/folders/bp/l5qt50g13sndlz0x48jqqv300000gn/T//RtmpqDnAMJ") 
4.
pointblank::create_agent(read_fn = ~setNames(as.data.frame(matrix(1, 
    ncol = 2)), c("IND_A", "AMT_B"))) %>% pointblank::col_vals_not_null(matches("^([A-Za-z]_){0}ID"), 
    step_id = 1) %>% pointblank::col_is_numeric(matches("^([A-Za-z]_){0}ID"), 
    step_id = 2) %>% pointblank::col_vals_between(matches("^([A-Za-z]_){0}ID"),  ... at <text>#1
3.
eval(parse(text = code)) 
2.
eval(parse(text = code)) 
1.
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", 
    path = tempdir()) 

Likewise when interrogating:

> pointblank::interrogate(agent)

── Interrogation Started - there are 13 steps ───────────────────────────────────────────────────────────────
Error: Only strings can be converted to symbols
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Only strings can be converted to symbols
Backtrace:
 1. pointblank::interrogate(agent)
 2. pointblank:::check_table_with_assertion(...)
 3. pointblank:::interrogate_not_null(agent = agent, idx = idx, table = table)
 4. pointblank:::get_column_as_sym_at_idx(agent = agent, idx = idx)
 5. rlang::sym(...)

I'd love to find out this was my misunderstanding, thanks!

emilyriederer commented 2 years ago

Thanks so much for reporting this, @mjbroerman -- and yikes 😬 ! Not a very "Quick Start" after all..

Could you please tell me what you get for packageVersion("pointblank") ? I know there have been some fabulous updates over there lately (some of which I even requested), and convo may be presuming an incorrect/outdate internal structure there

mjbroerman commented 2 years ago

Yes that would've been helpful to include in the first place. A bit new to this! 😅

packageVersion("pointblank") [1] ‘0.8.0’

Indeed, new to this space also and it's all quite exciting! So glad your package is a part of it!

emilyriederer commented 2 years ago

Thanks so much, @mjbroerman !

This seems to be a case of convo relying on a pointblank pattern that isn't supported anymore. The core issue in the errors you point out is that the controlled vocabulary specifies checks for variables that do not match anything in the data.

So, while the example given doesn't work, if the controlled vocabulary contains only entries that have a match in the data, the function operates correctly. We can see this if I subset the first level of the controlled vocabulary to only include IND and AMT which both have matches in the data.

library(convo)
filepath <- system.file("", "ex-convo.yml", package = "convo")
convo <- read_convo(filepath)
convo[[1]] <- convo[[1]][c("IND", "AMT")]
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = "../Desktop")

In the short term, if your validation checks only include stubs present in the dataset you are validating, you will not encounter this issue in practice. However, I am very glad to know about it because its clearly a problem that needs to be fixed!

In the next comment, I'm going to leave myself some notes on initial research into why this is happening under the hood. Feel free to ignore the next comment because my goal is that users shouldn't have to think about that level. 🙂

emilyriederer commented 2 years ago

Notes to self:

In pointblank 0.6.0 this works:

create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2) %>%
  interrogate()

Now in pointblank 0.8.0, we get an error when we attempt to check something that has no match in the data (the ID check in the example above).

Tentatively, it looks like the right way to fix this will be something like this:

create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1, 
                    active = ~. %>% {length(starts_with("ID", vars = colnames(.))) > 0} ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2) %>%
  interrogate()

However, I need to do more research to understand:

Reprex for self:

library(pointblank)

# no `active` doesn't interrogate ----
agent1 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res1 <- interrogate(agent1)

# `active` (FALSE function) does interrogate; doesn't write yaml ----
agent2 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("ID"), step_id = 1,
    active = ~. %>% {length(starts_with("ID", vars = colnames(.))) > 0} ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res2 <- interrogate(agent2)

yaml_write(agent2, filename = "pb-test-2.yml", path = tempdir())

# `active` (TRUE function) does interrogate; writes empty yaml ----
agent3 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("AMT"), step_id = 1,
    active = ~. %>% {length(starts_with("AMT", vars = colnames(.))) > 0} ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res3 <- interrogate(agent3)

yaml_write(agent3, filename = "pb-test-3.yml", path = tempdir())
readLines(file.path(tempdir(), "pb-test-3.yml"))

# `active` (value not function) with cols present does interrogate; writes yaml ----
agent4 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("AMT"), step_id = 1, active = FALSE) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res4 <- interrogate(agent4)

yaml_write(agent4, filename = "pb-test-4.yml", path = tempdir())
readLines(file.path(tempdir(), "pb-test-4.yml"))
emilyriederer commented 2 years ago

Related to https://github.com/rich-iannone/pointblank/issues/355

emilyriederer commented 2 years ago

Hey @mjbroerman ! This should be fixed now with the latest dev version of pointblank

There's a secondary issue about filepath escaping depending on your OS. If you wish to keep using the version of convo you installed, try this:

library(convo)
filepath <- system.file("", "ex-convo.yml", package = "convo")
convo <- read_convo(filepath)
tmp <- gsub("\\\\", "/", tempdir())
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = tmp)

Otherwise, try reinstalling convo and then the example precisely as it's in the guide should work again.