Crunch-io / rcrunch

R package for interacting with the Crunch API
https://crunch.io/r/crunch/
GNU Lesser General Public License v3.0
9 stars 15 forks source link

Variables cache fixes 184826032 #619

Closed gergness closed 1 year ago

gergness commented 1 year ago

Allows users to opt out of some performance problems introduced by the lazy variables catalog.

For context @sluga, early on when I started at crunch, the main variables catalog contained whether each variable was hidden or not. When we added "private" (aka "secure" variables according to the API, though Mike hates this word), we switched to having to navigate the hidden/private variable folder hierarchy to figure out which variables were hidden or private.

This PR:

  1. Checks if user wants to get warnings about hidden/private variables before actually checking if the variable is hidden/private so that basic variable access does not require traversing the hidden/private folders. See below for a reprex.
  2. Adds a new option crunch.exclude.hidden.private that when set to FALSE (not the default), includes hidden/private variables in names(ds). This fixes weird behavior because RStudio's autocompletion uses names() when it detects the user ds$, so RStudio users have weird stuttering with datasets with non-trivial hidden/private variable folder structures.

I think that 1. is an unambiguous win, I hadn't realized how easy it would be to fix, I thought these 2 problems were more entangled and so didn't realize I could fix it without messing with names. I believe that using this update version of rcrunch, will allow the rankinator to set set_crunch_opts(crunch.warn.hidden = FALSE, crunch.warn.private=FALSE) and then many operations will go from taking ~10 seconds to ~2. I kind of wonder if it should be the new default.

I'm less sure about 2. It seems like a big change to me, but performance about hidden/private variables has been a real issue since rcrunch stopped using the variables catalog for it, and maybe the answer is that R users don't have easy differentiation on variable types.

Before this PR

Any action with a variable ends up loading the hidden & private variable catalogs even if you've turned off warnings about hidden and private variables.

suppressPackageStartupMessages(library(crunch))
setupCrunchAuth("team")
httpcache::clearCache()
ds <- loadDataset("Vegetables example - used")
#> Connecting to https://team.crunch.io/api/ with key set using `setupCrunchAuth('team')`.

httpcache::startLog()
set_crunch_opts(crunch.warn.hidden = FALSE, crunch.warn.private = FALSE)
ds$healthy_eater
#> 2023-03-29T15:03:09.850 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?relative=on 200 3570 0 0 0 0 0.19 0.191
#> 2023-03-29T15:03:09.850 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:03:10.035 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/hier/?relative=on 200 1234 0 0 0 0 0.17 0.17
#> 2023-03-29T15:03:10.035 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/hier/?QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:03:10.102 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?VariableCatalog&QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:03:10.102 CACHE HIT https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?VariableCatalog&QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:03:10.282 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/ 200 349 0 0 0 0 0.165 0.165
#> 2023-03-29T15:03:10.282 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/
#> 2023-03-29T15:03:10.467 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/hidden/ 200 965 0 0 0 0 0.173 0.174
#> 2023-03-29T15:03:10.468 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/hidden/
#> 2023-03-29T15:03:10.650 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/61d7c1624044447abcc0f678140a9225/ 200 221 0 0 0 0 0.155 0.155
#> 2023-03-29T15:03:10.650 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/61d7c1624044447abcc0f678140a9225/
#> 2023-03-29T15:03:10.826 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/ef667cdec445416fb55c4f4563342b19/ 200 364 0 0 0 0 0.165 0.165
#> 2023-03-29T15:03:10.826 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/ef667cdec445416fb55c4f4563342b19/
#> 2023-03-29T15:03:10.833 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?HiddenVariableCatalog
#> 2023-03-29T15:03:10.835 CACHE HIT https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/
#> 2023-03-29T15:03:11.001 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/secure/ 200 314 0 0 0 0 0.158 0.158
#> 2023-03-29T15:03:11.001 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/folders/secure/
#> 2023-03-29T15:03:11.006 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?PrivateVariableCatalog
#> Healthy Eater (categorical)
#> Healthy Eater
#> 
#> 2023-03-29T15:03:11.201 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/cube/?query=%7B%22dimensions%22%3A%5B%7B%22variable%22%3A%22https%3A%2F%2Fteam.crunch.io%2Fapi%2Fdatasets%2F8e6c08d6a6d4409fb3b13019d72ca258%2Fvariables%2F6EkcLwFh05QFY6O4lnGSok000004%2F%22%7D%5D%2C%22measures%22%3A%7B%22count%22%3A%7B%22function%22%3A%22cube_count%22%2C%22args%22%3A%5B%5D%7D%7D%7D&filter=%7B%7D 200 815 0 0 0 0 0.18 0.18
#> 2023-03-29T15:03:11.201 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/cube/?QUERY=e6711b4ae7ce0e20ba9cb4c6765eab1e
#>     Count
#> Yes   120
#> No     85

After this PR

Turning off the warnings about hidden and secure variables avoids navigating the hidden & private folder hierarchies.

suppressPackageStartupMessages(library(crunch))
setupCrunchAuth("team")
httpcache::clearCache()
ds <- loadDataset("Vegetables example - used")
#> Connecting to https://team.crunch.io/api/ with key set using `setupCrunchAuth('team')`.

httpcache::startLog()
set_crunch_opts(crunch.warn.hidden = FALSE, crunch.warn.private = FALSE)
ds$healthy_eater
#> 2023-03-29T15:05:36.290 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?relative=on 200 3570 0 0 0 0 0.181 0.181
#> 2023-03-29T15:05:36.291 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:05:36.482 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/hier/?relative=on 200 1234 0 0 0 0 0.178 0.178
#> 2023-03-29T15:05:36.482 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/hier/?QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:05:36.543 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?VariableCatalog&QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> 2023-03-29T15:05:36.544 CACHE HIT https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/variables/?VariableCatalog&QUERY=4fd71abbd587e0265a5bd7ad5324208b
#> Healthy Eater (categorical)
#> Healthy Eater
#> 
#> 2023-03-29T15:05:36.734 HTTP GET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/cube/?query=%7B%22dimensions%22%3A%5B%7B%22variable%22%3A%22https%3A%2F%2Fteam.crunch.io%2Fapi%2Fdatasets%2F8e6c08d6a6d4409fb3b13019d72ca258%2Fvariables%2F6EkcLwFh05QFY6O4lnGSok000004%2F%22%7D%5D%2C%22measures%22%3A%7B%22count%22%3A%7B%22function%22%3A%22cube_count%22%2C%22args%22%3A%5B%5D%7D%7D%7D&filter=%7B%7D 200 815 0 0 0 0 0.172 0.172
#> 2023-03-29T15:05:36.734 CACHE SET https://team.crunch.io/api/datasets/8e6c08d6a6d4409fb3b13019d72ca258/cube/?QUERY=e6711b4ae7ce0e20ba9cb4c6765eab1e
#>     Count
#> Yes   120
#> No     85
codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (07f3990) 90.94% compared to head (eda0a1e) 90.94%.

:exclamation: Current head eda0a1e differs from pull request most recent head 728a00a. Consider uploading reports for the commit 728a00a to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #619 +/- ## ======================================= Coverage 90.94% 90.94% ======================================= Files 128 128 Lines 8469 8472 +3 ======================================= + Hits 7702 7705 +3 Misses 767 767 ``` | [Impacted Files](https://codecov.io/gh/Crunch-io/rcrunch/pull/619?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Crunch-io) | Coverage Δ | | |---|---|---| | [R/misc.R](https://codecov.io/gh/Crunch-io/rcrunch/pull/619?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Crunch-io#diff-Ui9taXNjLlI=) | `99.18% <ø> (ø)` | | | [R/dataset-extract.R](https://codecov.io/gh/Crunch-io/rcrunch/pull/619?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Crunch-io#diff-Ui9kYXRhc2V0LWV4dHJhY3QuUg==) | `98.48% <100.00%> (ø)` | | | [R/dataset.R](https://codecov.io/gh/Crunch-io/rcrunch/pull/619?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Crunch-io#diff-Ui9kYXRhc2V0LlI=) | `93.78% <100.00%> (+0.10%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Crunch-io). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Crunch-io)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

sluga commented 1 year ago

Regarding 2:

a) Perhaps the option name could be even clearer, e.g. crunch.names.excludes.hidden.private.variables, or a positive version crunch.names.includes.hidden.private.variables? I think I prefer this last version, but I'm probably above-average in my tolerance for long names. :)

b) Yeah, I think I'd want to keep backward compatibility on this. What if we added a startup message, so that after users call library(crunch) they see something like: "Experiencing slow autocompletion of variable names? See: {{link}}" ... and the link would tell them about this option & its implications, without the need to be super terse.

gergness commented 1 year ago

I'm less sure of how this newest option that I pushed today (crunch.order.var.catalog) will work, but want to push this to master so I can try running with it.