Closed: zackarno closed this 2 months ago
Yeah, it's a discussion that we were having with @hannahker as well. I worry that if we just moved the check to the top, we wouldn't be running the saving-out functionality, which may impact some testing down the line. However, if we are moving testing into #131 to be separate, and GMAS_TEST_RUN is being renamed HS_DRY_RUN, we could make this change. I think we'd still want to discuss whether we want to completely avoid loading the blob here and running switch(), because the dry run functionality can be a nice way to interactively develop, make sure nothing is breaking, and view output emails. Let's see what Hannah says when she's back!
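To make the trade-off concrete, here is a rough base-R sketch of the two placements of the dry-run check. Everything in it is hypothetical: write_then_maybe_upload(), the check_first argument, and the no-op upload function are stand-ins for illustration, not the actual update_az_file() code, and only the csv branch is handled.

```r
# Hypothetical sketch: names here are illustrative, not the project's code.
write_then_maybe_upload <- function(df, name, dry_run, check_first = FALSE,
                                    upload = function(path, name) invisible(NULL)) {
  # Option A: check at the top, so nothing is serialized in a dry run.
  if (dry_run && check_first) {
    return(invisible(NULL))
  }

  # Serialize to a temporary file; only csv is handled in this sketch.
  fileext <- tools::file_ext(name)
  tf <- tempfile(fileext = paste0(".", fileext))
  switch(fileext,
    csv = utils::write.csv(df, file = tf, row.names = FALSE, na = ""),
    stop("Unsupported file extension in this sketch: ", fileext)
  )

  # Option B: check after writing, so a dry run still exercises serialization.
  if (dry_run) {
    return(invisible(tf))
  }

  # Assumed upload step; the default is a no-op placeholder.
  upload(tf, name)
  invisible(tf)
}

df <- data.frame(iso3 = c("AFG", "SSD"), value = c(1, NA))
early <- write_then_maybe_upload(df, "test.csv", dry_run = TRUE, check_first = TRUE)
late <- write_then_maybe_upload(df, "test.csv", dry_run = TRUE)
```

With Option A the dry run returns before any file is written, while Option B leaves a temp file behind that can be inspected, which is the behaviour the comment above is worried about losing.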
So, I ran through update_az_file() to investigate this:
container <- get_container(container)
fileext <- tools$file_ext(name)
tf <- tempfile(fileext = paste0(".", fileext))
switch(fileext,
  csv = readr$write_csv(x = df, file = tf, na = ""),
  parquet = arrow$write_parquet(x = df, sink = tf),
  json = jsonlite$write_json(x = df, path = tf),
  geojson = sf$st_write(obj = df, dsn = tf, quiet = TRUE)
)
The only potential need to connect to Azure would be in get_container(), which pulls in the container_...() functions. Looking through container_prod() for example:
container_endpoint_prod <- az$blob_endpoint(
  endpoint = azure_endpoint_url("blob", "prod"),
  sas = get_env("DSCI_AZ_SAS_PROD")
)
az$blob_container(
  endpoint = container_endpoint_prod,
  name = "hdx-signals"
)
Both of these, while they look complex, simply return list objects with a special S3 class.
function (endpoint, key = NULL, token = NULL, sas = NULL, api_version = getOption("azure_storage_api_version"))
{
    obj <- list(url = endpoint, key = key, token = token, sas = sas,
        api_version = api_version)
    class(obj) <- c("blob_endpoint", "storage_endpoint")
    obj
}
<bytecode: 0x7fbef90bfd08>
<environment: namespace:AzureStor>
function (endpoint, name, ...)
{
    obj <- list(name = name, endpoint = endpoint)
    class(obj) <- c("blob_container", "storage_container")
    obj
}
<bytecode: 0x7fbefd30ea68>
<environment: namespace:AzureStor>
So we don't actually have to connect to the blob at all in these steps; they work entirely without internet access. The connection is only validated at the point of access.
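As a quick illustration of why no network is needed, the constructors printed above can be mimicked in base R. The fake_blob_endpoint() and fake_blob_container() functions below are hypothetical stand-ins that copy the pattern shown, not calls into AzureStor, and the URL and SAS token are made up.

```r
# Stand-ins mirroring the printed constructors; not AzureStor itself.
fake_blob_endpoint <- function(endpoint, sas = NULL) {
  obj <- list(url = endpoint, sas = sas)
  class(obj) <- c("blob_endpoint", "storage_endpoint")
  obj
}

fake_blob_container <- function(endpoint, name) {
  obj <- list(name = name, endpoint = endpoint)
  class(obj) <- c("blob_container", "storage_container")
  obj
}

# A made-up URL and token: nothing checks them at construction time.
ep <- fake_blob_endpoint("https://example.invalid/", sas = "not-a-real-token")
cont <- fake_blob_container(ep, name = "hdx-signals")

# The result is just a classed list holding the values it was given.
is_container <- inherits(cont, "blob_container")
stored_url <- cont$endpoint$url
```

Building these objects only stores values in a list, so under this pattern a bad URL or expired token would surface only when something later tries to read from or write to the container.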
Would it make sense to move the gmas_test_run() check to the top of the function? That way, if it's TRUE, there's no need to connect to the blob at all.