RMI-PACTA / azure-architecture

Description and scripts used to deploy RMI-PACTA's Azure Architecture
MIT License

Consider Migrating from Azure Files to Azure Blob Storage #2

Open AlexAxthelm opened 8 months ago

AlexAxthelm commented 8 months ago

I'm discovering an issue with Azure File Shares, in that when mounting via SMB to a Linux OS (which we use exclusively in our cloud VMs and Containers), authentication is handled with a Storage Account key, rather than a Shared Access Signature.

Storage Account keys have two unfortunate properties:

  1. They are scoped to the Storage Account level, meaning that access to the Storage Account includes every file share contained in it.
  2. They are inherently read/write.

The second problem can be somewhat mitigated by specifying permissions when mounting a file share (setting file_mode=0555,dir_mode=0555 in the sudo mount -t cifs command), but I don't want to rely on this as a long-term solution.

cc @cjyetman @hodie @jdhoffa

AlexAxthelm commented 8 months ago

Next steps: run a spike and set up a demonstration architecture to explore how much of our current system actually relies on a local filesystem (or something approximating it via mounts), and what can be moved to working with remote files.

Overall, this will probably make a lot of the configuration of our cloud resources easier and more reliable, since rather than pointing to a local reference, we'll be pointing to URLs.

Log messages like

exporting file to /mnt/rawdata/foo

will become things like

exporting file to pacta.blob.core.windows.net/rawdata/foo

and similarly our configs can point to the same.

The tricky part is going to be authentication (as always). I don't know if the simple URLs will work, or if we'll need to put an SAS in there somehow.
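
For illustration: a SAS is normally appended to the blob URL as a query string, so a helper along these lines could build either form. This is a minimal sketch, not tested against our storage. `blob_url()` and the example token are hypothetical, and `pacta`, `rawdata`, and `foo` are just the names from the log message above.

```r
# Hypothetical helper: build a blob URL, optionally with a SAS token appended
# as the query string. The account, container, and token values are examples.
blob_url <- function(account, container, blob, sas = NULL) {
  url <- sprintf("https://%s.blob.core.windows.net/%s/%s", account, container, blob)
  if (!is.null(sas) && nzchar(sas)) {
    # Accept tokens written with or without a leading "?"
    url <- paste0(url, "?", sub("^\\?", "", sas))
  }
  url
}

plain_url <- blob_url("pacta", "rawdata", "foo")
signed_url <- blob_url("pacta", "rawdata", "foo", sas = "sv=EXAMPLE&sig=EXAMPLE")
```

Whether the plain form is enough would depend on how the caller authenticates. The signed form carries its own permissions, so it would need to be kept out of logs.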

jdhoffa commented 8 months ago

So ELI5: rather than mounting anything in, due to the permissions awkwardness, you want to read data directly from an RO URL? And hopefully on the code side, all that would need to change is the path specification of the root file-storage? + some authentication handling?

AlexAxthelm commented 8 months ago

due to the permissions awkwardness, you want to read data directly from an RO URL? and hopefully on the code side, all that would need to change is the path specification of the root file-storage?

I need to experiment a bit, but in theory, yeah. They could be read/write URLs (including to paths that don't exist yet). So instead of a code block that looks like this:

outputs_dir <- file.path("mnt/", "foo", "2022Q4")
mtcars_file <- file.path(outputs_dir, "mtcars.rds")
saveRDS(mtcars, mtcars_file)

we might have something like this (not tested):

outputs_destination <- file.path("pacta.blob.core.windows.net", "foo", "2022Q4")
mtcars_file <- file.path(outputs_destination, "mtcars.rds")
saveRDS(mtcars, mtcars_file)

+ some authentication handling?

In theory, the auth shouldn't need to live in the codebase, but rather in the deployment mechanisms. If we're running code in a container, and assign that container an identity with appropriate permissions (read-only, or read-write), then when anything on that system tries to read or write those files, Azure handles the auth.
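
As a rough sketch of what that could look like in R: base `saveRDS()` writes to a local path or connection, so the upload itself would need a storage client. The example below assumes the AzureStor and AzureAuth packages (neither is mentioned above, so treat them as one possible choice) and a VM or container running with a managed identity. `save_rds_to_blob()` is a hypothetical helper, and the endpoint and container names are placeholders. Not tested against our infrastructure.

```r
# Hypothetical helper: write an R object as an .rds blob, authenticating with
# the managed identity assigned to the VM or container (no key or SAS in code).
save_rds_to_blob <- function(object, blob_path,
                             endpoint_url = "https://pacta.blob.core.windows.net",
                             container_name = "foo") {
  # Request a storage-scoped token for the identity assigned to this host.
  token <- AzureAuth::get_managed_token("https://storage.azure.com/")
  endpoint <- AzureStor::storage_endpoint(endpoint_url, token = token)
  container <- AzureStor::storage_container(endpoint, container_name)
  # Serialise the object and upload it to the given path in the container.
  AzureStor::storage_save_rds(object, container, blob_path)
}

# Usage (only works where a managed identity is available):
# save_rds_to_blob(mtcars, "2022Q4/mtcars.rds")
```

With this shape, read-only versus read-write access would be decided by the role assigned to the identity rather than by anything in the code.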

jdhoffa commented 8 months ago

Makes sense!