golemcloud / golem

Golem is an open source durable computing platform that makes it easy to build and deploy highly reliable distributed systems.
https://learn.golem.cloud/
Apache License 2.0
530 stars 59 forks

Initial file system #1043

Closed MartinKavik closed 2 weeks ago

MartinKavik commented 3 weeks ago

/claim #1004

Closes #1004, Fixes #843

Initial file system

Specification: https://github.com/golemcloud/golem/issues/1004#issuecomment-2412522032

Currently implemented flow

NOTE: You can run the described flow by yourself with this testing example: https://github.com/MartinKavik/golem_async

  1. golem.yaml files are collected from the project and validated. Then the component files are downloaded, packed into a zip archive and uploaded to Golem together with the component data from Golem CLI.

[attachment: add_component_with_files]

  2. Uploaded files are extracted to a new directory inside the worker executor data folder (see the extraction sketch after this list). Each durable worker has one such directory representing its root directory. The code running inside the worker has access to all files in the directory, and all files created by the worker are stored there as well.

[attachment: call_workers]

  3. Worker root dir contents can be listed through the REST API.

[attachment: files_api_result]

  4. Worker files are deleted from the executor together with the worker itself.
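
A minimal sketch of the extraction step in (2), assuming the same zip crate used for packing; extract_worker_files and its signature are illustrative, not the actual function in this PR:

// Illustrative only: on Unix targets, ZipArchive::extract re-applies the
// per-file modes (0o444 / 0o666) that the CLI wrote into the archive.
use std::fs::File;
use std::path::Path;

fn extract_worker_files(archive_path: &Path, worker_root: &Path) -> zip::result::ZipResult<()> {
    let mut archive = zip::ZipArchive::new(File::open(archive_path)?)?;
    archive.extract(worker_root)
}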

Files compression & transport

Files are compressed with the default/standard Deflate method and packed into a Zip archive. Permissions (read-only or read-write) are attached to every file as Unix permissions. If a file is larger than 4 GB, Zip readers need ZIP64 support to open the archive. The modification time attribute is automatically set to the current time, without a time zone. Comments are not used. Related code:

let zip_file_options = zip::write::SimpleFileOptions::default()
    .compression_method(zip::CompressionMethod::Deflated)
    .large_file(source_file_size >= u32::MAX as u64)
    .unix_permissions(permissions.to_unix_file_permissions());
// ...
impl InitialFilePermissions {
    // https://chmod-calculator.com/
    const READ_ONLY_NUMERIC: u32 = 0o444; // r--r--r--
    const READ_WRITE_NUMERIC: u32 = 0o666; // rw-rw-rw-

    pub fn to_unix_file_permissions(&self) -> u32 {
        match self {
            Self::ReadOnly => Self::READ_ONLY_NUMERIC,
            Self::ReadWrite => Self::READ_WRITE_NUMERIC,
        }
    }
    // ...
}

I've chosen the Deflate method and the Zip format because they are the most widely used, so such archives can be read (or even created) on different platforms, and the related tools and libraries should be reliable enough. Zip doesn't support compression across files, but that's a tradeoff for potential parallel packing and simple partial extraction, which could be useful for more efficient worker file updates.

If needed, we can optimize packing by choosing a better method like Zstd, which should be both faster and produce smaller files. Or we can pack files into a compressed Tar archive, or into a second inner Zip archive without any compression, to achieve compression across files. The right choice depends on the user's files: compression across files would suit many text files, but not already-compressed images and videos.
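
A minimal sketch of the Tar + Zstd alternative, assuming the tar and zstd crates (neither is used by this PR); pack_tar_zst is an illustrative name:

// Illustrative only: packs a directory into a .tar.zst, compressing across files.
use std::fs::File;

fn pack_tar_zst(source_dir: &std::path::Path, target: &std::path::Path) -> std::io::Result<()> {
    let file = File::create(target)?;
    // Level 0 means "use zstd's default compression level".
    let encoder = zstd::stream::write::Encoder::new(file, 0)?.auto_finish();
    let mut builder = tar::Builder::new(encoder);
    // Tar preserves the Unix permissions of the appended entries.
    builder.append_dir_all(".", source_dir)?;
    builder.finish()
}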

I've decided to include permissions directly in the archive so that it works as a self-contained unit: the permissions travel with the files, and no separate permission manifest is needed.

I tried to use the async_zip crate, but the created archives were unreadable (as, surprisingly, their own docs describe). Then I tried to use the sync zip crate directly together with Tokio's spawn_blocking, but the code became unreliable when an early return was triggered by an error. So I decided to "isolate" the sync Zip calls behind channels in golem-cli/src/async_zip_writer.rs.
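
The pattern looks roughly like this (a simplified sketch; ZipCommand and spawn_zip_writer are illustrative names, not the actual API of async_zip_writer.rs):

use std::{io::Write, sync::mpsc, thread};

enum ZipCommand {
    AddFile { name: String, bytes: Vec<u8> },
    Finish,
}

fn spawn_zip_writer(path: std::path::PathBuf) -> mpsc::Sender<ZipCommand> {
    let (tx, rx) = mpsc::channel::<ZipCommand>();
    // All sync zip calls live on one dedicated thread; async code only sends messages.
    thread::spawn(move || {
        let file = std::fs::File::create(path).expect("create archive");
        let mut writer = zip::ZipWriter::new(file);
        let options = zip::write::SimpleFileOptions::default()
            .compression_method(zip::CompressionMethod::Deflated);
        while let Ok(command) = rx.recv() {
            match command {
                ZipCommand::AddFile { name, bytes } => {
                    writer.start_file(name, options).expect("start file");
                    writer.write_all(&bytes).expect("write file");
                }
                ZipCommand::Finish => {
                    writer.finish().expect("finish archive");
                    break;
                }
            }
        }
        // An early return on the async side just drops the sender; recv()
        // then fails and this thread stops without half-written state leaking.
    });
    tx
}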

OpenAPI path parameters

OpenAPI currently (v3) doesn't support optional path parameters. See e.g. swagger.io/docs/specification/v3_0/describing-parameters/:

"... Also remember to add required: true, because path parameters are always required. ...

This means that URLs like:

/v1/components/{component_id}/workers/{worker_name}/files/<path>

can't officially be described, even though the generator allows them. It should become possible in OpenAPI v4; see https://github.com/OAI/OpenAPI-Specification/issues/93#issuecomment-1908673473.

I've started implementing the .../workers/{worker_name}/files/<path> endpoint with the URL /workers/{worker_name}/files?path=/my_file.txt as the OAS-compatible alternative. It even has some benefits: you don't need to urlencode the path if you are confident it doesn't contain characters like = and & (and maybe others?), and in the future we can add array support, so a user can specify multiple paths to download several files at once. But I don't know the "business reason" for this API / how real users want to download the files, so it depends.
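
For illustration, a sketch of building the query-parameter form with the url crate (not used by this PR; the helper name and base URL handling are hypothetical). It percent-encodes the path value, so = and & are handled automatically:

use url::Url;

// Illustrative helper: builds .../workers/{worker_name}/files?path=...
fn files_url(component_base: &str, worker_name: &str, path: &str) -> Url {
    let mut url = Url::parse(component_base).expect("valid base url");
    url.path_segments_mut()
        .expect("base url can have path segments")
        .extend(["workers", worker_name, "files"]);
    // `=`, `&` and other special characters in `path` are escaped here.
    url.query_pairs_mut().append_pair("path", path);
    url
}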

Nested preopened dirs / files, Copy files to workers

Consider this Golem.yaml snippet:

files:
  - sourcePath: https://picsum.photos/200/300
    targetPath: /images/random_picture.jpg
    permissions: read-write
  - sourcePath: ./images
    targetPath: /images
  1. It mixes read-write and read-only files in one directory. Should that be possible? If I'm not mistaken, WASI/Wasmtime doesn't support it; you can only preopen directories.
  2. A worker wants to delete all its read-write files and folders (to prepare for a big component files update; to remove all user data if the worker represents a user who wants to be forgotten; etc.). You cannot just call remove_dir_all, because it would probably fail on a nested read-only dir entry. As a worker developer, I would probably just read permissions from dir entry metadata while walking from the root dir (see the sketch after this list) and pray that the standard file permissions match the ones set by the Wasm engine. Host file permissions and guest/standard file permissions may also become a bit confusing.
  3. If I understand it correctly, all read-write files should be copied to every new worker. What if I don't want that as a worker developer? Let's say I have a component User. It represents all user data and works as a server session as well (see e.g. turso.tech/multi-tenancy or plane.dev/concepts/session-backends). I want to start with an almost empty worker state and then lazily attach/copy initial files and databases when the user/customer signs up to my app; I don't want to do that for every web visitor or bot. One idea would be to introduce special directories in the worker root like /static, /public, /shared and /private: current read-only files and dirs would live inside /static, and current read-write files inside /private. /public could be a special folder accessible from the outside world, and /shared could be for passing/sharing big files among workers. Current permissions in the manifest would define classic file permissions. Then we could make /static / read-only files public cheaply, just by creating a symlink inside /public.
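
A minimal sketch of the walk-and-delete workaround from point 2, using only the standard library; remove_writable_entries is a hypothetical helper, and it assumes the permissions the guest sees match what the engine set:

// Illustrative only: deletes read-write entries and skips read-only ones,
// instead of failing wholesale like remove_dir_all would.
use std::{fs, io, path::Path};

fn remove_writable_entries(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let metadata = entry.metadata()?;
        if metadata.is_dir() {
            remove_writable_entries(&path)?;
            // Fails (and is ignored) if read-only entries remain inside.
            let _ = fs::remove_dir(&path);
        } else if !metadata.permissions().readonly() {
            fs::remove_file(&path)?;
        }
    }
    Ok(())
}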

Windows support

I've tried to work on Golem on Windows first. I wasn't really successful, but I've at least made it compilable with these two lines:

// golem-worker-executor-base/build.rs
r#"...path: r"{golem_wit_path}/wit",           // notice `r`
// golem-worker-executor-base/tests/api.rs
use system_interface::fs::FileIoExt;    // replacing `use std::os::unix::fs::FileExt`

Then the cargo make run script would need to be rewritten to make it cross-platform. Redis doesn't support Windows, but I can imagine that Redis + Postgres + Nginx will be replaced, or alternatives introduced, in the future to make the entire architecture and development/deployment simpler. I haven't found a reasonable replacement for lnav yet; I tried to compile it with MSYS2, but found there are too many incompatibilities to make it work on Windows in a reasonable time.

Next steps

I've raised some topics to discuss in the wall of text above, and the PR is not mergeable yet. I can add comments directly to the PR code to make the review easier, or continue the implementation if you want.

I would finish the ..{component_id}/workers/{worker_name}/files/ endpoint and then update/fix other parts of the code according to feedback. Also, I think this PR is already pretty big, so I would recommend splitting it and creating another PR for the API Definition API and potential extra work.

algora-pbc[bot] commented 3 weeks ago

💵 To receive payouts, sign up on Algora, link your GitHub account and connect with Stripe.

jdegoes commented 3 weeks ago

@MartinKavik Congratulations on submitting your solution for our Bounty-to-Hire challenge (#1004)!

We have reviewed your pull request and believe that it is very promising. While not yet satisfying all of the requirements of #1004, we believe there is a short and well-defined path to get there, and so we are delighted to announce that you are a finalist in our Bounty-to-Hire event!

Next steps:

  1. You are guaranteed an interview based on your work thus far.
  2. Should you continue to work on this pull request, and get it into a state that can be merged to satisfy all requirements, you stand a reasonable chance of winning the bounty.

If you no longer wish to work on this pull request, please close it, and we will reach out about the interview. On the other hand, if you wish to continue pushing this pull request forward, then please let us know here and continue your work.

Congratulations again on being a finalist for Bounty-to-Hire!

MartinKavik commented 3 weeks ago

@jdegoes Nice! Let's go!

MartinKavik commented 2 weeks ago

Update:

[attachment: list_and_download_files]

The API endpoint /v1/components/{component_id}/workers/{worker_name}/files returns a file listing or the file itself. The query parameter path selects a non-root path (used instead of an optional path segment to keep the endpoint valid according to the OpenAPI spec). The extra query parameter recursive=true/false shows entries from sub-folders as well; it was useful for debugging, and it could be very useful together with a query parameter like zip to let users download all files.
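
For illustration, a hypothetical listing request using the same placeholders as the URL template above:

GET /v1/components/{component_id}/workers/{worker_name}/files?path=/images&recursive=true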


Ideal worker file flow in the system:

  1. The CLI downloads and packs files according to a YAML manifest, or from a chosen folder (not implemented or specified).
  2. The file archive is uploaded by the CLI and stored in blob storage together with the associated new component.
  3. When a new worker is created, it tries to find extracted files on the executor's file system according to the component id and component version.
  4. If it doesn't find the files, the archive is downloaded from blob storage and extracted to a folder shared by all related workers. File permissions are preserved to identify read-only and read-write files.
  5. Read-write files are copied into the worker's root directory. For every read-only file, a symlink is created inside the worker's root dir; each of these symlinks points to its file through an extra symlink, which in turn points to the extracted files currently associated with the worker by component id and version. For example, the symlink current/files/my_file.txt points to data/worker_files/component/{component_uuid}/version/{component_version}/files/my_file.txt. This allows us to atomically update all symlinks when the worker is updated to a new component version (see the sketch after this list).
  6. Extracted files on worker executors are also used for file listing and downloading. Symlinks should be valid both inside and outside the worker, so we can list and download the files without using the internal worker/engine path resolver.
  7. Extracted files are deleted from the worker executor when no worker is associated with them (the workers have been upgraded to a newer component version or deleted).
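
A minimal sketch of the atomic swap in step 5, using only the standard library on Unix; update_current_symlink and the directory layout are illustrative:

use std::{fs, io, os::unix::fs as unix_fs, path::Path};

// Illustrative only: retargets the per-worker `current` symlink so that every
// per-file symlink routed through it resolves to the new component version.
fn update_current_symlink(worker_dir: &Path, extracted_files_dir: &Path) -> io::Result<()> {
    let current = worker_dir.join("current");
    let tmp = worker_dir.join("current.tmp");
    // Create the new link under a temporary name first (symlink creation
    // fails if the name already exists, so clear any stale leftover).
    let _ = fs::remove_file(&tmp);
    unix_fs::symlink(extracted_files_dir, &tmp)?;
    // rename(2) replaces the old link atomically.
    fs::rename(&tmp, &current)
}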

The flow leverages native file system APIs as much as possible to lower the complexity of managing extra permission lists, file manifests or database records. The flow is almost fully implemented, but the blocker is symlink permissions in Wasmtime/WASI. I wasn't able to make symlinks work inside the worker because of a "not permitted" error, even though they were relative, didn't cross sandbox borders and pointed to a preopened folder. I would need to investigate it more. In the meantime, the code relies on preloaded paths, which is not ideal for various reasons.


Notes:

  1. The OpenAPI/client generator doesn't produce valid OpenAPI specs (only one of the two responses is written there) or valid Rust gRPC code when there are two ApiResponses with the same status (200), or two different 2xx statuses, probably because it treats the second one as an error.
  2. I tried using a different linker, Cranelift, and writing a custom Makefile run task, but it's still pretty painful to run the project on a local machine: it often consumes ~35 GB RAM even after a small code change, before it freezes my Linux or kills my IDE. Building tests especially takes a lot of time and resources. Is there a way to mitigate this, or what is the recommended development workflow here?
  3. The code is not fully complete yet (see the text about symlinks above; the API Definition API is also missing), but the remaining code is ready for review.


Project for manual testing: https://github.com/MartinKavik/golem_async

vigoo commented 2 weeks ago

painful to run the project on local machine because it often consumes ~35 GB RAM even on a small code change before it freezes my Linux or kill IDE

Are you talking about working on the golem workspace itself? We don't experience this (I'm working on it on both Linux and OSX, usually with IntelliJ, sometimes with Zed).

jdegoes commented 2 weeks ago

@MartinKavik

We have reached November 4th, which is the day we must announce the winner of the Bounty-to-Hire program!

Given the scope of the challenge, it's incredibly impressive that we have had not one, but THREE finalists, who each demonstrated high competence and skill in tackling a problem that would stump most engineers on the planet.

However, ultimately, there can be only ONE WINNER to the Bounty-to-Hire challenge. So after MUCH analysis by Golem engineers, thoughtful consideration, and spirited discussion, we have decided to pick a winner.

That winner will be announced on our LIVE STREAM event TODAY at 12 NOON ET:

https://www.youtube.com/watch?v=at8EPqLWIRE

Without question, all three finalists are highly qualified Rust engineers, and we look forward to interviewing and getting to know each of them, with the intention of making at least one job offer to fill our recently opened position.

While you await the announcement of the WINNER, please shoot an email to john@ziverge.com to schedule your interview!

Stay tuned for more.

jdegoes commented 2 weeks ago

@MartinKavik If you plan to join for the Live stream today, please do send me a message on Discord (you can find me on the Golem Discord), or an email at john@ziverge.com. Thank you!

jdegoes commented 2 weeks ago

@MartinKavik Thanks for your work on this feature! We remain excited to interview you, should you be interested in doing this sort of work full time as part of the Golem Cloud engineering team.

Since announcing that Maxim Schuwalow has won this bounty, we are now closing this pull request, and look forward to posting many more bounties on smaller-sized issues in the near future.