TraceMachina / nativelink

NativeLink is an open source high-performance build cache and remote execution server, compatible with Bazel, Buck2, Reclient, and other RBE-compatible build systems. It offers drastically faster builds, reduced test flakiness, and specialized hardware.
https://nativelink.com
Apache License 2.0
1.12k stars 102 forks source link

[Discussion] How to Improve the Readability and Composability of the NativeLink Config JSON part 1 #834

Open MarcusSorealheis opened 4 months ago

MarcusSorealheis commented 4 months ago

Currently, one area of confusion stems from the fact that we accept user-defined keys (in other words, dynamic keys) for the names of stores and other entities. It's fine to have user-defined names, but they should be anchored to a non-dynamic key, and added as fields. Here is an example config in our current state:

"AC_MAIN_STORE" and "FS_CONTENT_STORE" are both user-supplied values.

{
  "stores": {
    "AC_MAIN_STORE": {
        "filesystem": {}
    },
    "FS_CONTENT_STORE": {
        "filesystem": {}
    }
  }
}

Below is an improved version of the same config where:

The sum of all these changes results in many improvements.

{
  "stores": [
    {
      "name": "AC_MAIN_STORE",
      "options": {
        "filesystem": {
            "content_path": "/tmp/nativelink/data-worker-test/content_path-ac",
            "temp_path": "/tmp/nativelink/data-worker-test/tmp_path-ac"
            ...
        }
      }
    },
    {
      "name": "FS_CONTENT_STORE",
      "options": {
        "filesystem": {}
      }
    }
  ]
}
aaronmondal commented 4 months ago

I like this. It reminds me of how such configs would look like in YAML.

Some ideas (which effectively can be compiled down to json):

GHA style (seems nice and concise initially, but the more I think about it the less convinced I get as it seems very easy to mess up when configs become more complex):

---
stores:
  - name: ac-main-store-fs
    uses: filesystem-store
    with:
      contentPath: /tmp/nativelink/data-worker-test/content_path-ac
      tempPath: /tmp/nativelink/data-worker-test/tmp_path-ac
  - name: ac-main-store-memory
    uses: memory-store
    with:
      maxSize: 20G
  - name: ac-main-store
    uses: fast-slow-store
    with:
      fast: ac-main-store-memory
      slow: ac-main-store-fs

K8s style (seems overly verbose initially, but the more I think about it the more I like it as it provides a clear, familiar structure and a standardized way to iterate on the API):

---
apiVersion: nativelink/v1alpha1
kind: FilesystemStore
metadata:
  name: ac-main-store-fs
spec:
  contentPath: /tmp/nativelink/data-worker-test/content_path-ac
  tempPath: /tmp/nativelink/data-worker-test/tmp_path-ac
---
apiVersion: nativelink/v1alpha1
kind: MemoryStore
metadata:
  name: ac-main-store-memory
spec:
  maxSize: 20G
---
apiVersion: nativelink/v1alpha1
kind: FastSlowStore
metadata:
  name: ac-main-store
spec:
  fast: ac-main-store-memory
  slow: ac-main-store-fs
MarcusSorealheis commented 4 months ago

I thought about this for a long time and the only possible downside of the new format is that lookup in the vector format (example 2) changes from direct access to O(n) time. However, given that we will never have more than 5 or 10 stores in NativeLink, I don't see this as a material concern. In the worst case scenario, 100 stores would still not cause any negligible performance problems.

allada commented 4 months ago

In this case the Big-O complexity doesn't change. We have to translate these configs into the implementation which requires iterating through them all at startup time. Internally it'll still hold them as a HashMap.

I'm fine with this, the only bikeshed I have is "options". I would like to find a more intuitive name... maybe "config" or something?

blizzardc0der commented 4 months ago

I'd like to work on this issue.

blizzardc0der commented 4 months ago

I've questions while I was working on this issue. I just had felt that schedulers had the same issue as stores and I think it needs to updated as well.

Before:

"schedulers": {
    "MAIN_SCHEDULER": {
      "simple": {
        "supported_platform_properties": {
          "cpu_count": "minimum",
          "memory_kb": "minimum",
          ...
        }
      }
    }
  }

After:

"schedulers": [
    "name": "MAIN_SCHEDULER",
    "config": {
      "simple": {
        "supported_platform_properties": {
          "cpu_count": "minimum",
          "memory_kb": "minimum",
          ...
        }
      }
    }
  ]

Do we need this change in this PR?

cc: @MarcusSorealheis , @aaronmondal , @allada , @adam-singer

blizzardc0der commented 4 months ago

Okay, I already made PR for both stores and schedulers.

https://github.com/TraceMachina/nativelink/pull/889