apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0
1.06k stars 445 forks source link

Compaction property redesign #4273

Open ctubbsii opened 7 months ago

ctubbsii commented 7 months ago

Had a discussion with @keith-turner and @ddanielr about some of the compaction configuration and SPI design, and we came up with some of these ideas that could be implemented:

One major unknown is what to call it. We went back and forth a bit on whether it should be CompactionService* or CompactionPlanner*

  1. Remove user-facing SPI for CompactionPlanner and related config
  2. Replace with CompactionServiceFactory: a. compaction.service.factory holds the class name (singleton ref in ServerContext?) b. compaction.service.factory.config holds a single string of all the factory's config
  3. Suggested API (names can change later, this is just an idea)
    
    interface CompactionServiceFactory {
    void init(PluginEnvironment env);
    CompactionService forName(String service);
    }

// configuration for a bounded named group / queue class GroupConfig { String name; int maxQueueSize; }

interface CompactionService { Set getGroups(); List plan(tabletPlanningInformation); }

4. `init` can validate the config, or re-validate periodically (implementation dependent)

Our default configuration for the `compaction.service.factory.config` could be (not pretty-printed, but can be copied/pasted into the property description in an HTML-friendly pretty-printed way):
```json
[
    {
        "meta": {
            "maxOpenFilesPerJob": "30",
            "groups": [
                {
                    "name": "accumulo_meta_small",
                    "maxSize": "128M",
                    "maxJobs": "1000"
                },
                {
                    "name": "accumulo_meta_large",
                    "maxJobs": "1000"
                }
            ]
        },
        "default": {
            "maxOpenFilesPerJob": "30",
            "groups": [
                {
                    "name": "user_small",
                    "maxSize": "128M",
                    "maxJobs": "1000"
                },
                {
                    "name": "user_large",
                    "maxJobs": "1000"
                }
            ]
        }
    }
]

One of the main benefits of this is having simplified configuration for user's configuration file. Also, we'll be able to set a default value in the DefaultConfiguration that actually lives in one place, and is easy for users to view for reference that represents the actual default behavior and override if they want to.

Re: #3981, #4034 , #4061

keith-turner commented 6 months ago

For naming, could have the following property names.

compaction.services.planner.factory
compaction.services.planner.factory.config

And the following SPI with Planner in the name and changed forName to forService.

interface CompactionPlannerFactory {
   void init(PluginEnvironment env);
   CompactionPlanner forService(String serviceName);
}

interface CompactionPlanner {
  Set<GroupConfig> getGroups();
  List<Job> plan(tabletPlanningInformation);
}
dlmarion commented 6 months ago

Thinking about property validation, something that I just added to the Upgrader in 2.1.3, having properties defined in the SPI classes and not in Property.java makes validation incomplete. It might be good for the service provider interfaces to extend a common base interface that has a method which returns a property validator object. That would let the SPI implementer use whatever properties they like, but we still have a way to validate that the configuration is correct.

dlmarion commented 6 months ago

Had a discussion with @ddanielr about this. Taking into account #3546 I think the default configuration should be simple. As in:

[
  {
    "cs1": {
      "maxOpenFilesPerJob": "30",
      "groups": [
          {
            "name": "default",
          }
      ]
    }
  }
]
dlmarion commented 3 weeks ago

I removed the bug label as this issue does not reflect a defect in the current implementation. I also moved this from from correctness iteration in the Elasticity project to the scaling iteration. These changes are not required for the correct operation.