eBay / HomeObject

Replicated BLOB Store built upon HomeStore
Apache License 2.0

Multi storage type support -- HS provides optimal placement parameters #110

Open xiaoxichen opened 1 year ago

xiaoxichen commented 1 year ago

HS supports multiple device types by design, but until now we don't have a proper way for users to configure it. Take HO as an example:

        HomeStore::instance()->format_and_start({
            {HS_SERVICE::META, hs_format_params{.size_pct = 5.0}},
            {HS_SERVICE::LOG_REPLICATED, hs_format_params{.size_pct = 10.0}},
            {HS_SERVICE::LOG_LOCAL, hs_format_params{.size_pct = 0.1}}, // TODO: Remove this after HS disables LOG_LOCAL
            {HS_SERVICE::REPLICATION,
             hs_format_params{.size_pct = 79.0,
                              .num_chunks = 65000,
                              .block_size = 1024,
                              .alloc_type = blk_allocator_type_t::append,
                              .chunk_sel_type = chunk_selector_type_t::CUSTOM}},
            {HS_SERVICE::INDEX, hs_format_params{.size_pct = 5.0}},
        });

This is trying to hide the device details from the upper layer, and it works fine with a single type of media. Problems show up with multiple drive types (i.e. fast and data). Assume we have 4 x 16TB HDDs for data and a 1TB SSD/NVMe per HS:

  1. The fast/data ratio is unknown to the upper layer, so META/LOG/INDEX end up oversized when their percentages are applied to the total size (data + SSD). In this example, the config above results in INDEX being 2.88TB and META being 2.88TB as well, which exceeds the SSD size and is unnecessary.

  2. Data can only take up (100% - all other services). In the example above, data only takes 79%, which limits it to 4 x 16TB x 79% ≈ 50TB, wasting about 14TB. The rough numbers are worked out in the sketch below.
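To make the oversizing concrete, here is a minimal sketch (C++, purely illustrative; the drive counts and percentages are the ones assumed above) of the single-pool computation that produces these numbers:

    #include <cstdio>

    int main() {
        // Assumed hardware from the example above: 4 x 16TB HDD + 1 x 1TB SSD.
        const double hdd_tb   = 4 * 16.0;
        const double ssd_tb   = 1.0;
        const double total_tb = hdd_tb + ssd_tb; // size_pct is applied to this pool

        // size_pct values from the format_and_start() call above.
        struct { const char* svc; double pct; } services[] = {
            {"META", 5.0}, {"LOG_REPLICATED", 10.0}, {"LOG_LOCAL", 0.1},
            {"REPLICATION", 79.0}, {"INDEX", 5.0},
        };

        for (const auto& s : services) {
            // META and INDEX come out at several TB -- far bigger than the 1TB SSD --
            // while REPLICATION is capped at 79% of the pool, stranding ~14TB of HDD.
            std::printf("%-15s %6.2f TB\n", s.svc, total_tb * s.pct / 100.0);
        }
        return 0;
    }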

We still want to hide device details from the upper layer, but we need to get the allocation right. Previous discussion led to letting HS provide HomeStore::Placement HomeStore::optimal_placement(), since HS knows best both the drives it manages and the characteristics of each service, while leaving the upper layer the possibility to override.
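A rough sketch of what such an interface could return is below. Everything in it (the Placement/ServicePlacement types, the field names, the stand-in HS_SERVICE enum) is hypothetical and only meant to frame the discussion; it is not existing HomeStore code:

    #include <cstdint>
    #include <map>

    // Stand-in for the real HomeStore service enum, just to keep the sketch self-contained.
    enum class HS_SERVICE { META, LOG_REPLICATED, LOG_LOCAL, REPLICATION, INDEX };

    // Per-service placement proposal, as HS could compute it from its drive inventory.
    struct ServicePlacement {
        float    size_pct{0.0f};        // pct of the chosen device class, not of the whole pool
        uint64_t max_size_bytes{0};     // optional absolute cap (0 = uncapped)
        bool     on_fast_device{false}; // fast (SSD/NVMe) vs. data (HDD)
    };

    // Returned by a hypothetical HomeStore::optimal_placement(); the upper layer may
    // tweak individual entries before feeding them into format_and_start().
    struct Placement {
        std::map< HS_SERVICE, ServicePlacement > services;
    };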

Feature-wise, it is worth discussing whether we need an absolute size in addition to pct. For example, on a 1TB drive it may (well, already way too big) make sense for META to take 50GB, but it doesn't seem to make sense when the drive grows to 16TB (P5G8) and META takes 800GB.
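A minimal sketch of the "pct plus absolute cap" idea, assuming a hypothetical max_size parameter that does not exist in hs_format_params today:

    #include <algorithm>
    #include <cstdint>

    // Illustrative only: combine a percentage with an optional absolute cap so that
    // META stays around 50GB whether the drive is 1TB or 16TB.
    uint64_t effective_size(uint64_t device_capacity, double size_pct, uint64_t max_size /* 0 = uncapped */) {
        const auto by_pct = static_cast< uint64_t >(device_capacity * size_pct / 100.0);
        return max_size ? std::min(by_pct, max_size) : by_pct;
    }

    // e.g. effective_size(16TB, 5.0, 50GB) yields 50GB instead of the 800GB that 5% alone would give.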

Test cases:

  1. Single SSD. This case is more like a regression test; the only point to check is the absolute size of some components.

  2. Single HDD. This case verifies that, if no SSD (fast) device is provided, we can still generate a proper configuration. Expect META/LOG to be capped by absolute size instead of pct, considering the huge capacity.

  3. SSD + HDD. This is the new use case. Expect META/LOG/INDEX all on the SSD and data taking 100% of the HDDs, roughly as sketched below.
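For case 3, the intended outcome could look roughly like the following sketch. The percentage values and the placement comments are assumptions for discussion; how the fast-vs-data split is actually expressed is part of the optimal_placement() design:

    // Sketch only -- desired result for the SSD + HDD case, not a working config.
    // META/LOG/INDEX land on the fast (SSD) device, capped by absolute size,
    // while REPLICATION takes effectively 100% of the data (HDD) devices.
    HomeStore::instance()->format_and_start({
        {HS_SERVICE::META,           hs_format_params{.size_pct = 5.0}},  // on fast dev, size-capped
        {HS_SERVICE::LOG_REPLICATED, hs_format_params{.size_pct = 10.0}}, // on fast dev
        {HS_SERVICE::INDEX,          hs_format_params{.size_pct = 5.0}},  // on fast dev
        {HS_SERVICE::REPLICATION,
         hs_format_params{.size_pct = 100.0, // all of the HDD capacity, instead of 79% of everything
                          .num_chunks = 65000,
                          .block_size = 1024,
                          .alloc_type = blk_allocator_type_t::append,
                          .chunk_sel_type = chunk_selector_type_t::CUSTOM}},
    });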