Closed kakaiu closed 2 months ago
Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
Error while executing command: docker build --label "org.foundationdb.version=${FDB_VERSION}" --label "org.foundationdb.build_date=${BUILD_DATE}" --label "org.foundationdb.commit=${COMMIT_SHA}" --progress plain --build-arg FDB_VERSION="${FDB_VERSION}" --build-arg FDB_LIBRARY_VERSIONS="${FDB_VERSION}" --build-arg FDB_WEBSITE="${FDB_WEBSITE}" --tag foundationdb/ycsb:${FDB_VERSION}-${COMMIT_SHA}-debug --file Dockerfile.eks --target ycsb .. Reason: exit status 1
Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
We want to significantly improve bulk loading speed by injecting data directly into storage servers instead of going through the transaction system. The new bulk load mechanism is therefore expected to fully leverage the parallelism of the data distribution system, and the feature is implemented largely as part of data distribution.
Related issue: https://github.com/apple/foundationdb/issues/1002
When defining bulkLoad behavior, our goal is to prevent users from making mistakes when using the tool. A bulk load task involves a specified range and a set of files to be loaded into that range. Users must specify the range into which the data from the input files is loaded; any data outside this range is ignored. Upon completion of the bulk load task, the injected data becomes visible to users, and any pre-existing data within the specified range is discarded. To prevent user mistakes, we halt all traffic within the range during the bulk loading process (this functionality will be included in a separate PR).
User interface design
We aim to develop a user interface that facilitates the reliable submission of bulk load tasks and simplifies the monitoring of their progress. A key focus of the design is to ensure users are fully aware of their actions. To achieve this, we have incorporated several design features; for instance, bulk load tasks remain persisted until users acknowledge their completion.
There are four key operations:
In general, the bulk load system provides both an FDBCLI interface and a transactional interface for all of the above operations (this PR only includes the FDBCLI-based user interface for testing purposes; the transaction-based interface will be included in a separate PR).
Setting bulk load mode

Both FDBCLI and transactional methods set the mode by writing the \xff/bulkLoadMode key. When the mode is turned on, DD restarts and keeps monitoring bulk load tasks submitted to \xff/bulkLoad/, triggering them accordingly. When the mode is turned off, DD restarts and cancels all data moves issued for bulk load tasks.

The FDBCLI command to set the bulkLoad mode:

bulkload mode <on|off>
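As a rough illustration of what the transactional path would do, here is a minimal Python sketch. The value encoding is a placeholder assumption, not the real on-disk format:

```python
# Sketch only: turning bulk load mode on or off amounts to writing the
# \xff/bulkLoadMode system key that DD watches. The value encoding below
# is a placeholder assumption, not the real format.
BULK_LOAD_MODE_KEY = b"\xff/bulkLoadMode"

def bulk_load_mode_mutation(on: bool) -> tuple:
    """Return the (key, value) pair a transactional client would write."""
    return (BULK_LOAD_MODE_KEY, b"1" if on else b"0")

# A real client would write this pair inside a transaction with
# system-key access enabled, then commit; DD reacts to the change.
```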
Submit a task

Both FDBCLI and transactional methods submit the task by writing to the \xff/bulkLoad/ key space, which is read by DD to trigger tasks accordingly. Note that the input range must be within the user key space (i.e. "" ~ \xff).

FDBCLI example to trigger a bulkLoad task:

bulkload local 1 2 "/root/sim-load/bulkLoad/1" "/root/sim-load/bulkLoad/1/95af52bf2eddebe59a954db83895fa74-data.sst" "/root/sim-load/bulkLoad/1/51da3c579419558de34f00126efa25a0-bytesample.sst"
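Since data outside the user key space cannot be loaded, a client would typically validate the range before submission. A minimal sketch; the function name and error handling are illustrative, not part of the real API:

```python
# Sketch of the client-side check implied above: the input range must lie
# within the user key space ("" up to \xff). Names here are illustrative.
USER_KEYSPACE_END = b"\xff"

def validate_bulkload_range(begin: bytes, end: bytes) -> None:
    if not begin < end:
        raise ValueError("empty or inverted range")
    if end > USER_KEYSPACE_END:
        raise ValueError("bulk load range must be within the user key space")
```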
In this command:
- bulkload is the command name;
- local indicates loading the files from a local file system accessible to any storage server in the (simulated) cluster (for testing purposes, such as simulation and loopback cluster testing);
- 1 2 indicates loading the files into the KeyRange [1, 2);
- "/root/sim-load/bulkLoad/1" is the folder dedicated to the files to load;
- "/root/sim-load/bulkLoad/1/95af52bf2eddebe59a954db83895fa74-data.sst" is the data file to load;
- "/root/sim-load/bulkLoad/1/51da3c579419558de34f00126efa25a0-bytesample.sst" is the bytesSample file to load. BytesSample is used by the FDB data distribution system for load balancing.

Get bulk load task status

After a bulk load task is triggered, users can use fdbcli or the transactional API to get the status of the task. By providing a range, the client gets all bulk load tasks intersecting the range.
FDBCLI example to get a bulkLoad task status to get all bulk loading of the entire user space:
bulkload status "" \xff
Output of the bulk load status:
The output shows that there are three bulk loading tasks overlapping the input range. The status includes the settings of each task, such as file names and the loading range. Moreover, the task status reports progress. A bulkLoad task has 4 major stages: (1) submitted; (2) triggered; (3) start running; (4) running complete. The status records the time at which each stage ends; if a stage has not ended, the corresponding time is unset. Users can use these times to track the progress of a task. The status also includes information for debugging purposes, such as the ID of the data move that is working on the bulk loading task. If a bulk load task is stuck, users can use this data move ID to check the progress of the data move.
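The overlap rule behind the range query can be sketched as follows; the task representation is illustrative, not the real metadata encoding:

```python
# Sketch of the range-query semantics of `bulkload status`: every task
# whose range intersects the query range [begin, end) is returned.
# The task representation is illustrative, not the real metadata encoding.
def tasks_intersecting(tasks, begin, end):
    # [b, e) intersects [begin, end) iff b < end and begin < e
    return [t for t in tasks if t["begin"] < end and begin < t["end"]]
```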
Acknowledge the completion of a bulk load task

After a bulk load task is completed, users can use fdbcli or the transactional API to mark the task as acknowledged in the metadata by providing a task ID and a range. DD erases acknowledged tasks' metadata in the background.

FDBCLI example to acknowledge completion of a task:

bulkload acknowledge 7599f0eadaf7df66222cc540b98bf224 1 2

This command first gets the task status of range [1, 2); if the task ID matches, the task metadata is erased.

Backend design
We aim to develop a bulk loading backend compatible with various storage engines. Therefore, the backend design must be flexible, straightforward, and aligned with the current data distribution architecture. When a user submits a task, it is stored in the bulkLoad system key space as previously described. The Data Distributor (DD) periodically checks for new tasks. Upon detecting a new task, the DD ensures its completion by initiating data moves. These data moves signal storage servers by setting the ServerKey in the system key space. When a storage server (SS) receives a bulk load task, it downloads or copies the necessary files to a local directory and injects the data into the key-value store. Once the data move is completed, the injected data becomes accessible, and the bulk load task completes.
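The file-fetch step on the SS side can be sketched as a toy for the "local" mode described above; the directory layout and the inject step are placeholders, not the real SS implementation:

```python
# Toy sketch of the storage-server side of the flow above, assuming the
# "local" mode: the SS copies the task's files into a local working
# directory before injecting them into the key-value store. Paths and the
# inject step are placeholders, not the real SS implementation.
import shutil
from pathlib import Path

def fetch_task_files(file_paths, work_dir):
    """Copy the task's files into the SS-local working directory."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    local = []
    for src in file_paths:
        dst = work / Path(src).name
        shutil.copy(src, dst)  # "local" mode: a plain filesystem copy
        local.append(dst)
    return local  # the next step would inject these SSTs into the KV store
```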
A bulk load task is defined by:
Running a bulk load task has the following constraints:
Life cycle of a bulk load task --- 5 stages:
- The task is submitted to the \xff/bulkLoad/ key space;
- DD picks up the task from the \xff/bulkLoad/ key space to handle; when DD starts handling the task, it persists the task phase as BulkLoadPhase::Triggered;

Bulk load tasks are persisted in \xff/bulkLoad/ in the form of a key range map, so any range has at most one bulk loading task at a time. Each task has a task ID to distinguish different tasks over the same range.

Key invariants of updating the bulk loading task map:
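The at-most-one-task-per-range property can be sketched with a toy non-overlapping range map; this illustrates the invariant only and is not DD's actual data structure:

```python
# Toy sketch of the key range map invariant: entries are non-overlapping
# (begin, end, task_id) triples, and writing a new task over a range
# truncates or removes any overlapped portion of older tasks, so every
# key is covered by at most one task. Not DD's real data structure.
def insert_task(ranges, begin, end, task_id):
    out = []
    for b, e, tid in ranges:
        if e <= begin or end <= b:   # disjoint: keep unchanged
            out.append((b, e, tid))
        else:                        # overlap: keep only the uncovered parts
            if b < begin:
                out.append((b, begin, tid))
            if end < e:
                out.append((end, e, tid))
    out.append((begin, end, task_id))
    return sorted(out)
```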
Key bulk loading mechanisms and invariants:
Simulation tests
Currently, in a certain frequent case, the MachineAttrition workload kills so many storage servers that the simulation gets stuck finding remote destination teams when the number of data centers is 6. To avoid this issue, we set generateFearless=false so that the maximum number of data centers is 4. In rare cases, the simulation cannot complete because the CC fails to detect the halted DD, so no DD is running afterwards. This is caused by a network partition between CC and DD. To avoid this issue, the simulation disables network partition injection.

100k correctness test with 1 external timeout for sharded rocksdb engine: 20240723-173043-zhewang-ef27ca81eb64aea1 compressed=True data_size=37277237 duration=5947713 ended=100000 fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=1:37:50 sanity=False started=100000 stopped=20240723-190833 submitted=20240723-173043 timeout=5400 username=zhewang
100K bulkLoading tests: 20240723-173130-zhewang-494457bfa21d04cb compressed=True data_size=37310972 duration=24308992 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=2:24:45 sanity=False started=100000 stopped=20240723-195615 submitted=20240723-173130 timeout=5400 username=zhewang
Feature dependency
For simplicity and performance, this PR relies on the existing infrastructure of physical shard moves and ShardedRocksDB. So, for now, the bulk loading feature is only available with the ShardedRocksDB engine and with the following features enabled:

However, this PR sets up a general bulk loading framework for any storage engine. So, in the future, we will support RocksDB and SQLite, and range-based data moves with only shard_encode_location_metadata = true set.
Future works
TODO: (1) Disable user traffic during bulkLoad; (2) Add file checksums; (3) Check whether all data is within the input range when injecting data into a storage server; (4) Bulk load cancellation (clear the bulk load task metadata and trigger a new data move on the same range, which will be a normal data move without a bulk load task); (5) Allow bulk loading on a readWrite SS shard (optional).
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or main if this is the youngest branch)