Problem

Currently Colossus (storage-node) imposes artificial constraints on workers:
Run a single instance for all buckets
The transactorAccountId of every bucket must be the same account
This is reflected in the configuration/arguments:
Buckets to be handled cannot be specified as an argument
Only a single account key can be passed
In the implementation, on startup the node checks the bucket obligations and, for each bucket, verifies that the transactor account corresponds to the key provided:
https://github.com/Joystream/joystream/blob/cc43a96c15cb16f01cca6c47752444a8863e6fc3/storage-node/src/commands/server.ts#L281
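The current startup validation can be sketched roughly as follows (names like `Bucket` and `validateBuckets` are illustrative, not the actual storage-node API):

```typescript
// Hypothetical sketch of the current single-key startup check: every obligated
// bucket's transactor account must match the one key passed to the node.
interface Bucket {
  id: number
  transactorAccountId: string
}

function validateBuckets(buckets: Bucket[], providedAccountId: string): void {
  for (const bucket of buckets) {
    if (bucket.transactorAccountId !== providedAccountId) {
      throw new Error(
        `Bucket ${bucket.id} expects transactor ${bucket.transactorAccountId}, ` +
          `but only ${providedAccountId} was provided`
      )
    }
  }
}
```

This is the check that forces all of a worker's buckets onto one account per instance.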
This severely limits a worker's flexibility to run multiple instances of a storage node. The problem has been raised by the storage lead: the ability to run multiple nodes is needed to better scale the operations of storage workers.
Solution
Dealing with multiple accounts
The operator must be able to provide multiple keys, corresponding to the buckets that will be handled by the running instance. I propose this be done by replacing the keyFile argument with a keyStorePath argument, whose value is a path to a folder containing multiple key files, all of which would be loaded into the keyring of the node.
Bucket selection
Following from the previous point: if an instance is meant to be "read-only", i.e. the operator does not intend it to handle new uploads but only to serve distributors, then technically no key is required, as the node will not need to send any transactions. So the buckets to be operated should not be determined simply by which keys are found in the keystore.
Therefore bucket selection should be explicit. I think it makes sense to have a single buckets string argument which takes a matching pattern specifying the buckets to operate and, per bucket, additional options (see the point below about the uploads restriction).
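To make the idea concrete, here is one possible shape for parsing such an argument. The syntax sketched here ("1,2:no-uploads,5" — a comma-separated list where a ":no-uploads" suffix marks a bucket as download-only) is purely an illustration, not a settled design:

```typescript
// Hypothetical parser for the proposed `buckets` CLI argument.
interface BucketConfig {
  id: number
  acceptUploads: boolean
}

function parseBucketsArg(arg: string): BucketConfig[] {
  return arg.split(',').map((part) => {
    const [id, ...flags] = part.trim().split(':')
    return {
      id: parseInt(id, 10),
      acceptUploads: !flags.includes('no-uploads'),
    }
  })
}
```

The key property is that the operated buckets (and their upload policy) are stated explicitly, independent of which keys happen to be in the keystore.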
Currently there is no built-in transaction coordinator for the case where multiple nodes accept new uploads for the same bucket (presuming they have a shared storage volume), or for different buckets configured with the same transactorAccountId. This can lead to a situation where two nodes submit a transaction with the same nonce. To avoid this, I propose an explicit configuration option by which the operator chooses whether the node accepts new uploads for a bucket on that instance (disabling HTTP POST requests for that bucket).
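The proposed per-bucket upload switch could be enforced with a simple request guard along these lines (a sketch with assumed names, not the actual storage-node server code):

```typescript
// Reject HTTP POST (uploads) for buckets the operator marked read-only on this
// instance; downloads (GET/HEAD) are always served.
function isRequestAllowed(
  method: string,
  bucketId: number,
  uploadsEnabled: Set<number> // bucket ids this instance accepts uploads for
): boolean {
  if (method.toUpperCase() !== 'POST') {
    return true
  }
  return uploadsEnabled.has(bucketId)
}
```

Since a read-only instance never sends transactions, the nonce collision cannot occur no matter how many such instances share a bucket.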
Keeping in mind that the metadata for a bucket currently only allows specifying a single URL for both uploads and downloads, if an operator chooses to place multiple nodes behind a load balancer (nginx, caddy, or another reverse proxy), they must add rules that route POST requests only to the instance that allows uploads.
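For illustration, such routing could look roughly like the following nginx fragment (addresses and ports are placeholders; this is a sketch, not a recommended production config):

```nginx
# Route POST (uploads) to the single upload-enabled instance,
# everything else (downloads) to the read-only pool.
upstream downloads {
  server 10.0.0.2:3333;
  server 10.0.0.3:3333;
}
upstream uploads {
  server 10.0.0.1:3333;
}

server {
  listen 80;
  location / {
    proxy_pass http://downloads;
    if ($request_method = POST) {
      proxy_pass http://uploads;
    }
  }
}
```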
Perhaps it would also be useful to update the message for setting the operator metadata, so that different URLs can be provided for each purpose. This is somewhat disruptive on the client application side, but if done in a backward-compatible way it could be rolled out smoothly.
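One backward-compatible shape for this (field names are assumptions, not the actual Joystream metadata schema): keep the existing single endpoint field and add optional per-purpose URLs, with resolution falling back to the old field when the new ones are absent:

```typescript
// Sketch of a backward-compatible operator metadata extension.
interface OperatorMetadata {
  endpoint?: string // existing single URL, kept for old clients
  uploadEndpoint?: string // new, optional
  downloadEndpoint?: string // new, optional
}

function resolveEndpoint(
  meta: OperatorMetadata,
  purpose: 'upload' | 'download'
): string | undefined {
  if (purpose === 'upload') {
    return meta.uploadEndpoint ?? meta.endpoint
  }
  return meta.downloadEndpoint ?? meta.endpoint
}
```

Old metadata (only endpoint set) keeps working for updated clients, and old clients simply ignore the new fields, which is what allows a smooth rollout.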