EricssonResearch / calvin-base

Calvin is an application environment that lets things talk to things, among other things.
Apache License 2.0

Details about Storage class #83

Closed brunodonassolo closed 7 years ago

brunodonassolo commented 7 years ago

Hi,

I'm studying the code of Calvin and I would like more information about some internal details. The general idea is to extend Calvin's deployment algorithm to also consider dynamic parameters such as CPU, memory, or network. I believe you have already discussed this deployment problem, so I would appreciate any thoughts you have on the subject.

Some questions about the storage mechanism:

In the storage helper functions (calvin/runtime/north/storage.py), the Storage class has two data structures and two main functions to edit a value in the storage. Data structures:

Q1) Why do we need the two structures to save the data? I believe there is a reason, but I'm missing it.

Q2) It seems to me that localstore_sets doesn't save all the data. For example, if I call append(prefix+key="node-xxx", value={"test": 123}), it will add set("test") to localstore_sets["node-xxx"]["+"] and not the 123. Is that correct? This question is related to the first one; the purpose of the two structures is not clear to me.

Methods:

Q3) How can I update just one of the values inside a key? For example, in the key node-xxxx: { "attributes": ..., "uris": ["calvinip://127.0.0.1:5000"], ... }, if I want to change only "uris" to a new value, should I use the set method and write the whole node-xxxx key again?

For the moment, and for testing purposes, I'm using the set method and writing the full structure each time.

Best regards,

Bruno Donassolo R&D Engineer at Orange Labs

HaraldGustafsson commented 7 years ago

Yes, we have thought about dynamic/quality attributes, like small/medium/large memory, etc. I will write that up in a wiki page and get back to you with a reference. Dynamic attributes also raise the issue of when to check them, or whether you should be notified when they change.

The storage has a local cache of the key-value pairs, which is stored in localstore and localstore_sets. When using the local storage type instead of the DHT, the cache keeps all the data.

A1) We can store two different types of value for a key: either a single value (using set), which could be a JSON string with multiple values, or a set that values can be added to and removed from (each value could be a JSON string, but typically it is a UUID string). localstore saves the first type of key-value pair and localstore_sets stores the second type. localstore_sets contains a + (append) and a - (remove) set, since its main purpose is to be a cache for operations towards the external store, e.g. the DHT. So when an append operation has been finalized in the DHT, the appended value is removed from the + set in localstore_sets for the given key, and similarly for remove.
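The relationship between the two caches can be sketched roughly like this. This is a deliberately simplified stand-in, not the real Storage class: the actual code in calvin/runtime/north/storage.py adds callbacks, prefixes, and the interaction with the external store.

```python
# Simplified sketch of the two caches described above. Method names and
# behaviour are reduced for clarity; this is NOT the real Storage class.

class CacheSketch(object):
    def __init__(self):
        self.localstore = {}       # key -> single (e.g. JSON-string) value
        self.localstore_sets = {}  # key -> {'+': set(), '-': set()}

    def set(self, key, value):
        # First value type: one value per key, overwritten on each set.
        self.localstore[key] = value

    def append(self, key, values):
        # Second value type: a set of values per key. The '+' set caches
        # values not yet confirmed as appended in the external store.
        entry = self.localstore_sets.setdefault(key, {'+': set(), '-': set()})
        entry['+'] |= set(values)
        entry['-'] -= set(values)

    def append_confirmed(self, key, values):
        # Once the external store (e.g. the DHT) acknowledges the append,
        # the values are dropped from the '+' cache.
        self.localstore_sets[key]['+'] -= set(values)
```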

A2) The append and remove functions expect a list or a set of values and apply a set() call to the values; hence, when you supply a dictionary it will take the keys of that dictionary. This is so you can supply multiple values in one operation.
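The behaviour described here is plain Python: building a set from a dict iterates over its keys only, so a value such as {"test": 123} is cached as {"test"} and the 123 is lost.

```python
# set() over a dict iterates its keys, so the value 123 is dropped.
value = {"test": 123}
print(set(value))  # {'test'}

# To keep both key and value as one stored value, pass an explicit
# iterable of values, e.g. a JSON-encoded string:
import json
print(set([json.dumps(value)]))  # {'{"test": 123}'}
```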

A3) The node-xx key you refer to is of the first type, hence you need to alter the complete JSON-coded value. If you intend to update certain values separately/often, you should break them out into their own keys, for example node-cpu-xx, node-memory-xx, etc. If you do that, add an update_node_load or similar in storage.py and make sure that delete_node deletes that data. BUT if you intend to use this load information for actor placement, I would suggest another approach, which utilizes the prefix-searchable index that we have; see the upcoming wiki page.
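The suggested helpers could look something like the sketch below. update_node_load, delete_node_load, and the "node-cpu-"/"node-memory-" prefixes are the illustrative names from the comment above, not part of the actual Calvin API, and the storage.set/storage.delete signatures here are simplified assumptions.

```python
# Hypothetical helpers along the lines suggested above. Names and the
# storage.set/storage.delete signatures are illustrative assumptions.

def update_node_load(storage, node_id, cpu=None, memory=None):
    """Store frequently changing load values under their own keys."""
    if cpu is not None:
        storage.set(prefix="node-cpu-", key=node_id, value=cpu, cb=None)
    if memory is not None:
        storage.set(prefix="node-memory-", key=node_id, value=memory, cb=None)

def delete_node_load(storage, node_id):
    """Call this from delete_node so the load data is cleaned up too."""
    storage.delete(prefix="node-cpu-", key=node_id, cb=None)
    storage.delete(prefix="node-memory-", key=node_id, cb=None)
```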

brunodonassolo commented 7 years ago

Hi,

Thanks for the response. It's clearer for me now.

I'll wait for the wiki page and take a look.

A1) OK, I believe I understood the difference between the two groups. I was expecting the same kind of value in the two structs, hence my confusion.

A2) It makes sense. I was using the wrong data in the append method.

A3) That is what I did for now (created a node-memory-xxx key). In fact, I do want to use this information for placement. I saw the prefix-searchable index, but I didn't know how to use it when searching for a range of values (e.g. memory greater than 1000). For now, the idea was to get all possible nodes (as in all_nodes.py) and filter them later, but I'll wait to see the suggestion on the wiki page.

Best,

HaraldGustafsson commented 7 years ago

I've added a wiki-page https://github.com/EricssonResearch/calvin-base/wiki/Storage-or-Registry .

Hope this helps. If you need more help designing how to do requirement matching, please ask. For example, if you are doing requirement matching by filtering, we should make sure that those requirements are of a different type so that they are applied last, after other requirements, etc.

brunodonassolo commented 7 years ago

Hi Harald,

First of all, thanks a lot for the wiki page. It really helped me understand the storage and the index searches for ranges of values.

I'm writing this message to share my ideas for implementing initial support for dynamic parameters in the deployment.

1) Monitoring: The idea is to add commands to the Calvin Control API to update the values. Example: POST /monitor/cpu/core/node-id with body { "value": 4 } to set 4 cores for the node.

I believe this is the most flexible way to update the values in Calvin, mainly because it becomes difficult to do it inside Calvin for more complex parameters like network bandwidth or latency.

I also tried to use the psutil library to monitor the resources inside csruntime. However, it depends on the host OS to return correct values. On Linux, it reads the values from /proc, which leads to problems when running Calvin inside a container/Docker: it returns the resources of the host system, not those of the container.

2) Storage and description: Following the approach described in the "Future registry expansion" section, I intend to add ranges for the values being monitored (which parameters and which ranges are still to be defined).

For example, in the case of the number of CPU cores, we would have in the storage: /index/cpu/cores/1/2/4/8/16/32. The search could look like: "index": ["cpu", {"cores": 4}], to get all nodes with at least 4 cores.
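The range-search idea can be sketched with a plain dict as a stand-in for the registry: a node with N cores is registered under every threshold path up to N, so a single prefix lookup at the path for k cores returns all nodes with at least k cores. The level values and path shape follow the /index/cpu/cores/1/2/4/... example above; everything else is illustrative.

```python
# Illustrative stand-in for the prefix-searchable index idea; the real
# registry is the Calvin storage (possibly a DHT), not a plain dict.

LEVELS = [1, 2, 4, 8, 16, 32]
index = {}  # index path -> set of node ids

def _cores_path(cores):
    path = "/index/cpu/cores"
    for level in LEVELS:
        if level > cores:
            break
        path += "/%d" % level
    return path

def publish_cores(node_id, cores):
    # Register the node at every threshold level it satisfies.
    path = "/index/cpu/cores"
    for level in LEVELS:
        if level > cores:
            break
        path += "/%d" % level
        index.setdefault(path, set()).add(node_id)

def nodes_with_at_least(cores):
    # One lookup replaces the per-node compare-and-filter loop.
    return index.get(_cores_path(cores), set())

publish_cores("node-a", 4)
publish_cores("node-b", 16)
print(nodes_with_at_least(4))  # both nodes
print(nodes_with_at_least(8))  # only node-b
```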

I first implemented an exact search, saving only the number of cores in the database: /cpuCores-ID/4. I also created a new attribute, node_resource_min, that retrieves all nodes that have at least the specified parameter. However, to get all the nodes it was necessary to make (1 + number of runtimes) requests to the storage: the first to retrieve all nodes, then one per node to get its value and compare it with the requested one.

Following the proposed approach, I can retrieve all nodes with only one request. It seems to use more memory but has better performance.

3) Dynamic information and deployment changes: For the moment, I will not consider this issue. The idea is to consider the new parameters for the first deployment only. But I believe it is doable, either by changing the storage system or by adding something to the input functions of the monitoring.

HaraldGustafsson commented 7 years ago

You say dynamic parameters; are you intending to have a dynamic number of cores? Anyway, be aware that the runtime is single-threaded: you are supposed to have one runtime per core.

For parameters that are stable, like number of cores, max CPU rate, max memory, and max network bandwidth, I would prefer that you used attributes. These could then be supplied when the runtime starts, and the call to csruntime could easily be wrapped with a tool that derives all the values and supplies them to the csruntime command.

For parameters that change during the lifetime of the runtime, it looks OK. We already have a monitor in the runtime for actors, so to avoid confusion please use performance capability or resource, e.g. POST /node/resource, which could take a larger dictionary with several values that you likely want to update simultaneously. These would then get a storage index key like index-/node/resource/cpuload/X/X etc., by using new methods in attribute_resolver.py. You have node-id in the URL; is that so you could tell any runtime to update the registry for others as well?
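A handler for such an endpoint could be sketched as below. The endpoint name /node/resource follows the suggestion in this comment, but the handler and storage interfaces here are simplified stand-ins, and the resource names are assumptions, not the real Calvin control API.

```python
# Sketch only: one POST /node/resource body updating several resource
# values at once. Resource names and the dict-based store are assumed.
import json

VALID_RESOURCES = {"cpuload", "memfree"}  # illustrative resource names

def handle_node_resource(body, store):
    """Update several resource values from one request body."""
    payload = json.loads(body)
    unknown = set(payload) - VALID_RESOURCES
    if unknown:
        raise ValueError("unknown resources: %s" % ", ".join(sorted(unknown)))
    for name, value in payload.items():
        # One registry key per resource, all updated in the same request.
        store["/node/resource/%s" % name] = value
    return len(payload)
```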

In general it looks interesting; I could take a look at it in more detail when you make it public.

brunodonassolo commented 7 years ago

You say dynamic parameters; are you intending to have a dynamic number of cores? Anyway, be aware that the runtime is single-threaded: you are supposed to have one runtime per core.

No, just a bad example =). I agree that it is not a good parameter to consider in the deployment, given the single-threaded environment.

For parameters that are stable, like number of cores, max CPU rate, max memory, and max network bandwidth, I would prefer that you used attributes. These could then be supplied when the runtime starts, and the call to csruntime could easily be wrapped with a tool that derives all the values and supplies them to the csruntime command.

OK, I will consider that. I'm just not sure about runtimes running in a container environment; maybe some of the static values are not so static there.

You have node-id in the URL; is that so you could tell any runtime to update the registry for others as well?

Yes, I left it in. I don't think it is really useful, but I don't see a reason to remove it for now.

In general it looks interesting; I could take a look at it in more detail when you make it public.

Great. I'm developing in my fork of Calvin (https://github.com/brunodonassolo/calvin-base/tree/resourceMonitor). As soon as I have at least one simple parameter working, I'll submit a patch.

I'm struggling a little with the DHT in my local tests. I frequently get error messages when updating the values, like "1400-calvin.calvin.runtime.north.storage: Failed to update index-/cpu/avail" and "There are no known neighbors to set".

I'm still investigating, but do these messages mean anything to you?

Best,

brunodonassolo commented 7 years ago

I believe we can close this issue for now.

I'll open new ones if necessary to discuss other points about the network model in the deployment.

Thanks a lot for the help @HaraldGustafsson.