ServiceDescription with single string, remove service/instance/event distinction

eclipse-iceoryx / iceoryx

Eclipse iceoryx™ - true zero-copy inter-process-communication

https://iceoryx.io

Apache License 2.0

1.58k stars 373 forks source link

ServiceDescription with single string, remove service/instance/event distinction #1052

Open elfenpiff opened 2 years ago

elfenpiff commented 2 years ago

Brief feature description

The ServiceDescription should be modified so that it only contains one string instead of 3 and iceoryx should not make the distinction between service/instance/event since it has no meaning in our world.

The ServiceDescription would then only consist of one single cxx::string which would also have the advantage that we could reduce the string length to 1 in an embedded low memory environment and use the underlying char for service enumeration which would be quiet memory efficient. For use cases like ara com or someip a binding could introduce logic to reestablish the service/instance/event logic by introducing for instance a separator and name the underlying iceoryx services like /ServiceName/InstanceName/EventName.

In findService we could then also remove the wildcard option and introduce an option like exactMatch or partialMatch. This would allow a user for instance to look for toad and they would find AllHailHypnotoad, AllGloryToTheHypnotoad with partialMatch. For more fine grained searches the ara com / someip binding could introduce extra logic to filter the results further.

Furthermore it would give other frameworks the advantage to encode additional features cleanly into the service description like for instance a type, a class hash - by just defining some service syntax on their own.

At the moment we have cyclone dds which uses only partially those 3 identifiers in the internal iceoryx integration which makes it more clear that this is not the way to go. Or look at ros2 when it uses dds and how the services are mapped there.

elfenpiff commented 2 years ago

@budrus @FerdinandSpitzschnueffler @MatthiasKillat @elBoberido @mossmaurice @dkroenke What do you think?

If you have additions to the feature descriptions please add them directly in the description.

budrus commented 2 years ago

Maybe a good idea. We have today different mappings for rmw_iceoryx, wirhin Cyclone DDS and some kind of ara::com best practice. The constraint would be that this is all possible again and then we have to adapt these

FerdinandSpitzschnueffler commented 2 years ago

Good idea. I think especially for the partial search, the prefix tree @MatthiasKillat is working on will be very efficient. The question is: when do we want to introduce this change? If we want it for the next release, parts of the refactoring of ServiceDiscovery and ServiceRegistry that is currently ongoing will be obsolete soon.

elBoberido commented 2 years ago

We could also encode the data type into this, e.g. "/my/service/data_type/version" -> "/my/service/int64_t/1". Not quite sure if it should be in this string or if there should be a second string for the data type description e.g.

struct ServiceDescription {
  string name;
  string typeId;
};

And it would be used like ServiceDescription("/my/service", "/int64_t/1") or ServiceDescription("/my/service", ""). In the latter example we could print a warning that no type check can be performed.

elfenpiff commented 2 years ago

@elBoberido I would really love to keep everything in one string to make it as simple as possible. Otherwise we have to write a little parser to translate all the types and the question is where are they stored and how to perform a type check? How do we get from a user defined type to the string of the user defined type? What if there are multiple user defined types with the same string?

I would also would love to have this as soon as possible - at best for 2.0. But then we make us unnecessary pressure - something I hate. But what about a compromise?

We introduce a constructor in the service description with a single string and this string is stored in m_service
We implement the findService feature only for the single string service. Since it is a new feature we can do whatever we want - and this would be much nicer then to implement the feature and brake the API in the next release.

This would leave the old API intact, we do not have to change anything but would gain access to the new feature. If everything plays out nice and fast we could remove the old API and if not we leave the old API in 2.0 and remove it later. But all new features will only work with the single string service description so when we remove the old API we do not have to refactor much.

And maybe the findService feature is much easier to implement with a single string service instead of having 3 ones? @mossmaurice @FerdinandSpitzschnueffler @MatthiasKillat

MatthiasKillat commented 2 years ago

I have not thought much of it, but a few points.

I do not think we can quickly change it without breaking dependencies in e.g. CycloneDDS, ROS 2 rmw_iceoryx etc.
I do not think it is very beneficial if we basically end up encoding the partial identifiers separated by / or similar. It makes searching even more of a hassle (details below).
I would really avoid encoding type information or any other additional information into this string. From my point of view we need a ServiceIdentifier (which currently are those 3 strings, could be one, could be a unique number etc. but needs to be mappable easily to Autosar and others) and a ServiceDescription. The latter contains additional information about the service in some form yet to be defined (such as type information). It could even contain optional custom information. There is a one to one correspondence between the two. If one tries to encode everything into a string it just gets problematic to extract the necessary information.

Addendum: if ServiceDescription is basically only the type information multiple ServiceIndentifiers could have the same type information. If it contains the identifier as well (composition), it is unique.

elBoberido commented 2 years ago

@elfenpiff well, it depends what you want to communicate with that string. If it is only one string, you have to encode the service semantic type and the service data type into one string. In DDS/ROS for example this is not done and there is a topic name and a separate type name, e.g. ros_discovery_info and rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_. MQTT on the other hand somewhat encodes this into one string, e.g. hypnotoad/1984/audio/ogg and hypnotoad/1984/audio/mp3.

The check would be a strict comparison of the strings and it is the responsibility of the user to add sane strings or just use an empty string and skip this check. For safety, we would require to have this and a tool could generate this types from an IDL. We could even have a template toTypeName which returns an empty string in it's default implementation and can be specialized by the user. We could also automatically derive this if the type has some constexpr defined, e.g. VERSION and TYPE_NAME. After writing all this stuff, I guess typeId could automatically be inferred for the typed API and we could have a cxx::variant<None, AutomaticTypeDeduction, uint64_t, cxx::string> as second parameter for the ServiceDescription with a default value of None.

cc @budrus @MatthiasKillat @FerdinandSpitzschnueffler @mossmaurice @dkroenke

MatthiasKillat commented 2 years ago

As for findService with one string. Depending on what should be searched for (I am considering the Autosar use case), this is not a good idea because it requires searching for the separator (e.g. /) in general if we e.g. want to find all instances or all events of a specific instance etc. This degenerates into almost brute force searches even with a prefix tree (a map would do worse here).

While prefix searches are fast and easy though (e.g. find all instances of a specific service with the proposed encoding), general substring searches are not. Those require suffix trees or other elaborate structures which require quite a lot of storage to allow the fast lookup. Brute force is not an option for this, even for a small number of entries (say 100-300 events) I would argue.

In my opinion this will not help us in the short-term and just create more problems to be solved quickly, more extensive refactoring etc. It can be done, but I think we should not change it right now before 2.0.

MatthiasKillat commented 2 years ago

@elBoberido You are right that comparison for equality is easy with one string (necessarily linear in the encoding size), it is too restrictive for the user I think. This is why I would like the split of the ServiceIdentifier and ServiceDescription in the long term (but not now for 2.0). Of course the user is free to encode anything she wants in the string (up to the available size), but I would not consider this the intended way to use the string for the user.

And if you start with partial searches in such a "can contain everything" string it just gets out of hand.

Note that with the 3 existing strings the user of iceoryx already has much of these possibilities, she can just leave unused identifiers empty. I just do not see the benefit without considering the usage implications more carefully.

As for equality comparison: it makes little difference whether we check 3 small strings vs. 1 larger string of approximately the combined size or similar.

For inexact searches it gets more problematic I would say, and that is the major downside. Prefix searches are in principle alright, but I would not commit to this until the prefix tree is somewhat proven to work properly. I also still have some issues with the memory consumption of storing this search information efficiently.

A vector of strings would be feasible as a fallback if we do not want to perform inexact searches up to say 100-500 elements (maybe, just a guess).

elBoberido commented 2 years ago

@MatthiasKillat You are right that for some search scenarios it will be faster if we have three separate strings but for memory consumption it might be beneficial to have only one. Just a wild guess, I have no data to back this up. Partial searches can also be optimized if we restrict the search tokens, e.g.

start with the first / and then match the MyService
start with the second / and the match MyInstance So something like pre-filtering with a reduced filter set followed by a more sophisticated search or a two pass search. We could have an API like this

uint64_t findService(uint64_t advanceToSeparatorWithNumber, string searchTerm, std::function(void(string))& f);
uint64_t findService(string startsWith, string searchTerm, std::function(void(string))& f);

Someone using iceoryx for an ara::com implementation could then wrap it into something like this

constexpr uint64_t SERVICE {1};
constexpr uint64_t INSTANCE {2};
constexpr uint64_t EVENT {3};
std::vector araComWrapper(whatever) {
    std::vector<> ret;
    findService(SERVICE, "Radar", [&](auto&){ ... ret.push_back(...); ...})  // finds services "Radar" and "RadarFoo";  
    findService(SERVICE, "Radar/", ...)  // finds only service "Radar" but not "RadarFoo";  
    or
    findService(INSTANCE, "Right/", ...) // finds instance "Right" of services  "Radar" and "Camera"
    or
    findService("/Radar/Right/", "*", ...)  // finds all events for service "Radar" and instance "Right"
}

With this restrictions on the pre-filtering, it could still be done quite efficiently.

Oh, as it looks like, zenoh also has only one string for this but without encoded information for the data type. Therefore I would lean towards having one string as ServiceIdentifier and an optional second one for the data type, which could automatically be deduced by some template magic.

cc @budrus @elfenpiff @FerdinandSpitzschnueffler @mossmaurice @dkroenke

MatthiasKillat commented 2 years ago

@elBoberido I am not against one string as the identifier mid-term (but would like a separate identifier for the type and additional info, whatever this may be).

Just for 2.0 it I think it will lead to problems and break things we have yet to consider (and surely CycloneDDS, ROS, rmw_iceoryx). It will also require changes in C-API, examples, tests and so on.

Regarding the search: The interface looks alright, but it still does not answer the question how to find all matches to */MyInstance/* efficiently, where the * is a wildcard. If one knows the number of separators one can use this info in the indexing scheme internally (so basically as planned, as this is equivalent to knowing the number of substring between separators). But as I understand we do not have this info as the user is free to use separators as she sees fit and then prefixtree index at least breaks (as does any indexing structure working on complete keys, such as maps).

What if the user for some reason decides to define patterns like u/v/x/y/z and wants to find everything matching */v/*/y/*. Sure, one can always have brute force but then the question is: how many entries are we talking? I was always operating on the assumption of the order 10³. There it starts to get painful without proper indexing (which is the purpose of prefix trees, suffix trees, maps and so on). I will think about how to solve this problem below for an arbitrary number separators.

MatthiasKillat commented 2 years ago

Before we do any of this, we should understand the search scenarios, order of entries to manage and the full implications of an API change like this in more detail. I am less concerned by the search logic, something is possible there (but I would like to test my current approach ...). I am much more concerned by all the changes at the user side (ports and so on).

Maybe one can have a phase where both versions co-exist, i.e. additional constructors for the ports with just one string etc.

elBoberido commented 2 years ago

@MatthiasKillat I'm also for two strings, one ServiceIdentifier and one ServiceDataType for example.

Regarding the search. Your example with */y/*/y/* would not be supported. There are only two options for pre-filtering

the ServiceIdentifier must start with the provided filter string
the number of separators which need to be skipped
in the future this could be extended with more complex filtering but I wouldn't support more from the beginning ... yes, I talked about two options but you also know there are two hard problems in computer science
- cache coherency
- naming things
- off by one errors

The search would then only be applied to the characters following the pre-filter, e.g. for /Radar/Right/Whatever

with start string filtering of /Radar/Right/, the search would be applied to Whatever
with number of separators to be skipped set to 2, the search would also be applied to Whatever

While the user would be free to decide how many separators to use, it would also be up to the user to follow her schema when doing the query. Let's assume iceoryx is used for an ara::com implementation, then the schema would be /service/instance/event, therefore your example with */MyInstance/* would map to

uint_64_t INSTANCE {2}; 
findService(INSTANCE, "MyInstance/", [](){...})`;
// or depending on how matching behaves, with a * at the end
findService(INSTANCE, "MyInstance/*", [](){...})`;

To integrate this into iceoryx, one could either follow the suggestion of @elfenpiff or do the following (not sure which one would be faster/less invasiv):

create a ServiceIndentifier
create a conversion operator from ServiceDescription to ServiceIdentifier
make the Publisher/Subscriber/etc. use the ServiceIdentifier
deprecate the ServiceDescription via doxygen for iceoryx v2 and via [[deprecated]] attribute for iceoryx v3

With this approach we would already provide the new API with iceoryx v2 and still be source compatible with iceoryx v1. The ServiceDiscovery would already work with the new schema.

I cannot tell if this is feasible for iceoryx v2 or if we should target v3 if we want this at all.

elBoberido commented 2 years ago

@MatthiasKillat okay, I now understand your concern with the separator filter option. You are right, this one would be a brute force search since all entries must be queried for the separator and the full string must be searched since one never knows if it's at the end of the string.

I don't have a good solution for this, to only idea that comes to my mind is to have a lookup substructure in the PrefixTree with the indices of all separators which gives you a fast forward in the PrefixTree. Don't know if this is feasible and it adds some functionality to the PrefixTree which probably does not belong there.

MatthiasKillat commented 2 years ago

@elBoberido would work with some kind of partial compression at the separators in the tree, which means separators are of a specific level are shared between entries. Then you need a fast forward to separator #x (easy) and could use it as "root" for the search on #substring. I need a whiteboard to sketch this...

Pretty specific though and assumes existence of these separators. I will think about this but first complete what I already have. While I like elaborate solutions (if the problem justifies them), we should not go overboard at the moment and finish it at least as planned.