eclipse / paho.mqtt.rust

paho.mqtt.rust
Other
511 stars 102 forks source link

TopicMatcher does not work as expected #226

Closed Altair-Bueno closed 2 months ago

Altair-Bueno commented 3 months ago

Summary

TopicMatcher behaviour is nowhere near the advertised behaviour, and it differs from the Python implementation

Example

from paho.mqtt.matcher import MQTTMatcher

matcher = MQTTMatcher()

matcher["$MAC/+/+/rpc"] = "_/device_type/systemid/_"
matcher["$MAC/+/+/+/rpc"] = "_/device_type/systemid/zoneid/_"
matcher["$MAC/+/rpc"] = "_/device_type/_"
matcher["$MAC/rpc"] = ""

print(list(matcher.iter_match("$MAC/humidifier/1/rpc")))
# ['_/device_type/systemid/_']
use paho_mqtt::topic_matcher::TopicMatcher;

fn main() {
    let mut topic_matcher = TopicMatcher::new();
    topic_matcher.insert("$MAC/+/+/rpc", "_/device_type/systemid/_");
    // Comment all inserts below to see the difference
    topic_matcher.insert("$MAC/+/+/+/rpc", "_/device_type/systemid/zoneid/_");
    topic_matcher.insert("$MAC/+/rpc", "_/device_type/_");
    topic_matcher.insert("$MAC/rpc", "");

    let matches: Vec<_> = topic_matcher.matches("$MAC/humidifier/1/rpc").collect();
    println!("{:?}", matches);
    // Prints []
    // Only works if it is the only insertion
}

MRE

topic-matcher-behaviour.zip

Altair-Bueno commented 3 months ago

Looking at the current implementation of TopicMatcher, it seems to me it uses too many resources. There are multiple string clones which is not ideal, at least for my usecase (memory constrained device).

I decided to use a far simpler solution that is both correct and should use less memory. It should be somewhat performant too, at least with a few topics (which is my usecase anyways).

This is the code, for anyone interested (trait bounds should be revisited, as they are too general)

Code

```rs // Author: Altair Bueno use std::borrow::Borrow; use std::collections::BTreeMap; use std::collections::HashMap; use std::hash::Hash; pub fn matches(pattern: &str, topic: &str) -> bool { let mut pattern = pattern.split('/'); let mut topic = topic.split('/'); loop { let pattern_level = pattern.next(); let topic_level = topic.next(); match (pattern_level, topic_level) { // Exhausted both pattern and topic (None, None) => return true, // Wildcard on pattern (Some("#"), _) => return true, // Single level wildcard on pattern (Some("+"), Some(_)) => continue, // Equal levels (Some(pattern), Some(topic)) if pattern == topic => continue, // Otherwise, no match _ => return false, } } } pub trait MQTTMatches { type Key<'this> where Self: 'this; type Value<'this> where Self: 'this; fn matches<'this, 'topic>( &'this self, topic: &'topic str, ) -> impl Iterator, Self::Value<'this>)> + 'topic where 'this: 'topic; } impl MQTTMatches for HashMap where K: Eq + Hash + Borrow, S: std::hash::BuildHasher, { type Key<'this> = &'this K where Self: 'this; type Value<'this> = &'this V where Self: 'this; fn matches<'this, 'topic>( &'this self, topic: &'topic str, ) -> impl Iterator, Self::Value<'this>)> + 'topic where 'this: 'topic, { self.iter().filter(move |(pattern, _)| { let pattern: &str = (*pattern).borrow(); matches(pattern, topic) }) } } impl MQTTMatches for BTreeMap where K: Eq + Hash + Borrow, { type Key<'this> = &'this K where Self: 'this; type Value<'this> = &'this V where Self: 'this; fn matches<'this, 'topic>( &'this self, topic: &'topic str, ) -> impl Iterator, Self::Value<'this>)> + 'topic where 'this: 'topic, { self.iter().filter(move |(pattern, _)| { let pattern: &str = (*pattern).borrow(); matches(pattern, topic) }) } } #[cfg(test)] mod test { use super::*; #[test] fn assert_it_works_for_our_usecase() { let mut matcher = HashMap::<&str, &str>::new(); matcher.insert("$MAC/+/+/rpc", "_/device_type/systemid/_"); matcher.insert("$MAC/+/+/+/rpc", "_/device_type/systemid/zoneid/_"); matcher.insert("$MAC/+/rpc", "_/device_type/_"); matcher.insert("$MAC/rpc", ""); let topic = "$MAC/humidifier/1/rpc"; let matches: Vec<_> = matcher.matches(topic).collect(); assert_eq!( matches, vec![(&"$MAC/+/+/rpc", &"_/device_type/systemid/_")] ); } } ```

Altair-Bueno commented 3 months ago

Improved code with tests and documentation. Also, it now works with anything that can be iterated over.

Code

```rs // Author: Altair Bueno /// Checks if a topic with wildcards matches a given topic. fn matches(pattern: &str, topic: &str) -> bool { let mut pattern = pattern.split('/'); let mut topic = topic.split('/'); loop { let pattern_level = pattern.next(); let topic_level = topic.next(); match (pattern_level, topic_level) { // Exhausted both pattern and topic (None, None) => return true, // Wildcard on pattern (Some("#"), _) => return true, // Single level wildcard on pattern (Some("+"), Some(_)) => continue, // Equal levels (Some(pattern), Some(topic)) if pattern == topic => continue, // Otherwise, no match _ => return false, } } } /// Extension trait for map types and tuple iterators that allows to filter /// entries by matching a MQTT topic. /// /// # Example /// /// ``` /// use std::collections::HashMap; /// use std::collections::HashSet; /// use az_mqtt::util::MQTTMatchesExt; /// /// let mut matcher = HashMap::<&str, &str>::new(); /// matcher.insert("$MAC/+/+/rpc", "_/device_type/systemid/_"); /// matcher.insert("$MAC/+/+/+/rpc", "_/device_type/systemid/zoneid/_"); /// matcher.insert("$MAC/+/rpc", "_/device_type/_"); /// matcher.insert("$MAC/rpc", ""); /// /// let topic = "$MAC/humidifier/1/rpc"; /// let matches: HashSet<_> = matcher.matches(topic).collect(); /// assert_eq!( /// matches, /// HashSet::from([("$MAC/+/+/rpc", "_/device_type/systemid/_")]) /// ); /// ``` pub trait MQTTMatchesExt { /// The key type returned by the iterator. type Key; /// The value type returned by the iterator. type Value; /// Matches the given topic against the keys of the map and returns an /// iterator over the matching entries. Keys of the map are expected to /// be MQTT topic patterns and may contain wildcards. fn matches<'topic>( self, topic: &'topic str, ) -> impl Iterator + 'topic where Self: 'topic; } impl MQTTMatchesExt for C where C: IntoIterator, K: AsRef, { type Key = K; type Value = V; fn matches<'topic>( self, topic: &'topic str, ) -> impl Iterator + 'topic where Self: 'topic, { self.into_iter() .filter(move |(pattern, _)| matches(pattern.as_ref(), topic)) } } #[cfg(test)] mod test { use super::*; #[test] fn assert_that_no_wildcards_matches() { assert!(matches("a/b/c", "a/b/c")); } #[test] fn assert_that_plus_wildcard_matches() { assert!(matches("a/+/c", "a/b/c")); } #[test] fn assert_that_leading_plus_wildcard_matches() { assert!(matches("+/b/c", "a/b/c")); } #[test] fn assert_that_trailing_plus_wildcard_matches() { assert!(matches("a/b/+", "a/b/c")); } #[test] fn assert_that_hash_wildcard_matches_none_level() { assert!(matches("a/b/#", "a/b")); } #[test] fn assert_that_hash_wildcard_matches_single_level() { assert!(matches("a/b/#", "a/b/c")); } #[test] fn assert_that_hash_wildcard_matches_multiple_levels() { assert!(matches("a/b/#", "a/b/c/d")); } } ```

fpagliughi commented 2 months ago

Hey! Thanks for this. Yes, the TopicMatcher was a fairly quick hack which I meant to re-assess when I had the time. But it has been working for my use case, so it got a low priority.

Unfortunately, since this is an Eclipse project, I can't really do anything with you code unless you submit it as a PR with a signed Eclipse (ECA) Agreement. Eclipse is big on the legal stuff.

Altair-Bueno commented 2 months ago

I can do that, i already signed the ~annoying~ ECA for another project.

Do you want me to remove TopicMatcher and replace it with MQTTMatchesExt? If so, keep in mind it is a breaking change. But to be fair, TopicMatcher is already broken so there is little motivation to keep it. Additionally, want a different name or API?

fpagliughi commented 2 months ago

Keep the name TopicMatcher. Submit against the develop branch. I'll use that to roll up a couple of breaking changes, if necessary.

fpagliughi commented 2 months ago

The broken TopicMatcher is fixed in v0.12.4. The Rust example above now produces:

[("$MAC/+/+/rpc", "_/device_type/systemid/_")]

which is the key/value pair that matched; the key being the filter.

The new TopcMatcher is optimized quite a bit, and I'm much happier with the implementation. There are string and map allocations as the trie is created, but the iterator only uses a single Vec<> stack to search for matches. That seems pretty normal/reasonable for a tree search, especially one that is normally built/modified once and searched maybe millions of times.

I'll write some performance benchmarks in the near future.

I assume the existing implementation "does not work as expected" is fixed, so will close this issue. If it still seems broken, please feel free to re-open or create a new ticket.

As for new and alternate implementations and a common trait, let's move that discussion to PR #228