Closed xamgore closed 5 months ago
Huh? I can't make heads or tails of what you're talking about. Are you just asking for a Default impl?
The automaton should work fine with zero patterns.
Now that I have hands on a keyboard, here's an example:
use aho_corasick::AhoCorasick;

fn main() {
    let ac = AhoCorasick::new(None::<&str>).unwrap();
    dbg!(ac.find("foo"));
}
With output:
$ cargo r -q
[main.rs:5:5] ac.find("foo") = None
It sounds like all you're asking for here is that AhoCorasick implement Default, where the default impl would be zero patterns.
I basically agree that this is a reasonable default, but I am somewhat hesitant to commit to it. I would say that it is odd since it is effectively useless. And I suppose it could be surprising in some cases. But I do sympathize with the idea behind making derive(Default) easier for types that own an AhoCorasick.
I suppose it is perhaps somewhat analogous to String::default(), which just gives you the empty string. But maybe the difference there is that you can then add to the String. You can't add to an existing AhoCorasick value.
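To make the derive(Default) point concrete, here is a minimal sketch. It uses a hypothetical stand-in type, Matcher, rather than the real AhoCorasick, and the Highlighter wrapper is likewise an assumption made up for illustration. Like the real type, the stand-in has no method for adding patterns after construction, so its default value can never find anything.

```rust
/// Hypothetical stand-in for an immutable matcher such as AhoCorasick.
/// It is NOT the real type; it only stores its patterns.
#[derive(Debug, Default)]
pub struct Matcher {
    patterns: Vec<String>,
}

impl Matcher {
    /// Build a matcher from any number of patterns, including zero.
    pub fn new<I, P>(patterns: I) -> Self
    where
        I: IntoIterator<Item = P>,
        P: AsRef<str>,
    {
        Matcher {
            patterns: patterns
                .into_iter()
                .map(|p| p.as_ref().to_string())
                .collect(),
        }
    }

    /// Return the smallest byte offset at which any pattern occurs.
    pub fn find(&self, haystack: &str) -> Option<usize> {
        self.patterns
            .iter()
            .filter_map(|p| haystack.find(p.as_str()))
            .min()
    }
}

/// A wrapper that can derive Default only because Matcher implements it.
#[derive(Debug, Default)]
pub struct Highlighter {
    matcher: Matcher,
    prefix: String,
}
```

With this in place, Highlighter::default() builds without any hand-written impl, and its matcher reports no match for every haystack.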
The automaton should work fine with zero patterns.
I've extracted the StreamChunkIter implementation and updated it to provide a context (stream position, XML stack level, ...). According to your comment, a dyn Automaton with zero patterns returns a "match event" on each character. Thus:
If you have a stream that does not fit in memory, and you want to run a few transformations over it (find-and-replace is not the only one) with a dyn Automaton, you can't do this with zero patterns. Either the client code has more ifs, or it could use a no-op automaton.
If you check out the source code of StreamChunkIter::new(), it returns a dynamic error. What are the chances of triggering it during development? If you are lucky and already know about this, then each time you want to transform the stream, you have to deal with MatchErrorKind, where all of the variants are critical failures except UnsupportedEmpty. If you check an example or the documentation, you won't find a "zero patterns failure" mentioned.
let ac = AhoCorasick::new(patterns).unwrap(); // patterns = []
ac.try_stream_replace_all(rdr.as_bytes(), &mut wtr, replace_with)?; // std::io::Error

// should be more like

// module A:
let ac = AhoCorasick::new(patterns).unwrap();

// module B:
match ac.try_stream_replace_all(rdr.as_bytes(), &mut wtr, replace_with) {
    Ok(()) => (),
    // is_unsupported_empty is a hypothetical helper: get the inner error
    // and check whether it is UnsupportedEmpty.
    Err(err) if is_unsupported_empty(&err) => (),
    err => err?,
};
The other argument: iterative development. The less motion the better.
Maybe I don't see the cons (I would like to hear them!). My suggestions are:
- NoOpAutomaton (single state, no matches).
- MatchErrorKind::UnsupportedEmpty, as it's a hidden bomb 💣
- aut.min_pattern_len() == 0 into helper functions like try_stream_replace_all, and redirect the rdr into wtr (no patterns means no replaces)
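The last suggestion, forwarding the reader to the writer when there is nothing to replace, could be sketched roughly as follows. The function replace_all_or_copy and its (pattern, replacement) pair parameter are hypothetical and not part of aho-corasick. The non-empty branch buffers the whole input for brevity, so only the early return illustrates the idea.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper, not part of aho-corasick: apply each
/// (pattern, replacement) pair in order while moving `rdr` into `wtr`.
pub fn replace_all_or_copy<R: Read, W: Write>(
    mut rdr: R,
    mut wtr: W,
    pairs: &[(&str, &str)],
) -> io::Result<()> {
    // No patterns means no replacements: forward the bytes unchanged.
    if pairs.is_empty() {
        io::copy(&mut rdr, &mut wtr)?;
        return Ok(());
    }
    // Simplification: buffer the whole input instead of searching it
    // chunk by chunk, which a real streaming search would do.
    let mut text = String::new();
    rdr.read_to_string(&mut text)?;
    for &(pattern, replacement) in pairs {
        text = text.replace(pattern, replacement);
    }
    wtr.write_all(text.as_bytes())
}
```

The early return keeps the zero-pattern case out of the caller's error handling, which is the point of the suggestion.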
Please pop up a level. You're in the implementation weeds. What problem are you trying to solve as a user of this library?
The automaton should work fine with zero patterns.
Yeah, it seems like it does: no matches found. I thought it wouldn't because of your comment, which is probably obsolete.
use aho_corasick::{Anchored, nfa, automaton::Automaton}; // 1.1.3

fn main() {
    let patterns = Vec::<&str>::new();
    let nfa = nfa::noncontiguous::Builder::default().build(patterns).unwrap();
    let mut sid = nfa.start_state(Anchored::No).unwrap();
    for &byte in "test".as_bytes() {
        sid = nfa.next_state(Anchored::No, sid, byte);
        assert!(!nfa.is_match(sid));
    }
}
Thanks for your attention, and sorry for taking your time. A default AhoCorasick is just AhoCorasick::new([]).
That comment is still correct. It says empty matches, which is referring to zero-width matches, not "matches for the empty automaton."
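The distinction can be illustrated with the standard library alone, without touching aho-corasick. In this sketch the helper match_starts is hypothetical. It only shows that an empty pattern list and a list containing one empty pattern behave very differently.

```rust
/// Hypothetical helper, not part of aho-corasick: collect the start offset
/// of every occurrence of every pattern, using only `str::match_indices`.
pub fn match_starts(patterns: &[&str], haystack: &str) -> Vec<usize> {
    let mut starts = Vec::new();
    for pattern in patterns {
        // An empty pattern matches at every character boundary.
        for (start, _) in haystack.match_indices(*pattern) {
            starts.push(start);
        }
    }
    starts.sort();
    starts
}
```

Calling match_starts(&[], "foo") returns an empty vector, because there is nothing to search for. Calling match_starts(&[""], "foo") returns the offsets 0, 1, 2 and 3, one zero-width match per position.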
Indeed. 🤦🏻♂️
tldr; I suggest defining an empty automaton struct implementing the Automaton trait, with semantics like "return no matches".

Somewhere in the source code I've seen a line saying it's nonsense to run an automaton with an empty pattern set. While this may be true, the absence of a Default implementation for the automaton affects wrapping structs too.

Of course, the field could be of type Option<Arc<dyn Automaton>>. It means all callers now have to check for the presence of the automaton each time they deal with it. The other pitfall is the build method: now you can't just proxy the iterator downwards. We need to check it for emptiness.

What I need is just an automaton that does nothing, maybe has a single state, and returns no matches. I can make a PR if everything aforementioned seems reasonable to you.
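A rough sketch of the shape being asked for, under stated assumptions: Search is a hypothetical, simplified trait standing in for Automaton, and Literal, NoOp and Wrapper are made-up names. It only shows how a no-op implementation lets a wrapping struct hold its searcher directly instead of an Option.

```rust
use std::sync::Arc;

/// Hypothetical, simplified search trait. It is NOT the real Automaton trait.
pub trait Search {
    /// Return the byte offset of the first match, if any.
    fn find(&self, haystack: &str) -> Option<usize>;
}

/// Searches for a single literal.
pub struct Literal(pub String);

impl Search for Literal {
    fn find(&self, haystack: &str) -> Option<usize> {
        haystack.find(self.0.as_str())
    }
}

/// The proposed no-op: it never reports a match.
pub struct NoOp;

impl Search for NoOp {
    fn find(&self, _haystack: &str) -> Option<usize> {
        None
    }
}

/// A wrapping struct that holds a searcher directly, with no Option.
pub struct Wrapper {
    searcher: Arc<dyn Search>,
}

impl Default for Wrapper {
    fn default() -> Self {
        Wrapper { searcher: Arc::new(NoOp) }
    }
}

impl Wrapper {
    pub fn with_literal(literal: &str) -> Self {
        Wrapper { searcher: Arc::new(Literal(literal.to_string())) }
    }

    /// Callers never need to check whether a searcher is present.
    pub fn first_match(&self, haystack: &str) -> Option<usize> {
        self.searcher.find(haystack)
    }
}
```

Compared with an Option field, the no-op variant moves the "nothing to search for" case into one place, so first_match has a single code path.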
PS. Thanks for the great job! 😌