Closed rljacobson closed 1 year ago
I think I'm going to keep things how they are, but if there's some added documentation that would have helped make this clearer, I'm totally on board with that.
There are a few reasons why:
- StartKind::Anchored is IMO somewhat of a niche case. Basically, if you know enough to go out of your way to do that, it's reasonable to expect one to pay careful attention to the Input configuration. That is, I'm okay with making the caller need to do a little extra work here.
- Deferring to the AhoCorasick value is a little too spooky for my taste. It's a very subtle change to match semantics and I'd really prefer that callers are explicit about it.
- The StartKind configuration is not part of the type of an AhoCorasick automaton. So if you wrote a routine that accepts an AhoCorasick automaton, ac.try_find(...) might differ in behavior in ways you might not have expected because of the Input defaults. One could argue that such a generic routine should explicitly specify whether it's doing an unanchored or an anchored search to avoid the impact of defaults, but that's too much of a footgun IMO.

Basically, I acknowledge that you hit an unexpected case. But you got a loud failure about it. IMO, that's the API design working as intended. You didn't get subtly different match semantics or a silent failure. It hit you in the face. That's what I was hoping would happen, if that makes sense. :-)
> The StartKind configuration is not part of the type of an AhoCorasick automaton.

I think this alone is reason enough to keep it as it is.

> But you got a loud failure about it. IMO, that's the API design working as intended.

Yes, agreed. In the kinds of programs I write, anchored searches are the most common use cases. But obviously my expectations shouldn't drive API design if I'm not the typical case. And like I said, the docs are pretty clear.
The default anchored state for a new Input is confusing, at least in the case that the Input is created automagically from, say, a &str. Consider a matcher built with start_kind set to StartKind::Anchored and then searched by passing a &str to try_find(…). Despite having explicitly set the start_kind to StartKind::Anchored, the search uses the default of the implicitly created Input object, which is Anchored::No. As a user, I would expect a call to try_find(…) with a &str to default to the start_kind that was explicitly set on the matcher.

I'm reciting my understanding of what's going on here in case I am seriously misunderstanding it, so please forgive the unnecessary detail.
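The mismatch described above can be sketched as a small, self-contained model. This is not the crate's code: the two enums only mirror the names StartKind and Anchored from the discussion, while ModeError, check_search and implicit_mode are hypothetical names invented for illustration. The model assumes that a matcher built for one start kind rejects a search asking for the other, and that a plain &str always yields an unanchored request.

```rust
/// Start states a matcher was built with (mirrors the names in the discussion).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StartKind {
    Both,
    Unanchored,
    Anchored,
}

/// Mode requested by a single search.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Anchored {
    No,
    Yes,
}

/// Hypothetical error type for this model only.
#[derive(Debug, PartialEq, Eq)]
enum ModeError {
    UnanchoredUnsupported,
    AnchoredUnsupported,
}

/// Returns Ok when the matcher's start kind supports the requested mode.
fn check_search(kind: StartKind, mode: Anchored) -> Result<(), ModeError> {
    match (kind, mode) {
        (StartKind::Anchored, Anchored::No) => Err(ModeError::UnanchoredUnsupported),
        (StartKind::Unanchored, Anchored::Yes) => Err(ModeError::AnchoredUnsupported),
        // StartKind::Both supports either mode; matching pairs are fine.
        _ => Ok(()),
    }
}

/// Mode assumed when a search is given a bare &str: always unanchored.
fn implicit_mode() -> Anchored {
    Anchored::No
}
```

In this model, check_search(StartKind::Anchored, implicit_mode()) is the failing case from the report, and passing Anchored::Yes explicitly is the fix.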
Different constructions are required for the different start_kind behaviors of AhoCorasick, and so the StartKind enum has three variants, Anchored, Unanchored, and Both, corresponding to which constructions the object uses. Given that an AhoCorasick might support either start behavior, it makes sense for an Input to specify which behavior to use.

The docs are fairly clear that automatic conversion from &str to Input uses the default Anchored variant and that the default Anchored variant is Anchored::No (e.g. here). So this is definitely a user error.

The downside is that client code is required to keep the start behavior of the
Input in sync with the AhoCorasick even when it is implicitly determined. Would one of these alternatives make more sense?

1. The Anchored enum has an Anchored::Default variant which defers to the AhoCorasick. Under this scheme, the default behavior is moved out of the Input object and into the AhoCorasick object, and the default is only used when client code does not explicitly specify the start behavior and the AhoCorasick supports both start behaviors.
2. Anchored has an Anchored::Deferred variant, but the default for Input remains Anchored::No (in Input::new(…)) except when a T: AsRef<[u8]> is converted to an Input, in which case Anchored::Deferred is used as the default for the Input. This scheme is a bit more complex, as there are now three "defaults":
   - Anchored::No when a variant is not specified for a new explicitly created Input (as it is now);
   - Anchored::Deferred when a T: AsRef<[u8]> is converted to an Input;
   - StartKind::Unanchored for an AhoCorasick that is created without StartKind being set explicitly (as it is now).

The advantage of scheme (1) is that you retain the unanchored default start behavior but throw an error or panic in strictly fewer cases. The advantage of (2) is that fewer things change from the perspective of the API user, but at the cost of automatic conversion from a
T: AsRef<[u8]> being treated as a "special case", which might feel inconsistent to some people. Both have the advantage that less needs to be known by the programmer about the details of the API, IMHO. Also, in both alternatives, a third variant is added to the Anchored enum, so there's the opportunity to unify the Anchored and StartKind enums into a single enum, though some care and creativity would be required in naming the third variant.

Even if you agree, I appreciate the fact that it's an API change to a very mature crate and so might not be worth doing on those grounds alone.
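
To make the proposed deferral concrete, here is a rough, self-contained sketch of how a third variant might resolve against the matcher's start kind. Nothing here exists in the crate: the Deferred variant and the resolve function are hypothetical, and the choice that a matcher supporting both behaviors falls back to unanchored is my reading of "retain the unanchored default", not a settled design. Explicit mismatches still fail, so the sketch only removes the error for the implicit case.

```rust
/// Start states a matcher was built with (mirrors the names in the discussion).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StartKind {
    Both,
    Unanchored,
    Anchored,
}

/// Requested search mode, with a hypothetical third variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Anchored {
    No,
    Yes,
    /// Hypothetical: let the matcher's start kind decide.
    Deferred,
}

/// Resolves a request to a concrete mode (never Deferred), or reports a mismatch.
fn resolve(kind: StartKind, requested: Anchored) -> Result<Anchored, &'static str> {
    match (kind, requested) {
        // Deferred follows the matcher when it supports only one behavior.
        (StartKind::Anchored, Anchored::Deferred) => Ok(Anchored::Yes),
        // Assumption: with one or both unanchored-capable kinds, stay unanchored.
        (StartKind::Unanchored, Anchored::Deferred)
        | (StartKind::Both, Anchored::Deferred) => Ok(Anchored::No),
        // Explicit requests the matcher cannot serve remain loud errors.
        (StartKind::Anchored, Anchored::No) => Err("unanchored search not supported"),
        (StartKind::Unanchored, Anchored::Yes) => Err("anchored search not supported"),
        // Every remaining explicit request is supported as-is.
        (_, explicit) => Ok(explicit),
    }
}
```

Under scheme (1) every Input would start out as the deferred variant; under scheme (2) only an Input converted from a T: AsRef<[u8]> would, while Input::new(…) would keep Anchored::No.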