fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features
345 stars 21 forks source link

Allow n partitions for ActivePatterns instead of a max 7 #213

Open baronfel opened 8 years ago

baronfel commented 8 years ago

Submitted by Steven Taylor on 3/25/2014 12:00:00 AM
23 votes on UserVoice prior to migration

Active patterns can only have seven partitions (see: http://msdn.microsoft.com/en-us/library/dd233248.aspx) It would be good if this was a stylistic constraint that could be overridden if desired. For example, if active patterns are getting used to split up XML nodes meaningfully, you aren't in control of how many nodes sensibly fit into the top level.

Original UserVoice Submission Archived Uservoice Comments

dsyme commented 8 years ago

Marking this as approved. However it is surprisingly hard to implement.

7sharp9 commented 5 years ago

@dsyme Why is it hard, because n number is difficult for type inference?

We could just follow what Action does and expand the number to 16 :-)

abelbraaksma commented 5 years ago

@7sharp9, I think it's tricky to get this right without blowing up the generated IL. Currently, it's already quite easy to get into serious trouble because the compiler sometimes has to brute-force all the permutations in match expressions, leading to exponentially growing code size. Allowing a higher upper bound could have you run into this problem ever so sooner (in one of the bug reports on this code-explosion issue, I believe @dsyme referenced the required complexity to be caused by the halting problem, though I'm not sure in which specific cases that holds, or that we can actually come up with a better decision logic that doesn't blow up so fast).

dsyme commented 5 years ago

@abelbraaksma I don't think that's the issue in this case

If I remember rightly the problem was with the technical implementation as you match through the Rest element of the tuple that appears for tuple returns of size > 7.

It's a long time ago but when I looked at lifting this restriction I remember being surprised how hard it was.

7sharp9 commented 5 years ago

To be fair I’ve never needed 7+ Yet...

On Sat, 9 Mar 2019 at 16:14, Don Syme notifications@github.com wrote:

@abelbraaksma https://github.com/abelbraaksma I don't think that's the issue in this case

If I remember rightly the problem was with the technical implementation as you match through the Rest element of the tuple that appears for tuple returns of size > 7.

It's a long time ago but when I looked at lifting this restriction I remember being surprised how hard it was.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/fsharp/fslang-suggestions/issues/213#issuecomment-471196625, or mute the thread https://github.com/notifications/unsubscribe-auth/AAj7yisybQYqBCXKXe5CQhm24USYvJooks5vU93ZgaJpZM4Kbo-U .

serefarikan commented 4 years ago

I'm glad I found this request already made. I write code that processes metadata and there are many cases where I need to classify various metadata items. In other words, I have lots and lots of custom defined subsets/partitions of data. Active Patterns would do a fantastic job if I could create partitions of more than seven items. Ideally, I would like to have the capability to have hundreds of partitions in a pattern, but I realise this may be asking for too much, especially for the compiler to keep things safe. Just wanted to add to potential uses, though mine is probably quite a niche one. I'm currently using DUs that classify subsets of other DUs' individual cases btw.

abelbraaksma commented 4 years ago

I can add to the use cases: my case is a generalized type system, where each natively mapped type fits in a certain category of types that define a set of operations. The generalized set is 9, which I presently solve with an active partial pattern over 6 cases, and a pure pattern over the rest. It's not ideal, and it loses part of the completeness checks, but there's no clean alternative.

Another thing is that I've coded around this limitation by trying to stick to max 7 cases for certain scenarios that require active, complete patterns. I think that's somewhat arbitrary. I'd certainly welcome lifting this limitation.

Wouldn't a quick solution be expanding the Choice type to 10 or 14? Isn't that what's currently used under the hood (up until 7)? That'll introduce a new limitation, but at least it'll be a bit higher. Ideally we'd end up with no limitation at all, though ;).

Or nest Choice, that'll give 7x6 possibilities.

kerams commented 1 year ago

@abelbraaksma, nesting would in theory allow an infinite number of cases - be it through multiple levels of nesting in each choice, or by reserving the last one to simulate tuple Rest chaining.

abelbraaksma commented 1 year ago

Yeah, good point. Though returning a nested Choice is valid, right now, it'd be interesting to find out whether this leads to backward compat issues (even if they do, maybe they're small enough or niche enough to ignore).

kerams commented 1 year ago

The compiler knows how many choices there are, and therefore where the actual payload starts.

abelbraaksma commented 1 year ago

Ah, yes, and < 7 choices, i.e. existing code, can easily be disambiguated on the call site. Fair enough, I think that could work.

kerams commented 1 year ago

At the same time, I'm not sure how the compiler processes active patterns encountered in IL. If it just looks at the return type, then there might be trouble when an older compiler loads an assembly built by a new compiler with 10 pattern choices.

Tarmil commented 1 year ago

I'm pretty sure that the compiler just looks at the name of the function to determine that it's an active pattern, and then checks that the return type is compatible:

let ``|A|B|`` x =
    match x with
    | "" -> Choice1Of2 42
    | _ -> Choice2Of2 ()

match "test" with
| A x -> x
| B -> 0
// This works!

let ``|C|D|E|`` x = 42
// error FS0001: This expression was expected to have type
//     'Choice<'a,'b,'c>'
// but here has type
//     'int'

let ``|A|B|C|D|E|F|G|H|`` x = 42
// error FS0265: Active patterns cannot return more than 7 possibilities

That being said, I don't know what the compiler would do if we manufactured an assembly with a function named |A|B|C|D|E|F|G|H| and then tried to call it.

abelbraaksma commented 1 year ago

Re-reading this thread and @dsyme mentioning tuples in this https://github.com/fsharp/fslang-suggestions/issues/213#issuecomment-471196625, and considering and testing the last comment by @Tarmil, this seems to be correct. As in, there's no (direct) relation to tuples that I can see.

The limitation seems to be given from FSharpChoice not accepting more than 7 arguments. However, there's nothing that necessarily limits this type to contain more generic type params (i.e., Func already has 16 or so). Whether this influences performance, I don't know.

Of course, there's a difference between creating a new fixed upper limit, and making it entirely free. A new fixed upper limit, I assume, is gonna be relatively easy to implement. No upper limit may be a little harder to do, as that needs something like that tuple-style approach (with nested Rest) that Don alluded to.

kerams commented 1 year ago

That being said, I don't know what the compiler would do if we manufactured an assembly with a function named |A|B|C|D|E|F|G|H| and then tried to call it.

I have modified fsc to let me compile such an active pattern (returning 7-Choice for >7 recognizers). I am happy to report that when you load such a binary and try to use the AP in pattern matching with the current compiler, nothing blows up and you're just given the old error FS0265: Active patterns cannot return more than 7 possibilities. Backwards compatibility thus doesn't seem to be an issue.

image

I propose (|A|B|C|D|E|F|G|H|I|J) gets a return type like Choice<Choice<'a, 'h, 'i, 'j>, 'b, 'c, 'd, 'e, 'f, 'g> or perhaps Choice<Choice<'g, 'h, 'i, 'j>, 'a, 'b, 'c, 'd, 'e, 'f>.

Do we extend the mechanism to add one level of choice nesting to increase the maximum to 7x7 recognizers or make it fully and automatically nestable, letting you go crazy until you hit CLR limits?

baronfel commented 1 year ago

I think it should behave like large tuples, where the 'final' slot becomes a nested Tuple as well - that will maintain symmetry between the two 'arbitrarily-large' constructs.

serefarikan commented 1 year ago

@baronfel I think that was the solution I found when I wrote the code to deal with this limitation, just to see if I can getaway with it. Reading these suggestions, it makes me happy to read that my shotgun surgery was not so horrible after all! The compiler doing the same would be fantastic.

kerams commented 1 year ago

The disadvantage of that approach is that you increase "choice depth" with every 7 recognizers, so you end up needing way more case checks during pattern matching for the "rightmost" recognizers. Depth increases logarithmically, not linearly, with my approach.

Given 22 recognizers (|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|), matching on V would look something like:

match x with
| V -> ()

// turns into
match ``(|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|)`` x with
| Choice7Of7 (Choice7Of7 (Choice7Of7 (Choice4Of4 ()))) -> ()

// versus
match ``(|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|)`` x with
| Choice3Of7 (Choice4Of4 ()) -> ()

Besides, I don't think the symmetry argument holds much water. Yes, you can in theory manually construct the choices(*), but there are few reasons to do so. The only other place, I think, where the underlying type is surfaced, is in tooltips, like in my screenshot above. I would also argue, that it needn't be displayed there either, since the overall return type is an implementation detail.

(*)

let (|A|B|) x = if x = 0 then A else Choice2Of2 ()
T-Gro commented 1 year ago

I am with @kerams here.

With active patterns, going with the "tuple approach" would mean progressively unbalanced access paths to matching individual clauses. (the first 7 would be direct, next 7 would be 2-step, etc.).

There would need to be a justification on why the right-most elements are to be penalized, given some meaningful examples. (and developers shouldn't be tasked to benchmark various permutations of the case ordering...)

jwosty commented 1 year ago

Would some kind of scheme be feasible where we nest cases breadth-first, to make it symmetrical?

val (|A|B|C|...|X|) :
    Choice<'a, 'b, 'c, 'd, Choice<'e, 'f, 'g, 'h, 'i, 'j, 'k>, Choice<'l, 'm, 'n, 'o, 'p, 'q>, Choice<'r, 's, 't, 'u, 'v, 'w, 'x>>

And further:

val (|T1|T2|T3|...|T52|) :
    Choice<
        Choice<'T1, ..., 'T7>,
        Choice<'T8, ..., 'T14>,
        Choice<'T15, ..., 'T21>,
        Choice<'T22, ..., 'T28>,
        Choice<'T29, ..., 'T35>,
        Choice<'T36, ..., 'T42>,
        Choice<'T43, 'T44, 'T45, 'T46, 'T47, 'T48,
            Choice<'T49, 'T50, 'T51, 'T52>>>

With a Choice width of 7, you could have up to 777 cases without going over a nesting depth of 3. This does sound somewhat difficult to reason about and implement, however.

A more pragmatic option could be to use tail-chaining, and also expand the number of Choice cases. Given a with of, say, 20, the last branch of an active pattern with 80 choices would have to check 4 nested choices. 80 choices is quite a few.

brianrourkeboll commented 3 months ago

I guess in theory the compiler could just emit a new nominal union type with $n$ cases for each active pattern declaration where $n > 7$.

Basically, something like

let (|A|B|C|D|E|F|G|H|I|J|) x = …

would result in the generation of a union type like

type ``(|A|B|C|D|E|F|G|H|I|J|)`` =
    | A of …
    | B of …
    | …

or, if generic types were involved,

type ``(|A|B|C|D|E|F|G|H|I|J|)``<'a, 'b, …> =
    | A of 'a
    | B of 'b
    | C
    | …

The active pattern function's return type would just be that generated union type.

I.e., the signature of

let (|A|B|C|D|E|F|G|H|I|J|) x = …

would be

val (|A|B|C|D|E|F|G|H|I|J|) : x:'a -> ``(|A|B|C|D|E|F|G|H|I|J|)``

or, if generic types were involved

val (|A|B|C|D|E|F|G|H|I|J|) : x:'a -> ``(|A|B|C|D|E|F|G|H|I|J|)``<'b, 'c, …>

The generated union would be given some kind of metadata such that the compiler would treat matching on its cases as matching on an active pattern (and thus call the associated active pattern function automatically, show it as an active pattern instead of a regular union on hover in tooling, have go-to-definition take you to the active pattern declaration, etc.).

I think this would be backwards-compatible — an older compiler couldn't use such an active patterns as an active pattern, but it would just see a function returning a regular nominal union type and could call it and match on its output in the regular way — and it would address the problems related to nesting.

Is there an obvious reason why something like that wouldn't work or would be undesirable?

T-Gro commented 3 months ago

It could also emit a size-equivalent version of the generic Choice<..> type for n>7, and be updated to acknowledge generated Choice types the same as the built-in ones.

The benefit being at most 1 new type for each n, and regularity in appearance with n<= 7.

brianrourkeboll commented 3 months ago

@T-Gro

It could also emit a size-equivalent version of the generic Choice<..> type for n>7, and be updated to acknowledge generated Choice types the same as the built-in ones.

The benefit being at most 1 new type for each n, and regularity in appearance with n<= 7.

Yeah, I considered that as well, but how would that work across assemblies? Like anonymous records, where what looked like the same type in two different assemblies wouldn't actually be the same type? I guess that would probably be fine for active patterns, though, since the return value is seldom used in a first-class way.

T-Gro commented 3 months ago

Exactly, matching on those via active patterns would be fine. If for some reason anyone wanted to write generic logic processing any active-pattern-function for let's say n=15 uniformly and across assemblies, that would not work (and it is fine that it would not work, it is not a scenario to be supported IMO).