AvailLang / Avail

The Avail programming language. Includes the virtual machine, standard library, and standard examples.
BSD 3-Clause "New" or "Revised" License
54 stars 5 forks source link

Polymorphic macros seem to run all implementations #278

Open markATavail opened 1 year ago

markATavail commented 1 year ago

We're still working on a tiny test case to exhibit the problem (and which will be added as a regression test), but Todd's work with semantic version on 2023-08-23 indicated strongly that multiple macro implementations ran. The method was an override of "«…‡,»::=_;" for exactly 3 local constants, which would have caused an early divergence in the parsing instruction tree, making it unclear whether we even have invented a mechanism that can converge them again to prevent multiple macro implementations from running. The proximal solution involved making the locals list in the override have size [2..∞), like the core implementation, and post-filtering it with a parse rejection if it didn't happen to be 3. That success seems to implicate the divergence of bundle trees due to parsing instruction variation in the plans generated for the implementations.

markATavail commented 1 year ago

I fear the fix will involve a significant change to the bipartite rendezvous mechanism. That mechanism allows both new subexpression parses and new requests that consume a subexpression to show up in arbitrary order, in different threads, while ensuring that each consumer gets a chance to run with each possible subexpression.

  1. A "simple" fix would be to have the bipartite rendezvous refuse to run any consumers until all possible subexpressions have been produced, collected, and deduplicated (because of multiple parsing paths for a send of the same expression, due to parsing instruction variation for implementation-specific plans). Most likely this would greatly reduce the degree of parallelism in the compiler, while at the same time increasing the coordination cost.
  2. A more sophisticated solution might need to join the flows and deduplicate only for polymorphic methods/macros that actually have differing plans for their implementations.
  3. Since we're only concerned with redundant parses that have reached the same position in the source (having parsed an expression the same way twice due to two distinct parsing plans), we could use a ratcheted mechanism that collects subexpressions that have reached a particular parse position, and only allow the consumers to run when there are no more work units parsing prior to that position. A global heap (min tree) would keep track of how many outstanding tasks have been queued for examining each particular parsing position. The heap would be keyed/ordered by position, and contain an atomic counter indicating how many tasks have been queued to parse at that position. Even though most counters can reach zero at any time, even multiple times, the counter that's leftmost can only go to zero exactly once, when all of the parsing tasks at that position have completed. The heap's counters should also appear in a (concurrent) hashed map for efficient access.
  4. Yet another scheme would force all plans for polymorphic methods/macros to be the same, skipping the instruction optimizations that weren't available for all implementations. This could theoretically improve performance, since redundant parsing strategies wouldn't be started (only the most general, which would be the slowest). This might also improve diagnostics, since there wouldn't be multiple redundant parses producing parse expectations (potential syntax error information) at the same position. An easy approach would be to compute the union of the signature types, and have all the method/macro implementations' plans be reconstructed (to be the same, new, plan) if this union changes due to changes to the implementation set.
markATavail commented 1 year ago

For the specific case of semantic version, the error was about a redundancy of aMinor, the second local constant. That was probably a consequence of the first local constant, aMajor, having been parsed two distinct ways, leading to two different constants in the different parsing paths (due to differing parsing instructions).