adobe / ferrum

Features from the rust language in javascript: Provides Traits/Type classes & a hashing infrastructure and an advanced library for working with sequences/iterators in js
https://www.ferrumjs.org
Apache License 2.0
519 stars 25 forks source link

Extend Ferrum to work with async sequences #192

Open tobia opened 3 years ago

tobia commented 3 years ago

Problem

Many libraries (such as database drivers) nowadays return AsyncIterator instances. I have found it to be a great abstraction for an asyncronous sequence of items, mainly because it's well integrated with the language, thanks to the for await, yield, and yield* statements.

But writing the same imperative loops over and over again has the same drawbacks we know from working with regular iterators and collections. Therefore I have been factoring my code into my own hodge-podge of functional utilities (map, filter, take...) implemented using async generators and for async loops.

It would be nice if ferrum was somehow extended to work with them, using the same universal language it has already defined for working with synchronous collections and iterators.

Proposed solution

It is well known that Symbol.iterator (ie. Ferrum's Sequence trait) cannot be implemented for AsyncIterator / AsyncGenerator objects. Therefore one solution would be to implement a new AsyncSequence trait, reusing the standard Symbol.asyncIterator symbol, and extending the entire Ferrum library to work with it.

For instance, map() would check whether its first argument implements the AsyncSequence trait and in that case return an AsyncSequence itself.

koraa commented 3 years ago

Yes, you are right that is a desired feature. I suppose you've already seen the existing issues #72 and #169

Basically we have more than two kind of values we have the properties

  1. size=1, size=0|1, size=*

  2. Sync vs async

  3. lazy (evaluation must be explicitly requested) vs immediate (evaluation is happens without requesting)

  4. A value is size=0|1, sync, immediate (regarding undefined as empty).

  5. A function is size=0|1, sync, lazy.

  6. A Promise is size=0|1, async, immediate

  7. A list is size=*, sync, immediate

  8. A sequence is size=*, sync, lazy

  9. An async sequence is size=*, async, lazy

  10. A reactive stream is size=*, async, immediate

You may notice that some combinations are missing; no sort of value is size=1, that is because in JS every value can be null/undefined so you have to embed size=1 values in size=0|1 and just define undefine/null to be invalid.

size=0|1, async, immediate also missing but you can compose that as a function returning a promise (an async function).

Now the trouble really is it would be nice to have some framework that would support multiple sorts of these values in an extensible way especially since there are alternate implementations of lazy values, promises, reactive sequences and it would be nice to provide support for those.

Then the next rabbit hole is that some functions such as map() just return the given sort again. Given an iterator it will just return an iterator. Given a size=*, async, lazy it will probably just return another one making the meta type signature something like map :: N, S, L, (N, S, L) -> (N, S, L) (where N, S and L are variables). This is not the case for all functions though; e.g. range0() takes a number and returns an iterable making it's signature something like range :: 1si -> *sl. (read: one sync immediate to many sync lazy.) We could consider extending this to range :: X, 1Xi -> *Xl (read: one X immediate to many X lazy) – this way the function could take a promise and return an async sequence…but that probably would not make much sense.

Another example, extend would make much more sense to represent with this framework: It takes a function and repeatedly applies this to some input to build up a sequence, so: extend :: 1si -> (1si -> 1si) -> *sl; we could extend this again to extend :: X, 1Xi -> (1Xi -> 1Xi) -> 1Xl. Note that the really tough nut here is: how do we choose the instantiation? Probably by the value, because we don't even know the type of a function because js is duck typed… This gives rise to the convention that in a function (X, ...) -> the type resolution of the function should probably depend on X and the other parameters should be lifted into whatever type is indicated. This is a great feature because it would allow map(x => x*2) to be used on pretty much any input too so it is what we need, but this sort of polymorphic currying really gives me a headache.

I do have an idea how to do this; we can probably model this using coroutines or generator functions. Every Type implementing Sort needs to provide four things:

  1. A sort-signature ("This type Array implements Sort *si")
  2. Conversion rules ("If you need to turn an Array into Sort *sl, use this specific instance which in practice will be just some dummy type implementing the iterator protocol.") Along with defaults.
  3. A coroutine scheduler for sort generation (which will be trivial in most cases)
  4. A coroutine scheduler for sort iteration (which will be trivial in most cases)

Then based on this we could implement a wrapper that lets us model functions operating on a pair of generators as a third generator for ease of use:

const Arity = Enum(['One', 'Many']);
const [One, Many] = Arity;

const Exec = Enum(['Sync', 'Async']);
const [Sync, Async] = Exec;

const Avail = Enum(['Immediate', 'Lazy']);
const [Immediate, Lazy] = Avail;

const SortDescriptor = Tuple('SortDescriptor', ['arity', 'exec', 'avail']);

const ValueSort = SortDescriptor(One, Sync, Immediate);
const ListSort = SortDescriptor(Many, Sync, Immediate);
const SeqSort = SortDescriptor(Many, Sync, Lazy);

const PromiseSort = SortDescriptor(One, Async, Immediate);
const EventSort = SortDescriptor(Many, Async, Immediate);
const RxSort = EventSort;
const AsyncIteratorSort = SortDescriptor(Many, Async, Lazy);

const LazySort = … // function can be used to represent a lazy value
const AsyncLazySort = … // async funcion can be used…

const Sort = Trait('Sort');
Sort.impl(Array, () => ({
  descriptor: ListSort,
  conversionRules: dict({}), // defaults
  schedConstruction: // I would have to think a while before I could build this and decide on the proper interface
  schedIteration:    // nor this
}));

... // Other types

const defaultConversionRules = new Map({
  ValueSort: Box, // We probably need a dummy type for this that just wraps a value; might be a unary tuple or something
  ListSort: Array,
  SeqSort: Sequence, // Gonna introduce a sequence constructor,
  PromiseSort: Promise,
  EventSort: SomeNewReactiveStreamType, // Gotta build that too
  AsyncIteratorSort: AsyncSequence, // Gonna
  LazySort: Lazy,
  AsyncLazySort: AsyncLazy,
});

const sortConversionTarget = (value, targ) => false
  || ifdef(Sort.lookupValue(value), (impl) => impl().get(targ))
  || defaultConversionRules.get(targ);

const Consume = Symbol();
const ConsumeOutsideValue = Tuple('ConsumeOutsideValue', ['Value']);
const TryConsume = Symbol();
const DeclareResult = Tuple('DeclareResult', ['Sort']);
const Produce = Tuple('Produce', ['value']);
const TryProduce = Symbol('TryProduce', ['value']);

const sortMetaFunction = (name, impl) => {
  // No idea how to implement this yet but should support auto currying and all cool stuff
};

const extend = sortMetaFunction('extend', function *(inputSort, fn) {
  yield DeclareResult(inputSort // Does not care if input is sync or async; output will be the same
    .oneIntoMany() // Will throw if input was many or zero
    .ensureLazy()); // Will be entirely ok if input was lazy
  let val = yield Consume;
  while (true) {
    yield Produce(val);
    // Uses the resolution function provided by the consumer; If the producer was async this should for
    // instance resolve the promise
    val = yield ConsumeOutsideValue(fn(val))
  }
});

Which is to say, yes this is a desired feature but getting right is exceedingly difficult particularly in light of the sub par choices made by js. I think it boils down to bringing usable category theory to js which is scary …I wouldn't want to provide a partial solution using a simple if statement in a stable version…

I was planning to introduce a Ferrum Next branch for the next major version containing some implementation of this entire concept; I think a partial solution using if statements could very well be added there. I would be interested in how well that would work…it may well be possible that I am overthinking it!

koraa commented 3 years ago

Also take note of #138