Open jeswr opened 2 years ago
Indeed, chained buffering steps might explain the slowdowns I've seen. Normally I would expect performance to degrade linearly with the number of chained transforms, but I often see slowdowns that seem closer to exponential. I had been under the impression that methods such as `map()` and `filter()` behaved as you described but, given this issue, I was either plain wrong or possibly remembering a previous version. This would be a great addition to `asynciterator` itself, but we could do it separately if better for backward compatibility.

However, I do not have hard numbers at hand. @jeswr, have you done some profiling on this?
@jeswr Do you think the slowdown is caused by the complexity of the code in `readAndTransformSimple`, or by the fact that it fills an internal buffer? If the latter, just disabling the buffer on intermediary iterators might also improve overall performance.
@rubensworks I'd suspect you're correct in that the intermediary buffers are the main overhead.
I was thinking that one way around this is to have a 'chained transform' which applies multiple map/filter/transform operations in one go, so long as each part of that chain is synchronous - is this equivalent to what you mean by disabling internal buffers?
@jacoscaz - no, I have not done profiling; it was just an observation I made while digging through the code for a PR I made earlier
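To sketch the 'chained transform' idea in isolation (illustrative code, not asynciterator's API, and the helper name is hypothetical): a run of synchronous map steps can be fused into a single function, so each item pays one call chain instead of one iterator hop per step.

```typescript
// Hypothetical sketch: fusing a chain of synchronous map steps into a single
// function, so that N chained maps cost one nested function call per item
// instead of N iterator boundaries with N intermediate buffers.
type MapFn<T> = (item: T) => T;

function fuseMaps<T>(fns: MapFn<T>[]): MapFn<T> {
  // Apply the steps left-to-right, threading the item through the chain
  return (item: T) => fns.reduce((acc, fn) => fn(acc), item);
}

const fused = fuseMaps<number>([
  x => x + 1,
  x => x * 2,
  x => x - 3,
]);

console.log(fused(5)); // (5 + 1) * 2 - 3 = 9
```

Whether this pays off in practice depends on how much of the overhead comes from buffering versus the function calls themselves, which is what the profiling below tries to establish.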
> I was thinking that one way around this is to have a 'chained transform' which applies multiple map/filter/transform operations in one go

Would this not require knowing what those transformations look like ahead of time? Unless, perhaps, one were to have all intermediate actors return a transform function and have the final actor concatenate these.
However, if all intermediate steps were to be implemented as iterators without internal buffering as you suggested, provided an initial buffered iterator for I/O queries (i.e. reading from a file), the resulting performance should not be too far from that of a pure concatenation of functions, while allowing us to keep the current internal structure.
We should come up with a couple of examples that clearly demonstrate whether there's a real issue here or not. One possible way of doing this would be to manually implement a no-buffering map iterator and profile it against the current `map()`. I'd be up for that if nobody has time to do so; there's potential for a great return on investment.
> Would this not require knowing what those transformations look like ahead of time?
I was thinking that methods like `map`, `filter` and `transform` would add to the internal set of operations that need to be done and then return the same iterator. The major caveat is that this would affect upstream reads of the iterator, if there are any.
> I'd be up for that if nobody has time to do so
That sounds great, but I'd wait for #43 to be merged as it may introduce some small performance changes
> I was thinking that methods like map, filter and transform would add to the internal set of operations that need to be done and then return the same iterator.
I suspect that the concatenation mechanism might be a non-trivial one, and it would have to be instantiated even for single-op iterators unless we were to make it a lazy one, which would make it more complicated. Nonetheless, I think there's definitely something to be said for combining operations. For example, could Comunica benefit from a `mapFilter()` iterator?
In any case, we should probably look at these as orthogonal issues:
1) on one side we have buffering, and the possibility that disabling buffering in intermediate iterators might result in gains when dealing with long-ish chains;
2) on a different side we have combining N transformations into M < N iterators, and the possibility that doing so might reduce avg chain length.
Both definitely sound worth investigating IMHO.
> I'd wait for https://github.com/RubenVerborgh/AsyncIterator/pull/43 to be merged as it may introduce some small performance changes
Seems reasonable! I may start setting up a simple benchmark and then pause and wait for that PR to be merged.
Is the `SimpleTransformIterator` being used in performance-sensitive code? My goal for it was to be used for simple cases, and definitely not for chaining. I guess I should have been more explicit about that. For chaining, you want to define your own sync or async transform iterator.
Because of this, the use of `.map`, `.filter`, `.transform`, etc. all create a new `SimpleTransformIterator`, and hence there are many places in Comunica that create new `SimpleTransformIterator`s. For instance:

Mapping: (and that's only going up to page 3/10 when searching for `.map` in https://github.com/comunica/comunica/search?p=3&q=.map)

Filtering: (plus places that use `.transform` over `.filter`)

@rubensworks Given how commonly this is used, I think it is worth making the change upstream here in this package.
Absolutely, thanks for finding this. We probably want `MapIterator`, `FilterIterator`, … and to change `SimpleTransformIterator` such that, if only `map` or `filter` are requested, one of the dedicated iterators is instantiated instead.

@jeswr could you have a look at this gist of mine? https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00
Either I got something wrong or there's virtually no performance difference between `.map()` and a dedicated mapping iterator.
To summarize the results I have in the following comments:

- `ArrayIterator` causes about a 5x speed degradation compared to the `range` iterator on 20000 elements (because, as I raised here, the V8 implementation of `Array.shift` is still very suboptimal).
- Using the `range` iterator on 20000 elements and custom `MapIterator`/`FilterIterator`s, we see approx. an 8x improvement.
- Using my PR, which avoids scheduling a few unnecessary tasks (i.e. just emitting the `end` event straight away where possible), creates a further 10% improvement.
- Using a `CompositeTransform` that reduces over all of the maps and filters causes a further 10-15% performance improvement when compared to simply creating 50 new `Map` and `Filter` iterators.
- Pre-compiling the transformation pipeline on the `CompositeTransform` yields a further 50%-100% performance gain.

Combining all of these factors can result in a speed-up of 100x or greater when doing 50 `map`/`filter`s on 20000 elements, and similar results when we do smaller numbers of `map`/`filter`s (e.g. 5).

This gist includes some experiments and results that are discussed in the following comments.
@jacoscaz - The vast majority of your time is spent on the array iterator because of https://github.com/RubenVerborgh/AsyncIterator/issues/38; I believe Node does not implement optimizations similar to Firefox's. (@RubenVerborgh for reference; my machine took about 5x longer to run the implementation @jacoscaz had using the array iterator.)
I get the following results for the code I have attached below:

```
ArrayIterator#map(): 145
MappingIterator: 19
```

If you do 50 mappings, the results become:

```
ArrayIterator#map(): 1005
MappingIterator: 143
```

Now here is where it gets more interesting: if we use the array iterator for 50 mappings, we get:

```
ArrayIterator#map(): 5321
MappingIterator: 4698
```

So this means that with a combination of optimizing the `ArrayIterator` and using the `MappingIterator`, there is a 35-40x performance increase to be had for 50 chaining operations (and for 5 chaining operations the problems of the `ArrayIterator` become even more pronounced, with a 200x difference in results).
```typescript
import { AsyncIterator, range } from 'asynciterator';

class MappingIterator<I, O> extends AsyncIterator<O> {
  constructor(source: AsyncIterator<I>, map: (item: I) => O) {
    super();
    source.on('end', () => {
      this.close();
    });
    source.on('readable', () => {
      this.emit('readable');
    });
    let item: I | null;
    this.read = (): O | null => {
      return ((item = source.read()) !== null)
        ? map(item)
        : null;
    };
  }
}

const time = (createStream: (max: number) => AsyncIterator<number>): Promise<number> => {
  const max = 200000;
  const str = createStream(max);
  let count = 0;
  return new Promise((resolve, reject) => {
    const now = Date.now();
    str.on('data', () => {
      count += 1;
    })
    .on('end', () => {
      const then = Date.now();
      if (count !== max + 1) {
        reject(new Error('Bad count'));
        return;
      }
      resolve(then - now);
    });
  });
};

const main = async () => {
  const mapMethodTime = await time((max) => {
    let iterator: AsyncIterator<number> = range(0, max);
    for (let i = 0; i < 5; i++) {
      iterator = iterator.map(item => item);
    }
    return iterator;
  });
  const mappingIteratorTime = await time((max) => {
    let iterator: AsyncIterator<number> = range(0, max);
    for (let i = 0; i < 5; i++) {
      iterator = new MappingIterator(iterator, item => item);
    }
    return iterator;
  });
  console.log(`ArrayIterator#map(): ${mapMethodTime}`);
  console.log(`MappingIterator: ${mappingIteratorTime}`);
};

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
Environment:

- npm v8.4.1
- node v17.4.0
- Run on a Dell XPS 15 with 32GB RAM
With the PR (https://github.com/RubenVerborgh/AsyncIterator/pull/45) I have made, I get the following for the `ArrayIterator` method:

```
ArrayIterator#map(): 5532
MappingIterator: 4770
```

and for the `range` method I get:

```
ArrayIterator#map(): 944
MappingIterator: 126
```
As a summary, using the `range` iterator and doing 50 transformations we get:

| | Current Version | New PR |
|---|---|---|
| `ArrayIterator#map()` | 1005 | 944 |
| `MappingIterator` | 143 | 126 |
To do this I used the following `MappingIterator`:
```typescript
import { AsyncIterator, CLOSED } from 'asynciterator';

class MappingIterator<I, O> extends AsyncIterator<O> {
  constructor(source: AsyncIterator<I>, map: (item: I) => O) {
    super();
    source.on('end', () => {
      this.close();
    });
    source.on('readable', () => {
      this.emit('readable');
    });
    this.read = (): O | null => {
      const item = source.read();
      if (item === null) {
        // @ts-ignore -- _state and _end() are internal to AsyncIterator
        if (this._state === CLOSED)
          this._end();
        return null;
      }
      return map(item);
    };
  }
}
```
I've also implemented a `FilterIterator` against my PR (below); for 50 iterations on `range`, the results are:

```
ArrayIterator#filter(): 72
FilterIterator: 13
```
```typescript
import { AsyncIterator, CLOSED, ENDED } from 'asynciterator';

class FilterIterator<I> extends AsyncIterator<I> {
  constructor(source: AsyncIterator<I>, filter: (item: I) => boolean) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
    this.read = (): I | null => {
      let item: I | null;
      while ((item = source.read()) !== null) {
        if (filter(item))
          return item;
      }
      // @ts-ignore -- _state is internal to AsyncIterator
      if (source._state === ENDED || source._state === CLOSED)
        this._end();
      return null;
    };
  }
}
```
Sorry to keep spamming this thread - I've also implemented a `CompositeIterator`, and it also shows non-trivial performance gains relative to just repeatedly applying the `MapIterator` and `FilterIterator` (~10-15% improvement):

```
ArrayIterator#map(): 894
MappingIterator: 105
CompositeIterator: 90
```
```typescript
import { CLOSED, ENDED } from 'asynciterator';
import { AsyncIterator, ArrayIterator, range } from './asynciterator';

type Transform = {
  type: 'filter';
  function: (elem: any) => boolean;
} | {
  type: 'map';
  function: (elem: any) => any;
};

class CompositeMapFilter<T> extends AsyncIterator<T> {
  private transforms: Transform[] = [];

  constructor(private source: AsyncIterator<T>) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
  }

  read(): T | null {
    let item = this.source.read();
    // Run each item through the whole pipeline; whenever a filter rejects
    // an item, pull the next one from the source and restart the pipeline.
    outer: while (item !== null) {
      for (const transform of this.transforms) {
        if (transform.type === 'map') {
          item = transform.function(item);
        }
        else if (!transform.function(item)) {
          item = this.source.read();
          continue outer;
        }
      }
      break;
    }
    // @ts-ignore -- _state and _end() are internal to AsyncIterator
    if (item === null && this.source._state === ENDED) {
      this._end();
    }
    return item;
  }

  // @ts-ignore
  filter(filter: (item: T) => boolean, self?: any): CompositeMapFilter<T> {
    this.transforms.push({ type: 'filter', function: filter });
    return this;
  }

  // @ts-ignore
  map(map: (item: T) => T): CompositeMapFilter<T> {
    this.transforms.push({ type: 'map', function: map });
    return this;
  }
}

class MappingIterator<I, O> extends AsyncIterator<O> {
  constructor(source: AsyncIterator<I>, map: (item: I) => O) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
    this.read = (): O | null => {
      const item = source.read();
      if (item === null) {
        // @ts-ignore -- _state and _end() are internal to AsyncIterator
        if (source._state === ENDED || source._state === CLOSED)
          this._end();
        return null;
      }
      return map(item);
    };
  }
}

class FilterIterator<I> extends AsyncIterator<I> {
  constructor(source: AsyncIterator<I>, filter: (item: I) => boolean) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
    this.read = (): I | null => {
      let item: I | null;
      while ((item = source.read()) !== null) {
        if (filter(item))
          return item;
      }
      // @ts-ignore -- _state is internal to AsyncIterator
      if (source._state === ENDED || source._state === CLOSED)
        this._end();
      return null;
    };
  }
}

const generateArr = (): number[] => {
  let i = 0;
  return new Array(200000)
    .fill(true)
    .map(() => i++);
};

const time = (createStream: (arr: number[]) => AsyncIterator<number> | CompositeMapFilter<number>): Promise<number> => {
  const arr = generateArr();
  const str = createStream(arr);
  let count = 0;
  return new Promise((resolve, reject) => {
    const now = Date.now();
    str.on('data', () => {
      count += 1;
    })
    .on('end', () => {
      const then = Date.now();
      if (count !== arr.length / 2) {
        reject(new Error('Bad count'));
        return;
      }
      resolve(then - now);
    });
  });
};

const main = async () => {
  const mapMethodTime = await time((arr) => {
    let iterator: AsyncIterator<number> = range(0, arr.length - 1);
    for (let i = 0; i < 50; i++) {
      iterator = iterator.filter(item => item % 2 === 0);
      iterator = iterator.map(item => item);
    }
    return iterator;
  });
  const mappingIteratorTime = await time((arr) => {
    let iterator: AsyncIterator<number> = range(0, arr.length - 1);
    for (let i = 0; i < 50; i++) {
      iterator = new FilterIterator(iterator, item => item % 2 === 0);
      iterator = new MappingIterator(iterator, item => item);
    }
    return iterator;
  });
  const compIteratorTime = await time((arr) => {
    let iterator: CompositeMapFilter<number> = new CompositeMapFilter(range(0, arr.length - 1));
    for (let i = 0; i < 50; i++) {
      iterator = iterator.filter(item => item % 2 === 0);
      iterator = iterator.map(item => item);
    }
    return iterator;
  });
  console.log(`ArrayIterator#map(): ${mapMethodTime}`);
  console.log(`MappingIterator: ${mappingIteratorTime}`);
  console.log(`CompositeIterator: ${compIteratorTime}`);
};

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
You can see a further 50-100% performance improvement by 'pre-compiling' the `CompositeTransform`; see https://gist.github.com/jeswr/f49a6975f1bfc088a9c32565e69761bf
> Sorry to keep spamming this thread

@jeswr if all spam were like this we'd all be much happier about spam. Seriously, awesome work.
> To summarize the results I have in the following comments
>
> - `ArrayIterator` causes about a 5x speed degradation compared to the `range` iterator on 20000 elements (because as I raised here the V8 implementation of `Array.shift` is still very suboptimal).

Although I was not expecting `shift()` to be optimized to the extent implied by @RubenVerborgh, I was also not expecting it to be slow to the point of eclipsing other differences in my tests. Unexpected and very good to know.
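As an illustration of why `shift()` dominates (a hedged sketch with hypothetical names, not the gist's actual code): `Array#shift` moves every remaining element on each call, so a read loop built on it degrades quadratically, whereas reading through an advancing cursor touches each element once.

```typescript
// Hypothetical sketch: reading from an array via an advancing index instead
// of Array#shift(), which must move the remaining elements on every call.
class FastArraySource<T> {
  private index = 0;

  constructor(private items: T[]) {}

  read(): T | null {
    if (this.index >= this.items.length) {
      this.items = []; // release the backing buffer once exhausted
      return null;
    }
    return this.items[this.index++];
  }
}

const src = new FastArraySource([10, 20, 30]);
console.log(src.read(), src.read(), src.read(), src.read()); // 10 20 30 null
```

The trade-off is that the backing array stays referenced until it is fully consumed, hence the explicit release at the end.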
The latest revision of my gist at https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00/e6cbf3caab930e88edb62f34562619603572c42b has a `RangeIterator` that appears to run 500x faster than `ArrayIterator`, which is a difference so enormous that I keep looking for mistakes on my end.
> - Using the `range` iterator on 20000 elements and using custom `MapIterator`/`FilterIterator`s we see approx. an 8x improvement.

Can confirm. When going from calling `.map()` on the `RangeIterator` to passing the `RangeIterator` as the `source` of `MappingIterator`, I, too, see an improvement of 7x - 9x.
> - Using my PR which avoids scheduling a few unnecessary tasks (i.e. just emitting the `end` event straight away where possible) creates a further 10% improvement.

Haven't tested this yet but I have no reason to object.
> - Using a `CompositeTransform` that reduces over all of the maps and filters causes a further 10-15% performance improvement when compared to simply creating 50 new `Map` and `Filter` iterators
> - Pre-compile the transformation pipeline on the `CompositeTransform` and observe a 50%-100% performance gain

I can see the benefits of doing this but I'm wary of the fact that it breaks iterator immutability, which can be a source of bugs. I would still try to reap as much as possible by offering a `MapFilteringIterator<I, O>` whose transform function is allowed to output `null`, indicating that the `I` item should be discarded.
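A rough sketch of what such a `MapFilteringIterator` could look like (my interpretation of the idea, not an implementation from the library; a minimal pull-based `Source` interface stands in for asynciterator's `AsyncIterator` here):

```typescript
// Sketch of the proposed MapFilteringIterator: one transform callback that
// either maps an item or returns null to discard it, merging map and filter
// into a single read step without mutating any iterator.
interface Source<T> {
  read(): T | null;
}

class MapFilteringIterator<I, O> implements Source<O> {
  constructor(
    private source: Source<I>,
    private transform: (item: I) => O | null, // null => discard the item
  ) {}

  read(): O | null {
    let item: I | null;
    while ((item = this.source.read()) !== null) {
      const mapped = this.transform(item);
      if (mapped !== null)
        return mapped;
      // a null result means "filtered out": pull the next item
    }
    return null; // source exhausted
  }
}

// Usage: keep even numbers, doubled
const nums = [1, 2, 3, 4];
let i = 0;
const src: Source<number> = { read: () => (i < nums.length ? nums[i++] : null) };
const it = new MapFilteringIterator<number, number>(src, n => (n % 2 === 0 ? n * 2 : null));

const out: number[] = [];
let v: number | null;
while ((v = it.read()) !== null) out.push(v);
console.log(out); // [4, 8]
```

One caveat of the null-means-discard convention is that the transform can no longer emit `null` as a legitimate value, which matches asynciterator's use of `null` as the end-of-stream sentinel.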
> Combining all of these factors can result in a speed-up of 100x or greater when doing 50 `map`/`filter`s on 20000 elements; and similar results when we do smaller numbers of `map`/`filter`s (e.g. 5).

Indeed, performance-wise this should turn out to be an extremely productive and effective investment of time.
> `RangeIterator` that appears to run 500x faster than `ArrayIterator`

Impressive - and I can't see any problems with it; though I wonder why it also seems to be a much greater improvement than using `range` (i.e. `IntegerIterator`).
> I'm wary of the fact that it breaks iterator immutability, which can be a source of bugs

Makes sense - I think there should be ways to give the feel of immutability to the API (or at the very least throw meaningful errors for devs when they misuse a mutable AsyncIterator) while using mutations like this under the hood for performance.
This is probably more appropriate in a separate package, but one option I was thinking of is as follows (sketch only; since JS cannot reassign `this`, the consumed iterator's own methods are overwritten to throw rather than swapping in a `PlaceholderIterator` instance):

```typescript
// Any use of a consumed iterator should fail loudly with a meaningful error.
const consumed = () => {
  throw new Error('You are trying to use methods on an iterator which has been consumed by another transform operation');
};

class ComposeMapFilterIterator<T> extends AsyncIterator<T> {
  private transformers: ((item: T) => T)[] = [];

  map(fn: (item: T) => T): ComposeMapFilterIterator<T> {
    // Build the successor, carrying over the accumulated transformers
    const next = new ComposeMapFilterIterator<T>();
    next.transformers = [...this.transformers, fn];
    // Invalidate this iterator: any further use throws immediately
    this.map = consumed;
    this.read = consumed;
    // ...
    return next;
  }
}
```
So this means that any potential usage that would cause bugs, such as using `.map` or `.read` on upstream iterators, immediately gets identified.

Furthermore, the `build` method that I had in the original implementation of this iterator could just be called on the first `read`.

In fact, I'm inclined to argue that such an implementation may lead to fewer bugs in some cases where devs may be trying to read data from an iterator that they shouldn't be.
@jeswr as I was investigating this, I came up with the same idea as https://github.com/RubenVerborgh/AsyncIterator/issues/38#issuecomment-1077584185 and had a go at it in https://github.com/jacoscaz/AsyncIterator/blob/42311b68988a971fc072c896100d7082b4162470/asynciterator.ts#L744-L783 . I'll make a PR later, but I just realized we've put quite a lot of things on the table and we should probably make a plan of action, including what should go into `asynciterator` and what should go into Comunica. That I can think of:

- `ArrayIterator` can be made incredibly faster by something like https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00/e6cbf3caab930e88edb62f34562619603572c42b#file-asynciteratortest-ts-L22-L38 . I guess this should be done in `asynciterator`.
- `MappingIterator` and `FilteringIterator` would significantly optimize performance whenever `.map()` and `.filter()` are used. However, nothing prevents us from adding those two custom iterators into Comunica as opposed to modifying `asynciterator`. Where should these go? I would say `asynciterator`.
- Switching `BufferedIterator` to a linked list (as opposed to the current array-based buffer) as per the links above. Where should this go? I would say `asynciterator`.
- Faster `wrap`: https://github.com/comunica/comunica/issues/965 . Where should this go? Perhaps in Comunica, as we might want to tailor it to RDF/JS `Stream` instances.
- Emitting `end` events earlier: https://github.com/RubenVerborgh/AsyncIterator/pull/45 . PR already open; this definitely belongs in `asynciterator`.

Tagging @rubensworks and @RubenVerborgh to keep everyone in the loop.
> ArrayIterator can be made incredibly faster

Indeed - but it's also probably worth tracking https://bugs.chromium.org/p/v8/issues/detail?id=12730 in case the V8 team solves this upstream for us. You would also want to delete the source buffer on the end event in your implementation.

> I'm seeing ~10% increase in performance by switching `BufferedIterator`

Nice!

> 50x - 100x increase in performance by compositing transformations

Standalone it is only approx. a ~2x improvement - IIRC the 100x figure came from compounding most of the factors above.

> but can see the point in having it upstream too. Where should this go?

I'd say this should be a new library of its own, e.g. 'asynciterator-compose.js'
> I'd say this should be a new library of its own e.g. 'asynciterator-compose.js'

Makes sense!
In theory, this should be mitigated by #46, although my testing has shown a significantly smaller performance gain than I had expected. @jeswr do you have time to do some testing of that branch on your use cases?

Still, doing away with buffering entirely would be a good idea, given the gains seen in #48 . I think we could port the source-wrapping part of `TransformIterator` into a dedicated `WrappingIterator` to keep this within `asynciterator`. How does that sound, @RubenVerborgh @jeswr ?
> @jeswr do you have time to do some testing of that branch on your use cases?

I've not done any benchmarking for `wrap` - I imagine a degenerate case would be wrapping something like the `range` iterator with a large number of iterations. Such a case, however, is a prime example of where doing away with buffering entirely would be better.
> I think we could port the source wrapping part of `TransformIterator` into a dedicated `WrappingIterator` to keep this within `asynciterator`

Sounds good to me
> I'd say this should be a new library of its own e.g. 'asynciterator-compose.js'

Happy to have it here, really. Could be a separate file though (and maybe we could also split the code into more separate files in general).
As discussed with @jacoscaz, the proposed timeline for resolving this issue is:

(1) Get full coverage on the current state of #57, perhaps minus the `SyncUnionIterator` for now, and then get the contents of that PR merged into main (likely broken across a few PRs).

(2) Merge #45, because this resolves a lot of issues I had for the `SyncUnionIterator`, as it allows the iterator to enter a done state in a timely manner (i.e. during the first read that produces a null value, rather than asynchronously afterwards). The reason I would do this after getting in the majority of performance improvements is that it does include breaking changes.

(3) Address `SyncUnionIterator`

(4) Address cloning

Pinging @RubenVerborgh
In addition to the above plan shared by @jeswr, if nobody else needs these changes to live within a single branch for testing purposes, I would suggest breaking the work done across the following PRs, to be refined and merged in order:

1. Faster synchronous transforms, which would include `SynchronousTransformIterator` and its subclasses `MappingIterator`, `FilteringIterator`, `SkippingIterator`, `LimitingIterator`, `MultiMapFilterTransformIterator`, `MultiMapFilterIterator` and `MaybeMappingIterator` from #57. IMHO, these should be evaluated together as they share a common ancestor, itself a new addition to the codebase, and they point to one another via the various `map()`, `filter()` and other commodity methods. I would caution against breaking this further into separate PRs.
2. Faster wrapping, which would include work done on `wrap()` and the new `WrappingIterator` from #57.
3. End on read, already working in #45
4. Faster (synchronous) union, already working in #54
5. Faster cloning, already started in https://github.com/jacoscaz/AsyncIterator/pull/1 (though we might change approach)

When taken all together, these changes will make for dramatic performance gains. I have one use case, which I unfortunately cannot share ATM, where things become 1200x faster simply by switching to 1 + 2 + 4.
Thanks for all the work and please proceed, with the following caveats:

- I'm not sold yet on `SynchronousTransformIterator`; I'd need to see a clean version of the code, but things like `maybeMap` sound like over-optimizations

Thanks again, @jeswr and @jacoscaz!
> I'm not sold yet on SynchronousTransformIterator; I'd need to see a clean version of the code, but things like maybeMap sound like over-optimizations

@RubenVerborgh is it the overall approach that does not convince you, or is it the fact that we might have pushed it too far with things like `maybeMap`? Asking just so I can better direct my efforts. The `SynchronousTransformIterator` class per se is just a way to centralize a bunch of code shared by `SkippingIterator`, `MappingIterator` and so on. Totally understand if you need a cleaner version of the code to answer this question.
> is it the overall approach that does not convince you or is it the fact that we might have pushed it too far with things like `maybeMap`?

At the moment, I can't see the forest for the trees; the code needs a bit of love still. And some bits might take it too far indeed.
A few comments to keep everyone in the loop and for posterity...

While working on 5. Faster cloning, I've realized that my assumptions about the relative performance of `Array#push()` and `LinkedList#push()` were wrong. Even when dealing with millions of elements, `Array#push()` consistently outperforms `LinkedList#push()`. Furthermore, `ClonedIterator#read()` already bypasses `BufferedIterator`'s internal buffering, making cloning already far more optimized than I had thought.
@jeswr has also been working on a non-breaking alternative to #54 in #65 for faster unions, which also optimizes `prepend()` and `append()` (#67).
All this said, I think in order to close this issue and reap the benefits in Comunica we're looking at the following:
Also tagging @rubensworks .
> Even when dealing with millions of elements, `Array#push()` consistently outperforms `LinkedList#push()`

That is expected; but what we need to know is the behavior for repeated pushes and pulls, cfr. https://github.com/RubenVerborgh/AsyncIterator/issues/60#issuecomment-1088528533
> That is expected; but what we need to know is the behavior for repeated pushes and pulls, cfr. https://github.com/RubenVerborgh/AsyncIterator/issues/60#issuecomment-1088528533

As far as my understanding goes, and limited to cloning, nothing is ever pulled from the backing buffer, as the entire history starting from when the first clone is created (EDIT) is maintained for future cloned iterators to go through; we only care about pushes done at https://github.com/RubenVerborgh/AsyncIterator/blob/fec14f4d19e3d594bc385b927863ebf80a792580/asynciterator.ts#L1845
Nonetheless, I've done some work on a `LinkedList`-backed `HistoryReader` here but saw no performance gains whatsoever, which eventually led me to question my assumptions on `#push()`.
> `LinkedList`-backed `HistoryReader` here but saw no performance gains whatsoever

But that's because no data is ever pulled from it.
> But that's because no data is ever pulled from it.

Yes, precisely! What I meant in https://github.com/RubenVerborgh/AsyncIterator/issues/44#issuecomment-1103902591 is that, given that `HistoryReader` only pushes to its internal buffer while never pulling from it (by design), and given the fact that `Array#push()` outperforms `LinkedList#push()` even with millions of items, there's no more reason for me to work on a `LinkedList`-backed refactoring of `HistoryReader`, hence my dropping it from the updated list of PRs. Apologies for the lack of clarity.
Update for posterity and to keep track of the general progress on this. Tagging @rubensworks @jeswr @RubenVerborgh .

Open PRs that we're working on, to be reviewed / merged / rebased in order:

- `SynchronousTransformIterator` class

We're halfway through! Faster unions are next. @RubenVerborgh @jeswr @rubensworks do you think we can get these merged by the end of August?
@jacoscaz 4 might be a significant amount of work to rebase and check for correctness, so I wouldn't bet on that by the end of August. The other two should be much less work but will depend on everyone's schedules.
My plan is to wrap up AsyncIterator v3.x soonish; the moment we start tackling #45, we're in the v4.x territory. But I guess a v4.x alpha could be possible.
@RubenVerborgh @jeswr perhaps it would make sense to do faster unions and the synchronous transform iterator first, wrapping up 3.x, and leave end on read last. Would that work for you guys?
I agree with the strategy - will try and wrap up work on the faster unions today
Update for posterity and to keep track of the general progress. Tagging @rubensworks @jeswr @RubenVerborgh .

Open PRs that we're working on, to be reviewed / merged / rebased in order:

- `SynchronousTransformIterator` class

EDITED 2022-08-26: faster unions are now breaking changes, moved to 4.x
I kinda lost track of where things are. Is the v4 PR still being worked on?
> I kinda lost track of where things are. Is the v4 PR still being worked on?

It somewhat fell off my radar for a bit; I do have most of the code for a new version locally, but there is still work to be done on the tests.
> For basic methods like `.map` and `.filter` on `AsyncIterator`, using the `SimpleTransformIterator` introduces unnecessary overhead via the `readAndTransformSimple` operator. To improve performance in applications like Comunica it may be worth creating custom iterators with simpler read methods; for instance a synchronous `MapIterator` could skip buffering entirely and just be implemented as something like […]

@jacoscaz I suspect this is the cause of the potential slowdown you were encountering when doing a large amount of chaining.
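A buffer-less synchronous map of the kind described above can be sketched as follows (illustrative, with a minimal `Source` interface standing in for `AsyncIterator`): `read()` simply pulls from the source and maps, with no intermediate buffer.

```typescript
// Illustrative buffer-less synchronous map: each read pulls one item from
// the source and transforms it in place, so chaining adds no buffers.
interface Source<T> {
  read(): T | null;
}

class SyncMapIterator<I, O> implements Source<O> {
  constructor(private source: Source<I>, private map: (item: I) => O) {}

  read(): O | null {
    const item = this.source.read();
    return item === null ? null : this.map(item);
  }
}

const data = [1, 2, 3];
let i = 0;
const src: Source<number> = { read: () => (i < data.length ? data[i++] : null) };
const doubled = new SyncMapIterator(src, n => n * 2);
console.log(doubled.read(), doubled.read(), doubled.read(), doubled.read()); // 2 4 6 null
```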