typescript porting question

ziriax commented 8 years ago

Mainly for fun and to understand the code-base in depth, I started porting the c# version of sodium to Typescript.

Now, the Node class uses a WeakReference to the action.

Unfortunately Javascript does not have weak refs yet... The ES6 WeakMap is also not suitable for implementing weak even listeners.

https://esdiscuss.org/topic/weak-event-listener

Would it be ok to require an explicit unlisten call to cleanup memory, as in RX?

jam40jeff commented 8 years ago

I believe this would open the door to memory leaks. I am actually not a fan of exposing ListenWeak in the public API (interestingly enough, it's only exposed for Stream<T> and not for Cell<T>) because I think users should always have to call Unlisten (or Dispose in C# will have the same effect) to deregister the listener.

However, the problem is with listeners created by the primitives themselves. The primitives return a new Stream<T> or Cell<T> which may be disposed, which will in turn deregister the implicit listener they created. However, these primitives are often chained, which means user code never gets a reference to them and they go out of scope very quickly. The only thing holding a reference to them is the listener, stream, or cell at the end of the chain. When the object at the end of the chain is freed up by calling Unlisten, the garbage collection is free to flow up the chain until we reach another stream or cell that is being held on to by user code because of the WeakReferences.

Here is an example of where I believe there would be a problem if WeakReference was not used:

public class Test
{
    private readonly Stream<int> stream = Stream.Never<int>();
    private IListener listener;

    public void StartListenTest()
    {
        this.listener = stream.Map(_ => Unit.Value).Listen(_ => Console.WriteLine("It fired!"));
    }

    public void StopListenTest()
    {
        this.listener.Unlisten();
    }

    public static void RunTest()
    {
        Test test = new Test();
        test.StartListenTest();
        test.StopListenTest();
        // At this point, test.listener should be cleaned up, but the listener created by the call to Map
        // will not be unless it was created with a WeakReference because test.stream will still hold
        // a reference to it.  There is no way to deregister that listener until test.stream (and thus,
        // test itself) goes out of scope.
        Thread.Sleep(5000);
        // now everything will get cleaned up (eventually when the GC gets to it) when this method
        // exits and the test variable goes out of scope
    }
}

The only way I can see where this would work is if users were forced to dispose of all intermediate streams and cells returned by primitives as follows:

public class Test
{
    private readonly Stream<int> stream = Stream.Never<int>();
    private Stream<Unit> mappedStream;
    private IListener listener;

    public void StartListenTest()
    {
        this.mappedStream = stream.Map(_ => Unit.Value);
        this.listener = this.mappedStream.Listen(_ => Console.WriteLine("It fired!"));
    }

    public void StopListenTest()
    {
        this.listener.Unlisten();
        this.mappedStream.Dispose();
    }

    public static void RunTest()
    {
        Test test = new Test();
        test.StartListenTest();
        test.StopListenTest();
        // At this point, everything will be cleaned up, but we have disallowed chaining of primitives
        // and forced users to track when the intermediate streams and cells are not in use
        // anymore and are available for disposal.
        Thread.Sleep(5000);
        // everything has already been cleaned up
    }
}

However, I believe this would place undue burden on the users and would lead to unexpected memory leaks when they do not adhere to this unreasonable requirement.

The way the code currently works (with the usage of WeakReference), the intermediate Stream<Unit> in the first example is available for GC as soon as this.listener.Unlisten(); is called.

ziriax commented 8 years ago

I don't know yet how Sodium works yet internally, but in Reactive Extensions, if you have code like:

stream.Map(_ => Unit.Value).Listen(_ => Console.WriteLine("It fired!")).UnListen()

then the tail Unlisten call will also unlisten from the original stream.

So your code in Reactive Extensions would not leak memory, even though no weak references are used here.

I will have to look at the code of Map and the other operators to understand why this is not the case in Sodium.

I guess this is because in Sodium (as in Flapjax) all signals are 'hot', so stream.Map immediately starts listening to stream. While in RX, stream.Map merely builds a new IObservable that will start listening to its input whenever someone listens to its output, and it will stop listening as soon as the output is not observed anymore. This is actually a simplified version: an operator like Map is an observable that when subscribed too will subscribe to its input, and return that disposable subscription. So RX does not have a Map 'node' that can have multiple listeners, it always delegates listening to its input. The 'node' is created on the fly when you call subscribe.

jam40jeff commented 8 years ago

You are correct that the "node" is not created until the observable is observed in Rx. In Sodium, the intermediate primitives begin listening right away. I believe this may be necessary for the topological sort which allows for the glitch-free merge implementation, but I am not sure. @the-real-blackh could probably give you a more definitive answer.

jam40jeff commented 8 years ago

FWIW, I just ran a test using Rx and it does in fact seem as if the intermediate operators are independent, which would not allow for a glitch-free implementation.

Given the following code:

[Test]
public void TestTwoListeners()
{
    Subject<int> s = new Subject<int>();
    IObservable<int> mappedS = s.Select(v => 2 * v);
    List<int> @out = new List<int>();
    IDisposable s1 = mappedS.Subscribe(@out.Add);
    IDisposable s2 = mappedS.Subscribe(@out.Add);
    s.OnNext(2);
    s.OnNext(7);
    s1.Dispose();
    s2.Dispose();
}

The code in the Select operator will run twice for each call to OnNext on the subject s because there are two subscriptions.

In Sodium, the two listeners would share the node created by Map, but this requires a "downward" reference (a WeakReference in the Java and C# implementations). Without WeakReference, your best bet may be to do manual "reference" (actually "listener") counting, but there would still be a couple "gotcha" situations with that as well.

ziriax commented 8 years ago

In RX, you must do the sharing explicitly, like this:

[Test]
public void TestTwoListeners()
{
    Subject<int> s = new Subject<int>();
    IObservable<int> mappedS = s.Select(v => 2 * v).Publish().RefCount();
    List<int> @out = new List<int>();
    IDisposable s1 = mappedS.Subscribe(@out.Add);
    IDisposable s2 = mappedS.Subscribe(@out.Add);
    s.OnNext(2);
    s.OnNext(7);
    s1.Dispose();
    s2.Dispose();
}

x.Publish() only makes a single subscription to x, keeping its own list of subscriptions.

x.RefCount() does referencing counting on published subscriptions, releasing the input subscription when the last output subscription is disposed, what is exactly what you said :)

jam40jeff commented 8 years ago

Implementing similar reference counting logic would seem to work for discrete streams and cells which are "observed" using Listen, but I am not sure if it would suffice for the "continuous" case where a cell is sampled at a time interval. (Would each call to Sample increment and then decrement the reference count right away? If the reference count is varying between 0 and 1 on short time intervals, causing primitives to be detached and reattached, I would think this could have a negative performance impact.)

ziriax commented 8 years ago

Sample is the pull interface I guess, where Listen is the push?

Why not have a Sampler object that increments the refcount and releases when disposed? So instead of Sample returning a value, it returns a Sampler object that allows sampling and can be disposed.

That looks a lot to .NET's IEnumerable and IEnumerator of course...

jam40jeff commented 8 years ago

I like that idea. It would permit a more deterministic deregistration of listeners even in other languages as well as the ability to make other optimizations such as not running the logic for the intermediate primitives unless a downstream listener or sampler is attached.

However, I would like to hear @the-real-blackh's opinion on this as he knows the ins-and-outs of the Sodium code much better than I do.

the-real-blackh commented 8 years ago

One of the reasons why weak references are needed is the situation where you have a Stream or Cell waiting in some logic that will some time in the future output a Cell<Stream<A>> or Cell<Cell<A>> that is then switched. It has no listeners, but it - and the chain of other Stream/Cells that lead to it - need to be kept alive just from a reference to it held inside a lambda. This can only be tracked through the language, as it were, not directly.

So, I can't see any way around the requirement for weak references.

ritschwumm commented 8 years ago

btw, weak refs can lead to interesting problems when your dependency graph changes often and the garbage collector does not remove obsolete nodes fast enough so they keep calculating useless stuff for longer than necessary.

the-real-blackh commented 8 years ago

ritschwumm: This would cause performance degradation in Sodium but no incorrect answers, so no "problems" as such.

ritschwumm commented 8 years ago

@the-real-blackh so i thought. still annoying when you run into it. seems like a 'real' solution would require a different garbage collector ala http://tomasp.net/academic/papers/hollywood/hollywood.pdf .

ziriax commented 8 years ago

So higher order signal trouble, it keeps biting me ;-)

Could you give an example? It would help me understand this better.

Because weak references are not part of Javascript yet, it does not make sense to port Sodium to Typescript.

But maybe we don't need real weak references to implement SodiumJS, but just WeakMap. But I doubt it, and to be honest I don't understand WeakMap yet, it feels like a strange beast, since it are the keys that are weak, not the values...

jam40jeff commented 8 years ago

@the-real-blackh, I may not be following, but it seems like "something" in scope would have to hold a reference to the switched stream or cell, right? If so, it would be kept alive by that reference (and by virtue keep all of its upstream streams and cells alive). I believe the only problem the weak reference solves is preventing the upstream streams and cells from keeping downstream ones alive. This could still be done manually through listener/sampler counting if that link is broken once the count reaches 0 and reattached when it goes above 0. This would mean the upstream stream or cell would no longer keep the switched one alive, but any other reference still could (and the refcounting logic would mean it still could "reattach" if something listens to it again).

I could try to code up an example in a branch and if you have a specific problem scenario in mind I could test it against that. At the very least, I'd like to figure out what I'm overseeing if I'm wrong.

the-real-blackh commented 8 years ago

Cell<A> a1 = ...;
Cell<A> a2 = ...;
Stream<Unit> sChange = ...;
Cell<A> a = Cell.switchC(sChange.map(u -> a2).hold(a1));

In this example we'd need to make sure the lambda u -> a2 kept a2 (which could be a whole chain of FRP logic) alive.

I don't know from direct experience of WeakMap in Javascript, but I think it can be done with WeakMap. All we need to do is 1. track whether any references to something exists, and 2. run a finalizer when no more references to it exist.

Give each Cell/Stream a unique ID.
Store a 'tracker' object containing the unique ID as the key of the WeakMap.
Also use the unique ID alive in a normal map called 'finalizers' mapping to lambdas that clean things up.
Retain the 'tracker' object inside the Cell/Stream so that whenever it is referenced, the tracker is kept alive in the WeakMap.
Periodically compare the 'trackers' weak map against the 'finalizers' map. If any key exists in 'finalizers' but not 'trackers', then run the finalizer code and remove the entry from 'finalizers'.

A potential problem is that if there is a long chain of these, it could take many repeats of this to clean everything up. However, this can be fixed by calling these 'implicit trackers' and also having explicit ones not based on WeakMap for the more common case where we can track the reference directly. You'd have to deal with cycles. That could be done either by using both kinds of trackers at once, or by explicit cycle detection. Both schemes would work. The first is probably easiest.

the-real-blackh commented 8 years ago

I've just realized that the 'potential problem' above is not a problem as long as trackers reference their dependents directly and not through finalizers.

ritschwumm commented 8 years ago

i don't see how you could compare the two key sets - there's no way to iterate over the keys or entries of a WeakMap.

the-real-blackh commented 8 years ago

That could be a show-stopper.

jam40jeff commented 8 years ago

In the example, the lambda should keep a2 alive through the closure. The closure should be kept alive by the result of map. The result of map should be kept alive (eventually) by switchC. The result of switchC should be kept alive by a.

the-real-blackh commented 8 years ago

Yes, that's right.

ziriax commented 8 years ago

So, does that mean that the referencing counting approach might be worth trying?

I hope it doesn't give problems with feedback loops... Cyclic references...

the-real-blackh commented 8 years ago

Unfortunately reference counting isn't going to work because in the example I gave...

Cell<A> a1 = ...;
Cell<A> a2 = ...;
Stream<Unit> sChange = ...;
Cell<A> a = Cell.switchC(sChange.map(u -> a2).hold(a1));

...lambdas are opaque, so there's no way for the reference counting algorithm to introspect to see whether the lambda refers to a2 or not.

the-real-blackh commented 8 years ago

Maybe I'm missing something, but as far as I can see, "true" FRP can't be implemented in Javascript as long as there's no finalizer capability. This is worth lobbying Javascript standards people over, IMO.

The only way of doing it is to do everything through a preprocessor so that the contents of closures can be introspected. So, it could be done in some other language that translates to Javascript. ...but it couldn't be freely mixed with Javascript code. Specifically, Javascript code won't be able to hold references to FRP objects.

jam40jeff commented 8 years ago

@the-real-blackh, I should be using a different terminology than "reference counting"; maybe "listener counting". In your example, the lambda would prevent a2 from being garbage collected. The fact that there are no listeners means that the "listener counting" algorithm will disconnect a2 from its upstream parents so they are not the cause of keeping a2 alive in the case that the lambda goes out of scope (which can only happen here if a goes out of scope). However, a2 will be reattached as soon as its "listener count" goes above 0, which will happen when it is switched to by switchC.

The challenge I see is that cells which have no listeners will detach from their upstream parent. They will later reattach if needed when a listener is added, but this would require the Stream -> Cell transition (only hold I believe?) to be able to "replay" the last value from the upstream Stream object since it wasn't listening while its "listener count" was 0 and doesn't have an up-to-date current value.

This paradigm would mean that there are two different ways of keeping an object from being garbage collected:

(1) Attach a listener to it. (2) Keep the object alive through a reference or a chain of references in user code. (This can be through a chain of FRP primitives, such as in the example above, where a2 is eventually kept alive by the fact that user code has a reference to either a or any listener of a.)

If neither of these conditions are satisfied, the object ceases to be useful (nothing is listening to it and it can't be listened to by anyone else since there is no reference to it anymore in user code) and it should be available for garbage collection.

the-real-blackh commented 8 years ago

I see exactly what you mean! You're onto something, and maybe this can be solved.

I agree that tracking holds should be the only issue here, and since we only need the last value, maybe it can be solved by making all streams remember one previous value (as you say). This would mean that the implementation would become a push/pull model, but that is fine.

We still have to solve cycle collection, but we have all the tracking information we need, so it can be done. Your insight means that the Javascript problem and the C++ problem (which I'm trying to solve) are the same.

For the cycle collection, I think somewhere in this paper is the algorithm we need:

http://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon03Pure.pdf

the-real-blackh commented 8 years ago

Actually the Javascript problem and the C++ problem may not be quite the same because there's no garbage collector at all in C++ and so the reference held inside a closure has to be managed by reference counting and so it could form part of a cycle. (I could be wrong about this.)

At any rate, in Javascript this would always be outside the cycle so we can ignore it.

ritschwumm commented 8 years ago

wait a minute. isn't disconnecting-from-upstream a problem for all stateful combinators?

the-real-blackh commented 8 years ago

hold is the only stateful combinator.

ritschwumm commented 8 years ago

oh.. i'd have expected a combinator like ((S,T) => (S,U)) => S => Stream[T] => Stream[U] that advances an internal state S for each new T and additionally emits an U calculated from the previous S and the T. could this be implemented in terms of hold?

the-real-blackh commented 8 years ago

Those ones (such as accum) are written in terms of hold. Actually lift is another combinator that would be affected by these issues but the solution is no different to what we'd do with hold.

ziriax commented 8 years ago

How does David Sankel solve this in his SFRP C++ library?

I think he also has a denotational semantic

Or maybe he has memory leaks, could be.

I see in your DS that switch always trims the signal it is switching into. Doesn't that make switching very expensive? Because it means that all the signals you will eventually switch in to are all running. It would also mean that if I inline the a2 in your example, that I would get different behavior. Doesn't that mean that referential transparency is broken? Or should I read this code in a monadic way, and inlining creates a completely different monad, and so referential transparency is not broken at all. And by creating a signal in a lambda you can also make switching efficient. That does make sense, the construction of a signal also starts it, just like the JavaScript interval function also starts (but that is IO monadic ofcourse so your code must run all in a monad, and I just answered my own question)

But indeed that approach seems to require a garbage collector...

I might be completely wrong, just trying to understand. I will experiment more with the library to get a better idea of it.

ziriax commented 8 years ago

Flapjax is written in Javascript and has a similar semantic I think. So I am curious how they tackled this problem

HeinrichApfelmus commented 8 years ago

@jam40jeff I'm afraid it's not that simple. If you want to discard a2, then you have to make sure not just that no one is listening to it, but that no one will ever be able to listen to it again. Otherwise, you have to update the state inside a2 as usual, because it's not possible to reconstruct the value once you've discarded it. To fulfill the latter condition, you need support from the language runtime in the form of garbage collection.

@Ziriax I have taken a cursory look at Flapjax, and from the description in their paper, it appears to me that their combinator for merging event streams (I like to call it unionWith)) is not deterministic. I hope I'm not wrong here, but this essentially brings it to the same level as the React* family of libraries.

That said, the issue with garbage collection is probably orthogonal to the deterministic merge problem. I have no idea what Flapjax does there.

ziriax commented 8 years ago

Regarding WeakMap

The WeakMap is intriguing. I can see how it helps to associate private data to say a DOM node n. First private data is associated to node n by using weakmap.set(n, data). Then you attach an event handler to n, and that handler will lookup the data from the weakmap. If n is deleted, and no further references exist to n, the weakmap will delete the key-value automatically. This is very useful to extend immutable objects with data without leaks. Somehow it feels that this should be enough to solve the problem here.

Regarding non-determinsm

As far as I understand it, the garbage collector now also introduces another form of non-deterministic behavior, because signals are only stopped when garbage collected? So a composite signal like let s = windowAnimationFrameEvent(moment).map(gameLevelState) might perform the potentially very expensive computation gameLevelState for a long time until s is garbage collected? And the theoretical windowAnimationFrameEvent will unsubscribe from the external animationFrame event in its finalizer?

I mean non-determinism not in the form of results being computed, but computational resources being wasted.

So FRP makes not only the space garbage collection non-deterministic, but also the time. And somehow writing it like this does make sense, because if garbage collection would be instantaneous, both the space and time non-determinism would go away. We need hardware garbage collection :)

jam40jeff commented 8 years ago

@HeinrichApfelmus I came to that conclusion last night after implementing the attach/reattach logic for streams, unfortunately. For cells, the entire upstream history must always be stored since some combinators (such as accum) need to replay all missed values rather than just the latest.

@Ziriax I think this means that there is no reasonable solution to this problem other than using weak references (which allows garbage collection to notify of when to stop considering downstream primitives) and always flowing values downstream even if there is currently no listener at the end of the chain (so we don't need to replay all missed values on reattachment). In other words, the way the Java and C# versions are already doing it seems to be optimal.

ziriax commented 8 years ago

Could the problem be solved by introducing a Seq operator that has the following semantics:

x.Seq(y) means that whenever x is started (listened to), y is also started.

This would change the semantics of the FRP system of course...

So Stephen's example:

Cell<A> a1 = xxx...;
Cell<A> a2 = yyy...;
Stream<Unit> sChange = zzz...;
Cell<A> a = Cell.switchC(sChange.map(u -> a2).hold(a1));

Would become

Cell<A> a1 = xxx...;
Cell<A> a2 = yyy...;
Stream<Unit> sChange = zzz...;
Cell<A> a = Cell.switchC(sChange.seq(a2).map(u -> a2).hold(a1));

Without the seq, the behavior of the code snippet would be the same as if a2 was inlined in the lambda: u -> yyy..., meaning it would only start as soon as sChange fires, not when a is listened to.

Maybe this kills all of the semantics, but if not, I think it might allow reference counting, but it's just an intuition...

ziriax commented 8 years ago

Maybe co is a better name than seq, as it has a different meaning than Haskell's seq

So x.co(y) means y is started resp stopped together with x at the first resp last listen

jam40jeff commented 8 years ago

This would kill the semantics. Any cells in the chain would miss events when there are no listeners, so they will hold an invalid value if anyone attempts to listen on them at a later point.

At first I thought this could be solved by replaying the last event on listen, but I didn't take into account accumulators or filters, which would require replaying more than just the last value to ensure the current cell value is reconstructed accurately.

ziriax commented 8 years ago

Indeed, mea culpa.

This is unfortunate, so an Ecmascript version will have to wait until weak refs with finalizers are supported.

jam40jeff commented 8 years ago

In the C# version, I do not use finalizers. I only remove a node from its parent the next time the parent stream fires (and finds a stale WeakReference).

IDisposable is also implemented for streams and cells so user code can force immediate cleanup.

the-real-blackh commented 8 years ago

@jam40jeff For cells, the entire upstream history must always be stored since some combinators (such as accum) need to replay all missed values rather than just the latest.

I fear that you are correct, and this means it can't be done in Javascript now. Remembering all previous updates is not going to work, because the objective is to prevent memory leaks.

The C# way you describe is of course fine but is effectively the same as a finalizer so it doesn't help.

The problem is to track what is referenced and what isn't. The 'co' operator would sort of work, but it would place an unpleasant burden on the writer of the code.

Cell<A> a = Cell.switchC(sChange.co(a2).map(u -> a2).hold(a1));

I only spent about 30 minutes reading David Sankel's code. I don't think he deals with this problem. I also don't think Flapjax does either. I found out from using it that Rx.JS suffers from the same leak (no surprise there).

In the v2 of sodium for C++ I am attempting to use some black magic called MagicRef to track what a std::function refers to. Basically I run all copy constructors under the control of some global variables that track what reference counts are altered when the std::function is copied. The assumption is that all reference counting variables are MagicRefs and never std::shared_ptrs. Whether this works or not will be dependent on the implementation of the compiler. If std::function uses std::shared_ptr or equivalent internally, it won't work. This trick does work in GNU C++.

ziriax commented 8 years ago

An ambitious project would be to fork the Typescript compiler so that it generates the required code. After all, the TS compiler parses all functions and lambda's, so it could track all references to signals. Not sure if this is doable...

the-real-blackh commented 8 years ago

I don't see any reason why modifying the Typescript compiler in this way wouldn't work, but I think it's fairly ambitious.

Could the '.co' thing be made to work? I can't see why not. All our dependencies are trackable except for the contents of lambdas. If it's possible for the FRP code writer to explicitly tag these dependencies, then it should solve it, as long as the person does it correctly. It's only an issue in relatively rare cases, and where the user fails to type it, in many cases the code will fail 100% of the time. Another possible name for it is '.dep' (for 'depends-on').

EDIT: We might even be able to explicitly detect where the person has omitted to use it and throw an error message.

EDIT 2: I like this idea! I will try it out in the C++ implementation. MagicRef is a bit scary.

In the meanwhile, we lobby Javascript standards bodies.

ziriax commented 8 years ago

Good luck, I don't understand enough of the code-base yet to know if it will work :)

ziriax commented 8 years ago

According to someone at Reflex FRP, GHCJS has its own system for tracking weak-references...

the-real-blackh commented 8 years ago

In case anyone following this thread has missed it, I've implemented the scheme of explicitly declaring dependencies in lambdas in the Typescript version of Sodium and it seems to actually work! https://github.com/SodiumFRP/sodium-typescript

jam40jeff commented 1 year ago

Closing as this has been implemented in TypeScript.

SodiumFRP / sodium

typescript porting question #85