Gabriella439 / pipes-concurrency

Concurrency for the pipes ecosystem
BSD 3-Clause "New" or "Revised" License
43 stars 12 forks source link

Retrieving channel meta-information #14

Closed quchen closed 6 years ago

quchen commented 10 years ago

I just fixed a nasty bug where my communication channels filled up silently, leading to infinite retries. This lead me to the idea that maybe spawn' should return more than it currently does. A couple of ideas:

These may have to be decorated with Maybes (because channel size for Unbounded is probably unknown), but I just wanted to throw the idea out there. What do you think?

Gabriella439 commented 10 years ago

Can you explain the bug better? How would you trigger infinite retries? Wouldn't that be a bug in the STM runtime?

quchen commented 10 years ago

Oh, sorry if I was unclear. What I'm talking about is not a bug in your library, more of a feature request (allowing to prevent bugs in some cases). The problem is right now Buffers are very opaque, and in particular the channel buffers lack some of the features Control.Concurrent.STM offers. For example, I cannot check whether a channel is full or empty, or peek at a value.

The problem with this is that full channels retry silently. While sometimes this is the desired behaviour, it is an indicator that not enough is read from a channel in other scenarios. In my particular case I used bounded channels with a sufficiently large size to guarantee bounded memory use, while still allowing for "rush hours". One of the channels filled up because of a bug, and weren't read at all. Due to the silent nature of retries, this was quite hard to find; meanwhile, all the other threads writing to the queue retried forever.

In this case, it would have been nice to have a possibility to investigate the state of a channel, for example with a function hasRoom :: Input a -> STM Bool, which checks whether a bounded channel is filled. This could be used to issue a warning in case a channel fills up too quickly.

Similarly, there is no way of putting a value back in a channel in the Pipes module (which is what the usual unGetTChan function does, for example), or to check how many elements are currently in the channel.

I'm now wondering whether it would be a good idea to add such functionality, provided it is possible.

Gabriella439 commented 10 years ago

Yeah, hasRoom makes sense. Also, there should probably be a peek function, too. I'd prefer peek over an unGet, because peek makes more sense in the context of Latest.

It may be a month before I will have time to address this, but if you draft something up before that time then that will speed things along. If you are busy, though, then I'd be happy to do this myself sometime in December.

quchen commented 10 years ago

Since in a bounded channel you have to keep track of the channel size anyway, it might be useful to get this size from the outside. Think of a channel that is prioritized when it is small, but gets less new signals when it fills up.

I agree that peek is better than unGet, for which I can't imagine a non-hacky application.

As for implementing this, I think we're in the same boat right now: I'm finishing my thesis as well. I'll at least edit this post to list all the API features that I imagine could be useful.

Gabriella439 commented 10 years ago

Congratulations on wrapping up your thesis, too! :)

Alright, then I will tackle this more fully in December. Thank you for summarizing things with the feature list.

quchen commented 10 years ago

I just wanted to create a small mockup and almost implemented the thing by accident. Since it's a little too long for the thread here (60 lines), I pasted it in a Gist: https://gist.github.com/quchen/7362643

I was surprised that the functionality can be added as a new "spawn" function without modifying anything else in the library. This new spawn'' function (for the lack of a better name) should be fully compatible with the current library; it's basically the same thing, but with some bookkeeping overhead, as spawn'.

In the same spirit, it should be trivial to implement isFull (which will be available as isFullTBQueue in 7.8, but that would be too much of a dependency) and peek. I guess once we settle on what features should be supported by the bookkeeping spawn I'll write it up nicely as a patch?

Gabriella439 commented 10 years ago

So I think that some of these should be associated with Input and Output themselves. For example, peek should probably be stored within Input alongside send. Similarly, isFull would make sense to store with Output.

I'd have to go back and hide the internal implementation of Input and Output, but that's okay. This would also require seeing if their current typeclass instances make sense with these extensions.

quchen commented 10 years ago

The problem with that is that the bookkeeping overhead would appear in all Input/Ouput objects, which would decrease performance (not by much I think, but still). Unfortunately we cannot use the TBQueue internals to implement my suggestions though - the data is in the TBQueue data type, but the necessary constructors aren't exported.

Gabriella439 commented 10 years ago

Oh. I thought that the underlying data types would already have implemented that. I hadn't taken the time to check.

Also, this brings up a good point that not all Inputs and Outputs should have a notion of size (or peeking). I think you are right to keep this abstraction restricted to a variation on spawn.

Gabriella439 commented 10 years ago

So I was thinking that peek and isFull are probably all we need. peek is basically the side-effect-free counterpart to recv and isFull is the side-effect-free counterpart to send so there is that nice symmetry there. I believe that isFull/peek are probably the only extra functionality we could add that is still Buffer-independent (mainly for the same reason that send and recv are pretty Buffer-independent).

The semantics for the various Buffer types would be:

There are two ways we can implement this. Either we:

I'm leaning towards the latter solution because it has nice symmetry and you can pretty much always reasonably expect that if you define a recv that there is a valid peek action, and similarly that if you can define a send that you can also define an isFull action. This would be a breaking change (therefore warranting a version 2.1.0) for people who defined their own custom Inputs and Outputs not built by spawn, but I would be okay with that since these two extra capabilities seem pretty generally useful and worth asking people to implement. For people who are using spawn this would not be a breaking change and would be totally transparent to downstream code.

What do you think?

quchen commented 10 years ago

I agree with what you said. I'm not sure why there is spawn and spawn' in the first place (when all the first one does is discarding the '), and spawn'' is very smelly API. Furthermore, not having an isFull is by far the most time-consuming (assert-) debugging lack of a feature I've had with STM, so the version bump is well justified from my perspective.

Gabriella439 commented 10 years ago

While we're on the topic, do you think I should include a breaking change to merge the functionality of spawn' into spawn? I figure that if I'm going to make a breaking change that this would be a good time to simultaneously fix this.

Gabriella439 commented 10 years ago

I went ahead an opened up a separate issue to discuss whether or not to merge spawn/spawn'. In the meantime I will begin writing up the code for this issue.

Gabriella439 commented 10 years ago

I just want to update you that I will begin adding these as soon as issue #17 is fixed.

Gabriella439 commented 10 years ago

I'm going to delay this by one release since I need to update several pipes package dependencies today to accept pipes-4.1.0, which I'm releasing today. This is a pretty trivial change to make, but I haven't had a chance to put it through its paces yet to see if there are any obvious problems.

quchen commented 6 years ago

This is now almost 4 years untouched, I’m closing it of old age.