Definition of isOutputRange warped due to "put" implementation

dlangBugzillaToGithub commented 12 years ago

monarchdodra reported this on 2012-07-31T04:22:05Z

Transfered from https://issues.dlang.org/show_bug.cgi?id=8483

CC List

hsteoh
issues.dlang (jmdavis)
yebblies

Description

This is a copy paste from the forum. It was originally written as both an analysis and a proposed solution to a problem.

TL;DR:
"put" allows placing a range of objects inside an output range, which warps the definition of "isOutputRange" The exact kind of errors it creates is this:

----
alias int[] A; alias int[] B;
A a = new int[](2);
B b = new int[](1);
assert(isOutputRange!(B, A));
if(!b.empty)
  b.put(a);
----
Here b is not empty as output range of A object, b is not empty, yet putting a inside b creates an empty range exception.

But more generally, the entire problem lies with the fact that B is NOT an output range of A objects.

Please find the original abstract in the next post.

dlangBugzillaToGithub commented 12 years ago

monarchdodra commented on 2012-07-31T04:22:15Z

**************************************************
I'm really very sorry that this is such a long post, but I'd like 
to make request for a serious change to Phobos, so I think it 
should be as complete as possible.

Thank you to those who have the patience to read it and consider 
it.

**************************************************
Abstract:

  This is a formal request for the deprecation of the the support 
for accepting ranges of elements for the function "put" ie 
(put(OutRange, RangeOfElements)).

  As convenient as this functionality may be is, it undermines 
the very definition of what constitutes an output range, and 
creates some fundamental problems when trying to deal with them.

  put(Range, RangeOfElements) doesn't actually bring anything to 
the table that would be missed.

**************************************************
Explanation:

  The problem is not "put" in and out of itself, but rather that 
it is the fundamental definition of what constitutes an output 
range. In particular, because you cannot extract elements from 
output ranges, and because an output range can potentially 
support multiple types, you cannot write:
  "ElementType!OutputRange" => this will yield "void" on a 'true' 
output range. Rather, you have to individually query if the range 
is an output for specific types:
  "isOutputRange!(OutputRange, SomeElement)"

  The catch is that the definition of "isOutputRange" is just 
"does "put(range, element)" compile". This means that 
essentially, as soon as a range is an output for elements, it 
becomes an output for ranges of elements, for ranges of ranges of 
elements, for ranges of ranges of ranges of elements, for ...

For example: int[] is an outputRange of int (obviously), but it 
is also defined as an output range of int[], int[2], and 
int[][]...
This is clearly not right.

**************************************************
Problems put creates for template restrictions:

  At it simplest, it prevents any algorithm from properly working 
with outputRanges. This is the definition of "copy":

Range2 copy(Range1, Range2)(Range1 source, Range2 target)
  if (isInputRange!Range1 && isOutputRange!(Range2, 
ElementType!Range1))

See the trap? This is perfectly legal code:
  int a[][][][];
  int b[];
  copy(a, b);

  Look bad? it gets worse. Imagine the function fill. It works 
the same as "copy", but it can also take an "element" as an 
argument. One would be tempted to write this pair of signatures:

void fill(Range1, Range2)(Range1 target, Element filler)
  if(isOutputRange(Range1, Element))

void fill(Range1, Range2)(Range1 target, Range2  filler)
  if(isInputRange!Range2 && isOutputRange(Range1, 
ElementType!Range2))

You can try to write this, but if you do, the code will never 
work. ANYTHING that matches the range fill, will also match the 
element based fill, since the target range supports "put" from a 
range... For example:
int[2] a = [1, 2];
int[] b = new int[](8);
fill(b, a); //ambiguous call...
//Are you copying a "range" or an "element"?
//Answer: Who know! Honestly: if b is an output range of a, WHICH 
IS THE RIGHT CALL?

  The only way to really avoid this problem (as is currently done 
in algorithm.d) is to add: "is(typeof(range.front = filler))", 
but this defeats the very requirement of outputRange (outputRange 
does not have front), and bumps the algorithm's requirements up 
to "inputRange".

  Long story short, it is not currently possible to write 
templates that reliably support an outputRange restriction.

**************************************************
Problems put creates for implementation:

  The big problem about put is that it makes outputRanges _lie_ 
about what they support. Here is an example at its simplest:

alias int[] A; alias int[] B;
A a = new int[](2);
B b = new int[](1);
assert(isOutputRange!(typeof(a), typeof(b)));
if(!b.empty)
  b.put(a);

  In this code:
*b is an output range of A.
*b is not empty.
*putting an A inside b creates a empty range exception !?

  While this example might look "cute", it is also unacceptable 
behavior. if b is a non-empty range that supports "elements A", 
then it _HAS_ to be able to support a put. In this case, b 
clearly does not support elements of type A.

  I'm sure you can imagine the kinds of problems this can caue 
for a template developer.

**************************************************
Why we don't need put(Range):

  Quite simply: because we have the higher order function. The 
convenience "put(Range)" is nothing more than a glorified 
algorithm. Instead of using put, the user should make a call to 
copy. "copy" is designed specifically for copying an input range 
into an output range. Why duplicate this functionality into put? 
Calling copy is much more honest and accurate about what is 
actually happening.

**************************************************
Problems regarding removing put(Range):

  In theory, the only problem this would create is breaking code 
which can be legitimately replaced by “copy” without any 
change.

  It *could* also break some templates, as some ranges would 
seize reporting themselves as outputRanges of Ranges, but, quite 
frankly, I think those algorithms are unsafe to begin with, and 
are operating on incompatible types.

**************************************************
Conclusion:

  I think I've shown the problems that put(Range) creates. 
Problems that are simply insurmountable. On the other hand, 
"e.put(range)" brings nothing that can't be rewritten as "copy(e, 
range)". For all these reasons, I think it would be best to get 
rid of it.

dlangBugzillaToGithub commented 10 years ago

hsteoh commented on 2014-08-09T15:03:54Z

In light of recent realizations that output ranges are really only useful with specific operations at the end of UFCS chains, such as std.algorithm.copy or std.format.formattedWrite (and arguably, the latter could be rewritten to return an input range instead), I'm wondering if we should just get rid of output ranges altogether and just have .copy be the one-stop function for implementing data sinks.

dlangBugzillaToGithub commented 10 years ago

monarchdodra commented on 2014-08-09T18:28:36Z

(In reply to hsteoh from comment #2)
> In light of recent realizations that output ranges are really only useful
> with specific operations at the end of UFCS chains, such as
> std.algorithm.copy or std.format.formattedWrite (and arguably, the latter
> could be rewritten to return an input range instead), I'm wondering if we
> should just get rid of output ranges altogether and just have .copy be the
> one-stop function for implementing data sinks.

The two issues with that are:
1. Copy is "Range to ouput sink". So you can't just do: "copy(1, myIntegerOutput)"
2. Copy is implemented on top of output range definition and the "put" primitive.

Unless you had something else in mind in terms of "one-stop function for implementing data sinks"? I mean (IMO), I see "copy" as pretty much the same thing as "put", but with reversed args, making it UFCS friendly.

I had honestly wondered about adding a "putInto" into phobos, which is basically just out reversed:
5.square().putInto(myOuputRange);

dlangBugzillaToGithub commented 10 years ago

issues.dlang commented on 2014-08-10T03:37:46Z

(In reply to hsteoh from comment #2)
> In light of recent realizations that output ranges are really only useful
> with specific operations at the end of UFCS chains, such as
> std.algorithm.copy or std.format.formattedWrite (and arguably, the latter
> could be rewritten to return an input range instead), I'm wondering if we
> should just get rid of output ranges altogether and just have .copy be the
> one-stop function for implementing data sinks.

Are there discussions on this? I have seen no discussions or evidence that there's anything fundamentally flawed with output ranges and put. There's the problem with std.range.put and UFCS, but that can be fixed by creating a different function for a range to implement for std.range.put to use rather than implementing a function call put. What issues are we talking about here that make output ranges flawed?

- Jonathan M Davis

dlangBugzillaToGithub commented 10 years ago

hsteoh commented on 2014-08-10T04:04:33Z

I didn't say anything was *flawed* with output ranges. I just said that they are not *as* useful because, being sinks, any code that uses them will terminate the processing chain, making it non-composable with any subsequent data processing. IME, I've also found that code that takes output ranges as parameters tend to be limited in reusability, and tends to uglify downstream code since you have to store the data somewhere, or use hacks to allow further processing of data, etc..

In fact, all cases of output range usage that I encountered while writing range-based code can be more profitably refactored to return input ranges instead, and use std.algorithm.copy for when I actually want to use an output range. This makes the majority of the code composable, and therefore much more reusable, than if I had used output ranges everywhere. Basically, anything that looks like func(inputRange, outputRange) can be rewritten as newFunc(inputRange).copy(outputRange), where newFunc returns an input range instead of writing the results to an output range.

So, since anything that takes an output range can always be rewritten as something that returns an input followed by std.algorithm.copy, it follows that the usefulness of output ranges seems to be restricted to std.algorithm.copy. Being such, then, it seems that the logical next step is to eliminate the concept altogether, and just have std.algorithm.copy implement all the semantics of output ranges.

dlangBugzillaToGithub commented 10 years ago

issues.dlang commented on 2014-08-10T06:08:58Z

Hmmmm. Well, that's certainly food for thought. Output ranges are definitely for when you're writing out the final output, not for composability. However, in many of the cases where an output range would be used, copy wouldn't make sense, because the destination is really output (e.g. the console or a file) and not a new input range. So, I question that it makes sense to try and replace output ranges with copy, though I concur that in many cases where you might use an output range, it makes more sense to use an input range and then feed that into an output range if that's what you want.

Regardless, I think that it's something that merits a larger discussion in the newsgroup.

The one thing that I keep meaning to bring up with output ranges is the lack of ability to determine whether they're full (which becomes even more complicated when dealing with stuff like chars where you can't know ahead of time whether it's going to fit thanks to different encodings), but it's certainly never occurred to me that copy could be thought of as a replacement for output ranges.

dlangBugzillaToGithub commented 10 years ago

monarchdodra commented on 2014-08-10T07:54:19Z

In a nutshell, I have issues with an input range being able to be an output range, because there is no way to determine if it is "full".

Heck, even if we could, I resent the notion that an output range could be "full" at all!

Honestly, I wish we had never mixed the notion of (Input)Ranges with output ranges. IMO, we should have had:
-InputRange
-InputRange && hasAssignableElements (writeable output ranges)
-Sinks (like "writeln", or "container.put")

And to be honest, I've almost never seen anyone use the OutputRange interface on an input Range. The only case I know of is "copy". And I don't think it should.

dlangBugzillaToGithub commented 10 years ago

yebblies commented on 2014-08-10T08:07:51Z

(In reply to hsteoh from comment #2)
> In light of recent realizations that output ranges are really only useful
> with specific operations at the end of UFCS chains, such as
> std.algorithm.copy or std.format.formattedWrite (and arguably, the latter
> could be rewritten to return an input range instead), I'm wondering if we
> should just get rid of output ranges altogether and just have .copy be the
> one-stop function for implementing data sinks.

std.format.formattedWrite is the best thing ever.

dlangBugzillaToGithub commented 10 years ago

issues.dlang commented on 2014-08-10T08:17:30Z

(In reply to monarchdodra from comment #7)
> In a nutshell, I have issues with an input range being able to be an output
> range, because there is no way to determine if it is "full".
> 
> Heck, even if we could, I resent the notion that an output range could be
> "full" at all!

Arrays are input ranges, and calling put on them fills them rather than appending to them, so yes, they can get full. And given that it's likely not all that uncommon to be dealing with fixed size buffers, I would expect that there would be plenty of cases where you'd want to write to something using ranges, and it could get full.

> Honestly, I wish we had never mixed the notion of (Input)Ranges with output
> ranges. IMO, we should have had:
> -InputRange
> -InputRange && hasAssignableElements (writeable output ranges)
> -Sinks (like "writeln", or "container.put")
> 
> And to be honest, I've almost never seen anyone use the OutputRange
> interface on an input Range. The only case I know of is "copy". And I don't
> think it should.

I concur. While it might make sense for something to operate as both an input range and an output range (e.g. fill it as an output range and then empty it as an input range), I think that having front work as a way to write to a range rather than it having to having something specific to output ranges was a mistake.

Regardless, the simple fact that the closest thing that we have to a built-in output range (arrays) can get full and yet the output range API does not take that into account is a major indicator that output ranges need at least a minor overhaul if not a major one.

dlangBugzillaToGithub commented 10 years ago

hsteoh commented on 2014-09-22T04:09:27Z

Would it make sense to extend the definition of output range to include a .full method?

dlangBugzillaToGithub / migration_test