Open dlangBugzillaToGithub opened 12 years ago
**************************************************
I'm really very sorry that this is such a long post, but I'd like
to make request for a serious change to Phobos, so I think it
should be as complete as possible.
Thank you to those who have the patience to read it and consider
it.
**************************************************
Abstract:
This is a formal request for the deprecation of the the support
for accepting ranges of elements for the function "put" ie
(put(OutRange, RangeOfElements)).
As convenient as this functionality may be is, it undermines
the very definition of what constitutes an output range, and
creates some fundamental problems when trying to deal with them.
put(Range, RangeOfElements) doesn't actually bring anything to
the table that would be missed.
**************************************************
Explanation:
The problem is not "put" in and out of itself, but rather that
it is the fundamental definition of what constitutes an output
range. In particular, because you cannot extract elements from
output ranges, and because an output range can potentially
support multiple types, you cannot write:
"ElementType!OutputRange" => this will yield "void" on a 'true'
output range. Rather, you have to individually query if the range
is an output for specific types:
"isOutputRange!(OutputRange, SomeElement)"
The catch is that the definition of "isOutputRange" is just
"does "put(range, element)" compile". This means that
essentially, as soon as a range is an output for elements, it
becomes an output for ranges of elements, for ranges of ranges of
elements, for ranges of ranges of ranges of elements, for ...
For example: int[] is an outputRange of int (obviously), but it
is also defined as an output range of int[], int[2], and
int[][]...
This is clearly not right.
**************************************************
Problems put creates for template restrictions:
At it simplest, it prevents any algorithm from properly working
with outputRanges. This is the definition of "copy":
Range2 copy(Range1, Range2)(Range1 source, Range2 target)
if (isInputRange!Range1 && isOutputRange!(Range2,
ElementType!Range1))
See the trap? This is perfectly legal code:
int a[][][][];
int b[];
copy(a, b);
Look bad? it gets worse. Imagine the function fill. It works
the same as "copy", but it can also take an "element" as an
argument. One would be tempted to write this pair of signatures:
void fill(Range1, Range2)(Range1 target, Element filler)
if(isOutputRange(Range1, Element))
void fill(Range1, Range2)(Range1 target, Range2 filler)
if(isInputRange!Range2 && isOutputRange(Range1,
ElementType!Range2))
You can try to write this, but if you do, the code will never
work. ANYTHING that matches the range fill, will also match the
element based fill, since the target range supports "put" from a
range... For example:
int[2] a = [1, 2];
int[] b = new int[](8);
fill(b, a); //ambiguous call...
//Are you copying a "range" or an "element"?
//Answer: Who know! Honestly: if b is an output range of a, WHICH
IS THE RIGHT CALL?
The only way to really avoid this problem (as is currently done
in algorithm.d) is to add: "is(typeof(range.front = filler))",
but this defeats the very requirement of outputRange (outputRange
does not have front), and bumps the algorithm's requirements up
to "inputRange".
Long story short, it is not currently possible to write
templates that reliably support an outputRange restriction.
**************************************************
Problems put creates for implementation:
The big problem about put is that it makes outputRanges _lie_
about what they support. Here is an example at its simplest:
alias int[] A; alias int[] B;
A a = new int[](2);
B b = new int[](1);
assert(isOutputRange!(typeof(a), typeof(b)));
if(!b.empty)
b.put(a);
In this code:
*b is an output range of A.
*b is not empty.
*putting an A inside b creates a empty range exception !?
While this example might look "cute", it is also unacceptable
behavior. if b is a non-empty range that supports "elements A",
then it _HAS_ to be able to support a put. In this case, b
clearly does not support elements of type A.
I'm sure you can imagine the kinds of problems this can caue
for a template developer.
**************************************************
Why we don't need put(Range):
Quite simply: because we have the higher order function. The
convenience "put(Range)" is nothing more than a glorified
algorithm. Instead of using put, the user should make a call to
copy. "copy" is designed specifically for copying an input range
into an output range. Why duplicate this functionality into put?
Calling copy is much more honest and accurate about what is
actually happening.
**************************************************
Problems regarding removing put(Range):
In theory, the only problem this would create is breaking code
which can be legitimately replaced by “copy” without any
change.
It *could* also break some templates, as some ranges would
seize reporting themselves as outputRanges of Ranges, but, quite
frankly, I think those algorithms are unsafe to begin with, and
are operating on incompatible types.
**************************************************
Conclusion:
I think I've shown the problems that put(Range) creates.
Problems that are simply insurmountable. On the other hand,
"e.put(range)" brings nothing that can't be rewritten as "copy(e,
range)". For all these reasons, I think it would be best to get
rid of it.
In light of recent realizations that output ranges are really only useful with specific operations at the end of UFCS chains, such as std.algorithm.copy or std.format.formattedWrite (and arguably, the latter could be rewritten to return an input range instead), I'm wondering if we should just get rid of output ranges altogether and just have .copy be the one-stop function for implementing data sinks.
(In reply to hsteoh from comment #2)
> In light of recent realizations that output ranges are really only useful
> with specific operations at the end of UFCS chains, such as
> std.algorithm.copy or std.format.formattedWrite (and arguably, the latter
> could be rewritten to return an input range instead), I'm wondering if we
> should just get rid of output ranges altogether and just have .copy be the
> one-stop function for implementing data sinks.
The two issues with that are:
1. Copy is "Range to ouput sink". So you can't just do: "copy(1, myIntegerOutput)"
2. Copy is implemented on top of output range definition and the "put" primitive.
Unless you had something else in mind in terms of "one-stop function for implementing data sinks"? I mean (IMO), I see "copy" as pretty much the same thing as "put", but with reversed args, making it UFCS friendly.
I had honestly wondered about adding a "putInto" into phobos, which is basically just out reversed:
5.square().putInto(myOuputRange);
(In reply to hsteoh from comment #2)
> In light of recent realizations that output ranges are really only useful
> with specific operations at the end of UFCS chains, such as
> std.algorithm.copy or std.format.formattedWrite (and arguably, the latter
> could be rewritten to return an input range instead), I'm wondering if we
> should just get rid of output ranges altogether and just have .copy be the
> one-stop function for implementing data sinks.
Are there discussions on this? I have seen no discussions or evidence that there's anything fundamentally flawed with output ranges and put. There's the problem with std.range.put and UFCS, but that can be fixed by creating a different function for a range to implement for std.range.put to use rather than implementing a function call put. What issues are we talking about here that make output ranges flawed?
- Jonathan M Davis
I didn't say anything was *flawed* with output ranges. I just said that they are not *as* useful because, being sinks, any code that uses them will terminate the processing chain, making it non-composable with any subsequent data processing. IME, I've also found that code that takes output ranges as parameters tend to be limited in reusability, and tends to uglify downstream code since you have to store the data somewhere, or use hacks to allow further processing of data, etc..
In fact, all cases of output range usage that I encountered while writing range-based code can be more profitably refactored to return input ranges instead, and use std.algorithm.copy for when I actually want to use an output range. This makes the majority of the code composable, and therefore much more reusable, than if I had used output ranges everywhere. Basically, anything that looks like func(inputRange, outputRange) can be rewritten as newFunc(inputRange).copy(outputRange), where newFunc returns an input range instead of writing the results to an output range.
So, since anything that takes an output range can always be rewritten as something that returns an input followed by std.algorithm.copy, it follows that the usefulness of output ranges seems to be restricted to std.algorithm.copy. Being such, then, it seems that the logical next step is to eliminate the concept altogether, and just have std.algorithm.copy implement all the semantics of output ranges.
Hmmmm. Well, that's certainly food for thought. Output ranges are definitely for when you're writing out the final output, not for composability. However, in many of the cases where an output range would be used, copy wouldn't make sense, because the destination is really output (e.g. the console or a file) and not a new input range. So, I question that it makes sense to try and replace output ranges with copy, though I concur that in many cases where you might use an output range, it makes more sense to use an input range and then feed that into an output range if that's what you want.
Regardless, I think that it's something that merits a larger discussion in the newsgroup.
The one thing that I keep meaning to bring up with output ranges is the lack of ability to determine whether they're full (which becomes even more complicated when dealing with stuff like chars where you can't know ahead of time whether it's going to fit thanks to different encodings), but it's certainly never occurred to me that copy could be thought of as a replacement for output ranges.
In a nutshell, I have issues with an input range being able to be an output range, because there is no way to determine if it is "full".
Heck, even if we could, I resent the notion that an output range could be "full" at all!
Honestly, I wish we had never mixed the notion of (Input)Ranges with output ranges. IMO, we should have had:
-InputRange
-InputRange && hasAssignableElements (writeable output ranges)
-Sinks (like "writeln", or "container.put")
And to be honest, I've almost never seen anyone use the OutputRange interface on an input Range. The only case I know of is "copy". And I don't think it should.
(In reply to hsteoh from comment #2)
> In light of recent realizations that output ranges are really only useful
> with specific operations at the end of UFCS chains, such as
> std.algorithm.copy or std.format.formattedWrite (and arguably, the latter
> could be rewritten to return an input range instead), I'm wondering if we
> should just get rid of output ranges altogether and just have .copy be the
> one-stop function for implementing data sinks.
std.format.formattedWrite is the best thing ever.
(In reply to monarchdodra from comment #7)
> In a nutshell, I have issues with an input range being able to be an output
> range, because there is no way to determine if it is "full".
>
> Heck, even if we could, I resent the notion that an output range could be
> "full" at all!
Arrays are input ranges, and calling put on them fills them rather than appending to them, so yes, they can get full. And given that it's likely not all that uncommon to be dealing with fixed size buffers, I would expect that there would be plenty of cases where you'd want to write to something using ranges, and it could get full.
> Honestly, I wish we had never mixed the notion of (Input)Ranges with output
> ranges. IMO, we should have had:
> -InputRange
> -InputRange && hasAssignableElements (writeable output ranges)
> -Sinks (like "writeln", or "container.put")
>
> And to be honest, I've almost never seen anyone use the OutputRange
> interface on an input Range. The only case I know of is "copy". And I don't
> think it should.
I concur. While it might make sense for something to operate as both an input range and an output range (e.g. fill it as an output range and then empty it as an input range), I think that having front work as a way to write to a range rather than it having to having something specific to output ranges was a mistake.
Regardless, the simple fact that the closest thing that we have to a built-in output range (arrays) can get full and yet the output range API does not take that into account is a major indicator that output ranges need at least a minor overhaul if not a major one.
Would it make sense to extend the definition of output range to include a .full method?
monarchdodra reported this on 2012-07-31T04:22:05Z
Transfered from https://issues.dlang.org/show_bug.cgi?id=8483
CC List
Description