We are well aware of the limitation, and are considering a number of approaches to it, some along the lines suggested by this enhancement request. It is a hard problem for multiple reasons. One of them is that the solution suggested in the enhancement request is unsound for a number of types, including arrays of arrays. This is because concatenating two arrays of arrays is not "deep" - it will create a new outer array but it will not duplicate each of the contained arrays. Consider:
int[][] array1 = new int[][10];
invariant(int[]) x = ([ 1, 2, 3 ]).idup;
invariant(int[])[] array2 = [ x ];
int[][] array3 = array1 ~ array2;
array3[10][0] = 5; // unsound, changes invariant array
This illustrates how code with the proposed semantics can violate immutability.
To recap today's semantics, consider:
import std.stdio;
struct S { int * p; }
void main() { test!(int); test!(int[]); test!(S); }
void test(T)()
{
    auto m = new T[10];
    auto c = new const(T)[15];
    auto i = new invariant(T)[20];
    auto mm = m ~ m;
    auto mc = m ~ c;
    auto mi = m ~ i;
    auto cc = c ~ c;
    auto ci = c ~ i;
    auto ii = i ~ i;
    writeln("mm: ", typeof(mm).stringof);
    writeln("mc: ", typeof(mc).stringof);
    writeln("mi: ", typeof(mi).stringof);
    writeln("cc: ", typeof(cc).stringof);
    writeln("ci: ", typeof(ci).stringof);
    writeln("ii: ", typeof(ii).stringof);
}
This program prints:
mm: int[]
mc: int[]
mi: int[]
cc: const(int)[]
ci: int[]
ii: invariant(int)[]
mm: int[][]
mc: const(int)[][]
mi: const(int)[][]
cc: const(int[])[]
ci: const(int)[][]
ii: invariant(int[])[]
mm: S[]
mc: const(S)[]
mi: const(S)[]
cc: const(S)[]
ci: const(S)[]
ii: invariant(S)[]
which is quite inconsistent (handles int, int[], and S all differently) and worthy of a bug report.
One direction explored to address this problem is allowing definition of polysemous types - types that can exist only as rvalues and allow implicit conversion to a number of other types. Within that framework it should be possible to design a parameterized type called Unique!(T) that allows conversions to invariant(T) and T alike. Then a function such as listdir could return Unique!(char[]) which can be converted, to its client's desire, to either a mutable or immutable array.
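To make the idea concrete, here is a rough sketch of what such a type could look like. Nothing like this exists in the library; the names (Unique, asMutable, asImmutable, listdirUnique) are purely illustrative, and explicit extraction methods stand in for the implicit rvalue conversions described above:

struct Unique(T)
{
    private T payload;   // assumed freshly allocated and never aliased elsewhere

    // Stand-ins for the proposed implicit rvalue conversions:
    T asMutable() { return payload; }
    immutable(T) asImmutable() { return cast(immutable) payload; }
}

// A listdir-like function could return Unique!(char[]) and let the caller choose:
Unique!(char[]) listdirUnique(string path)
{
    char[] names = path.dup;   // placeholder for freshly built directory data
    return Unique!(char[])(names);
}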
Unique!(T) would remain unsound because there is no statically checkable way to initialize it properly, so it needs to be used with care. Basically the issue of typing "fresh" data reveals a hole in our type system that would be hard to fix without complicating it to an extent we considered unacceptable. To estimate the work involved for both compiler designers and users, I suggest consulting the paper archjava.fluid.cs.cmu.edu/papers/oopsla02.pdf (look for "unique").
Andrei
Thank you for looking at this issue. You are correct in your statement that my proposed solution does not solve your specific example. There could be other examples: if you had a struct S that contained a pointer, concatenating an invariant(S)[] into a mutable S[] would result in a mutable pointer to invariant data, violating immutability.
However, this could be alleviated if we had the following rule:
if X is a data type, and v1 is of the type invariant(X) and v2 is of the type X, it is allowed to copy v1 to v2 provided that X is not a pointer or a reference and does not contain any pointers or references.
I think this rule is sound and really should be an enhancement on its own, but it provides a way that my solution could work. With this rule, my example passes the rule since copying an invariant(char) to a char is allowed, but your example fails because copying an invariant(int[]) to an int[] fails.
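(For illustration, the rule can be stated mechanically using std.traits.hasIndirections from today's Phobos; the predicate name below is made up:)

import std.traits : hasIndirections;

// The proposed rule as a compile-time predicate: invariant(X) may be copied to X
// exactly when X carries no pointers or references.
enum canCopyFromInvariant(X) = !hasIndirections!X;

static assert( canCopyFromInvariant!char);     // invariant(char) -> char is fine
static assert(!canCopyFromInvariant!(int[]));  // int[] contains a pointer, so it is not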
What do you think?
The rule you mention was suggested and rejected. It is sound, but the problem with it is that the semantics of a type becomes implicitly dependent on an implementation detail (e.g. a private pointer member). We don't want to separate "types with at least a pointer in'em" from "types with no pointers in'em".
Forgive me for pushing this point, but it seems that D already has types that are inherently considered differently because they have pointers, i.e. classes:
X *x = new X;
If X is a struct or int, this works. If it is a class, it fails.
Also, how is treating value types different from pointer-containing types a problem? I don't see how it is detrimental to the language.
Thanks for hearing my opinions on this; I'm sure you've discussed this type of thing ad nauseam.
It's great to push the point. It is true that classes are reference types and as such are distinct from structs and builtins, which are value types. But that's a decision that is taken up front by the type designer: do I want dynamic polymorphism, inheritance, infinite lifetime (garbage collection), reference semantics? Then I write:
class Foo { ... }
Conversely: do I want eager copy, value semantics, deterministic termination (that's to come in D 2.0 in a couple of weeks)? Then I write:
struct Bar { ... }
This does introduce an inconsistency in the type system, but it tends to work well in practice and is documented upfront in the definition of the type. In contrast, agreeing to further partition types depending on whether they contain pointers or not is murkier.
That is not even the main problem. At the risk of spreading FUD, Real Soon Now, not all types containing pointers will be non-values. In a couple of weeks we'll be able to write (syntax details may change):
struct Value
{
    int* p = null;
    this() { p = new int; }
    ~this() { delete p; }
    this(scope) { p = new int(*p); }  // runs after the bitwise copy; duplicates the pointee
}
This would effectively implement a structure containing a pointer but with 100% value semantics (the third function is automatically invoked whenever an object is copied into a new scope, right after the bitwise copy).
Note that the assignment operator will do the right thing, which is: (1) duplicate the source into a temporary by bitwise copying it and then invoking this(scope), (2) destroy the assignment target, (3) bitwise copy the temporary into the assignment target.
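(For comparison, in present-day D a similar value-semantic struct can be sketched with a copy constructor rather than the proposed this(scope); this is only a rough analogue of the mechanism described above:)

struct Value
{
    int* p;
    this(int v) { p = new int; *p = v; }
    // Copy constructor: duplicate the pointed-to int so copies never share state.
    this(ref return scope const Value rhs) { p = new int; *p = *rhs.p; }
    // (No destructor with delete here; current D leaves reclamation to the GC.)
}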
Andrei
I still believe the rule works.
The general rule I am extending is: for any basic type (int, char, byte, etc.), you can copy an invariant version to a mutable version without issue. i.e.:
invariant(int) x = 1; int y = x; // works (I think :)
Imagine that any time you copy a structure or a buffer, instead of doing a memcpy (which is an implementation detail), you copy each member/element individually.
so:
char[5] x; x[] = "hello";
is equivalent to:
foreach(i, c; "hello") x[i] = c;
Which should work because of the first rule (I can copy an invariant(char) to a char).
Now, for a struct, you can postulate the same thing:
struct S { int x; }
invariant(S) s1 = cast(invariant)S(1); S s2;
s2 = s1;
is equivalent to:
s2.x = s1.x;
Which should work because of the first rule.
However, if we think about pointers:
invariant(char)* cp = "hello".ptr; char* cp2; cp2 = cp;
This fails because you would now have a mutable pointer to invariant data. Here is the rule I am trying to exploit, applied to a type that has a pointer in it:
struct S { char *cp; }
invariant(S) s1; S s2; s2 = s1;
This is equivalent to: s2.cp = s1.cp;
which fails the rule above, so the whole thing fails.
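(As an aside, plain copies already behave this way in current D; a quick check, with the struct names made up for illustration:)

struct NoPtr  { int x; }
struct HasPtr { char* cp; }

// A value with no indirections converts freely across qualifiers when copied...
static assert( is(immutable(NoPtr) : NoPtr));
// ...but one containing a pointer does not, since that would create a mutable
// view of immutable data.
static assert(!is(immutable(HasPtr) : HasPtr));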
With your new syntax, let's look at how it works:
struct Value
{
    int* p = null;
    this() { p = new int; }
    ~this() { delete p; }
    this(scope) { p = new int(*p); }
}
invariant(Value) v1; Value v2;
v2 = v1;
Now this is possible, because instead of copying p we are copying *p, which is of type invariant(int). invariant(int) can be cast to int implicitly, and so the rule still works.
I'm not sure if this was intended as a use for your new syntax, but I certainly still believe that it doesn't break the rules (BTW, I like the idea).
Addressing your point about partitioning types by whether they contain pointers or not, I don't really think of it that way. The way I think of it is that an invariant pointer cannot be implicitly copied to a mutable pointer. If copying a type requires copying an invariant pointer to a mutable pointer, then the copy fails to compile. If it does not, then the copy works. It's just easier to explain as "types that contain invariant pointers cannot be copied to a non-invariant version of the type". The compiler can check this without adding extra runtime information, so implementation-wise it does not affect the output. Semantics-wise, it doesn't break existing code; it just allows operations that should be valid but aren't under the current scheme. In fact, I think it's more consistent with the way builtin types work.
BTW, the same applies to const.
-Steve
That's a very interesting idea. Among other things, you show how our copy construction scheme is incomplete because it "loses" the qualifiers of the source, information that turns out to be essential.
I'll make sure to bring this up with Walter, and to give you proper credit if this idea makes it into the language. Thanks!
blushes :) Thanks for considering the idea!
I read all comments on this enhancement request yesterday, and now I was thinking about it again and I suddenly realized that the solution is very simple in fact. D should be extended with a new implicit type qualifier 'unique'. By 'implicit' I mean that such a qualifier should NOT appear anywhere in the syntax, just like 'mutable' does not appear anywhere in the syntax, but still exists conceptually. The unique qualifier should be automatically applied by the compiler to the result type of any expression returning a reference to newly constructed (or copied) data. Anything of a type qualified as unique could be implicitly cast to any of mutable, invariant and const.
So if the array concatenation operator is modified to always return a unique reference, the following would all be perfectly valid:
char[] mutableString = "foo" ~ "bar";
invariant(char[]) invariantString = "foo" ~ "bar";
invariant(char)[] reassignableInvariantString = "foo" ~ "bar";
Of course a NewExpression would also return a unique reference:
struct S { int i; }
invariant(S) s = new S(4); // valid
The following is of course wrong in my previous comment:
invariant(S) s = new S(4); // valid
It should be: invariant(S)* s = new S(4); // valid
On 26/03/2008, d-bugmail@puremagic.com wrote:
Of course a NewExpression would also return a unique reference:
That doesn't follow. Counterexample follows:
char[1000] globalBuffer;
class S
{
char[] buffer;
this(int n)
{
buffer = globalBuffer[0..n];
}
}
invariant(S) s = new S(4); // not valid
This is harder than you think! :-)
Yes, you are right here... The rule doesn't apply to NewExpressions if they produce a class instance that may contain a pointer to mutable data. So it would be best not to apply it to class instances at all (to keep the rule simple).
But it still holds for concatenating arrays of value types and for NewExpressions that produce a struct instance.
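(Later compilers do accept the struct case when construction is pure - see the 'unique expression' notes further down in this thread. A minimal sketch:)

struct S { int i; this(int i) pure { this.i = i; } }

void main()
{
    immutable(S)* s = new S(4);   // the pure constructor makes the new object unique
}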
(In reply to comment #2)
However, this could be alleviated if we had the following rule: if X is a data type, and v1 is of the type invariant(X) and v2 is of the type X, it is allowed to copy v1 to v2 provided that X is not a pointer or a reference and does not contain any pointers or references. I think this rule is sound and really should be an enhancement on its own, but it provides a way that my solution could work. With this rule, my example passes the rule since copying an invariant(char) to a char is allowed, but your example fails because copying an invariant(int[]) to an int[] fails. What do you think?
"copying an invariant(int[]) to an int[]" fails indeed, but copying invariant(int[]) to an invariant(int)[] would work as well. So basically what we are looking into here is tailconst. The concatenation of a const(T)[] with another const(T)[] conceptually returns a type that is:
tailconst(T)[]
and similarly for invariant/tailinvariant. This is because the "head-value" of T is copied, so it can be mutable. So these semantics are safe:
int[] ~ int[]                   --> returns int[]
const(int)[] ~ const(int)[]     --> returns int[]
int[][] ~ int[][]               --> returns int[][]
const(int[])[] ~ const(int[])[] --> returns const(int)[][]
const(int)[][] ~ const(int)[][] --> returns const(int)[][]
How about this though? :
const(Foo)[] ~ const(Foo)[] --> returns ???
Unfortunately there is no way in D to express tailconst for classes (and structs), so the last example would have to use normal const(T) instead of tailconst(T). I'm not sure what the implications of this special and inconsistent case are, though; it could break generic code or something. Which does not mean these semantics would not be worthwhile.
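(For the class case, std.typecons.Rebindable later became the usual approximation of a tail-const reference; a small sketch:)

import std.typecons : rebindable;

class Foo { int x; }

void main()
{
    const Foo c = new Foo;
    auto r = rebindable(c);   // Rebindable!(const Foo): head-mutable, tail-const
    r = new Foo;              // the reference itself can be rebound
    // r.x = 1;               // error: the referenced object stays const
}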
(In reply to comment #13)
"copying an invariant(int[]) to a int[]" fails indeed, but copying invariant(int[]) to an invariant(int)[] would work as well. So basicly what we are looking into here is tailconst. The concatenation of a const(T)[] with another const(T)[] conceptually returns a type that is:
tailconst(T)[]
Not exactly. The enhancement request has kind of mutated, due to the rule that I stipulated (which I think was subsequently added to D 2).
What I think it should be is
const(T)[] ~ const(T)[] should return a type that is T[] if const(T) implicitly casts to T under the rule. If T does not implicitly cast, it should return const(T)[].
The whole idea behind this is that ~ is always generating new data. Why should the result always be based on the input types? At the very least, the pieces of the new array that are unique should be treated as unique, and implicitly cast to mutable.
This problem is not an easy one to solve, I realize, but it is solvable.
and similarly for invariant/tailinvariant. This is because the "head-value" of T is copied, so it can be mutable. So these semantics are safe:
int[] ~ int[]                   --> returns int[]
const(int)[] ~ const(int)[]     --> returns int[]
int[][] ~ int[][]               --> returns int[][]
const(int[])[] ~ const(int[])[] --> returns const(int)[][]
const(int)[][] ~ const(int)[][] --> returns const(int)[][]
How about this though? :
const(Foo)[] ~ const(Foo)[] --> returns ???
If Foo is a struct with no pointers, then it returns Foo[].
If Foo is a class, a pointer, or a struct with pointers, then it returns const(Foo)[].
Unfortunately there is no way in D to express tailconst for classes (and structs), so the last example would have to use normal const(T) instead of tailconst(T). I'm not sure what the implications of this special and inconsistent case are, though; it could break generic code or something. Which does not mean these semantics would not be worthwhile.
Yes these cases would have to be handled delicately. There are other things to consider.
For example, if Foo is an alias to int, then if you returned a tailconst version of Foo, what is that? It would really be const(int), but how is that expressed in terms of Foo? What if Foo is a typedef to int*?
There are really two problems that need to be solved to allow this enhancement. The first I believe is already solved, and that is the ability to implicitly cast to/from const/invariant/mutable if the type contains no pointers. The second is to allow the return type of a function to depend on how it is used and/or called. I'm not talking about overloading functions based on return type, because that isn't enough in itself. Why should I have to write multiple functions that act the same way, just so the return type is different? I keep thinking that a 'unique' concept needs to be added before this whole solution is possible.
Anyways, this issue really doesn't look like it will be solved in the near future, without good answers for all the sub-issues involved.
(In reply to comment #14)
(In reply to comment #13)
"copying an invariant(int[]) to a int[]" fails indeed, but copying invariant(int[]) to an invariant(int)[] would work as well. So basicly what we are looking into here is tailconst. The concatenation of a const(T)[] with another const(T)[] conceptually returns a type that is:
tailconst(T)[]
Not exactly. The enhancement request has kind of mutated, due to the rule that I stipulated (which I think was subsequently added to D 2). What I think it should be is: const(T)[] ~ const(T)[] should return a type that is T[] if const(T) implicitly casts to T under the rule. If T does not implicitly cast, it should return const(T)[].
I understand your rule, but it is not the most permissive safe-behavior. What I was describing was the most permissive safe-behavior possible.
For example, with your rule: const(int[])[] ~ const(int[])[] would return const(int[])[]. But it is safe to return const(int)[][], which is more permissive.
Yes, you are correct. The second form is more permissive. I didn't see that originally.
Concatenating arrays of arrays and arrays of pointers are special cases because a tail-const version exists.
I think as long as the implicit casting works, so:
const(int[])[] a;
const(int)[][] b = a ~ a;
const(int[])[] c = a ~ a;
all works, then it shouldn't be a problem. It probably will not break generic code because if you have a generic function:
f(T)(const(T)[] input) {...}
How does one construct a 'tailconst' version of T given just T? I think that was what I was saying earlier (about a typedef). You could do an is-expression to get at the underlying type, but then that is specialized for arrays, and it is no more convoluted than the current state of affairs.
If you know T is always going to be an array, you can do:
f(T)(const(T[])[] input) {...}
and you can universally deal with all types of arrays.
So now, I think the rules are:
If concatenating two arrays together, of type T[]:
If T is of the form const(U[]), then one can assign the result to a type of const(U)[][]
If T is of the form const(U), then one can assign the result to a type of const(U)[]
If T is of the form const(U), and const(U) implicitly casts to U, then one can assign the result to a type of U[].
For the above rules, if T is invariant(U) the same style rules apply.
In all cases, one can assign to the result type of T[].
If concatenating two arrays together where the element type of the array varies only in constancy, and const(T) and invariant(T) implicitly cast to/from T, then the result can be assigned to invariant(T)[], const(T)[] or T[].
Did I miss anything? Boy this is getting more complex to explain :)
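To restate the rules as the assignments they would permit (illustrative only, using the newer immutable spelling of invariant; several of these relaxed conversions do not compile today):

void examples()
{
    const(int[])[] aa;
    const(int)[][] r1 = aa ~ aa;   // T = const(U[])  ->  assignable to const(U)[][]

    const(int)[] ca;
    const(int)[] r2 = ca ~ ca;     // T = const(U)    ->  assignable to const(U)[]
    int[]        r3 = ca ~ ca;     // const(int) converts to int, so also to U[]

    int[] m;
    immutable(int)[] ia;
    immutable(int)[] r4 = m ~ ia;  // element types differ only in constancy and
    const(int)[]     r5 = m ~ ia;  // convert freely, so the result may be assigned
    int[]            r6 = m ~ ia;  // to any of the three qualifiers
}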
Issue 9251 has been marked as a duplicate of this issue.
Issue 7311 has been marked as a duplicate of this issue.
Wow this is one old one. Whatever the solution to this is, it shouldn't make operator ~ magic, i.e. the behavior should be implementable as a regular function call.
(In reply to comment #19)
Whatever the solution to this is, it shouldn't make operator ~ magic, i.e. the behavior should be implementable as a regular function call.
Are you willing to explain why?
(In reply to comment #20)
(In reply to comment #19)
Whatever the solution to this is, it shouldn't make operator ~ magic, i.e. the behavior should be implementable as a regular function call.
Are you willing to explain why?
It's obvious - magic is bad etc.
(In reply to comment #21)
It's obvious - magic is bad etc.
But array concat is a built-in op. So why is some magic bad here?
(In reply to comment #22)
(In reply to comment #21)
It's obvious - magic is bad etc.
But array concat is a built-in op. So why is some magic bad here?
Because user-defined types would want to define it to behave similar as for arrays. Also, http://goo.gl/AZLXo
(In reply to comment #23)
Because user-defined types would want to define it to behave similar as for arrays.
In this case I think a little amount of magic is acceptable (if it's necessary).
(In reply to comment #24)
In this case I think a little amount of magic is acceptable (if it's necessary).
because in my opinion the gain from accepting code like this:
void main() { string s1 = "abc"; char[] s2 = s1 ~ s1; }
outweighs the disadvantages caused by a possible small amount of magic.
(In reply to comment #24)
(In reply to comment #23)
Because user-defined types would want to define it to behave similar as for arrays.
In this case I think a little amount of magic is acceptable (if it's necessary).
There is no reason to make it necessary.
I think this is a doable enhancement with a sane rule.
As of 2.061, we have a 'unique expression' in certain cases.
pure int[] newArr(int n) { return new int[n]; }
immutable int[] arr = newArr(3);
// newArr(3) creates a unique array, so it is implicitly convertible to
// immutable.
class MyClass {}
immutable MyClass c = new MyClass();
// A raw NewExpression creates a unique object, so it is implicitly convertible
// to immutable.
As of 2.063, qualified constructors support creating 'unique objects'.
struct MyObj { this(int) pure {} }
immutable MyObj* p = new MyObj(1);
// this(int) pure makes a 'unique object', so the created object is implicitly
// convertible to immutable.
Based on the 'unique expression' concept, we can consider that an array concatenation creates a 'unique expression' when some conditions are satisfied.
An example of code we would like to compile:
void main()
{
    immutable data1 = [10, 20];
    immutable data2 = data1 ~ [30];
}
This is a really old one. Going to close, since I think we have ways around this (i.e. pure functions returning unique data).
I don't understand why you wanted to close this issue. Anyway, there are two duplicates, so I'll reopen it.
I expect concatenation to become a fully lowered template function at some point, and at that point we will have all the tools to do this.
You can leave it open if you want; I was just looking through issues I had reported, and this one just struck me as being very stale, not just in time but in light of all the developments that have happened in D since 2007.
This code:
char *toStringz2(const(char)[] s) { return (s ~ "\0").ptr; }
compiles successfully today, so I am going to mark this one as WORKSFORME.
It works for string literals, but not for two variables:
unittest
{
    import std.meta : AliasSeq;
    static foreach (T; AliasSeq!(char, const(char), immutable(char)))
    {{
        pragma(msg, T.stringof, ":");
        T[] s = "".dup;

        // Result type of concatenation may differ:
        pragma(msg, "Variable: ", typeof(s~s).stringof);
        pragma(msg, "Literal: ", typeof(s~"").stringof);

        // Two const(char)[]s:
        {
            char* p1 = (s~s).ptr;
            immutable(char)* p2 = (s~s).ptr;
            const(char)* p3 = (s~s).ptr;
            char[] a = s~s;
            string b = s~s;
            const(char)[] c = s~s;
        }

        // One literal:
        {
            char* p1 = (s~"").ptr;
            immutable(char)* p2 = (s~"").ptr;
            const(char)* p3 = (s~"").ptr;
            char[] a = s~"";
            string b = s~"";
            const(char)[] c = s~"";
        }
    }}
}
So no, this is not fixed. When it's fixed, all of the above should compile without issue.
(In reply to RazvanN from comment #32)
This code:
char *toStringz2(const(char)[] s) { return (s ~ "\0").ptr; }
compiles successfully today, so I am going to mark this one as WORKSFORME.
This is nice; is this a recent change? In any case, I think it would be nice if we simply considered the result equivalent to that of a pure function:
T[] concat(T)(const(T[]) arr1, const(T[]) arr2) pure
for T that have no indirections. This should solve the problem. But I can't say that it's needed much any more; we have so many new tools since 2007/2008.
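A sketch of such a helper (the name concat and the hasIndirections constraint are just for illustration; a later comment in this thread reports that a function prototyped like this does work, apart from the .ptr case):

import std.traits : hasIndirections;

T[] concat(T)(const(T)[] arr1, const(T)[] arr2) pure
if (!hasIndirections!T)
{
    auto result = new T[](arr1.length + arr2.length);
    result[0 .. arr1.length] = arr1[];
    result[arr1.length .. $] = arr2[];
    return result;
}

void main()
{
    string a = "foo";
    string b = "bar";
    // The pure call cannot leak references to its const arguments into the
    // mutable result, so that result is treated as unique and converts either way:
    char[] m = concat(a, b);
    string s = concat(a, b);
}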
(In reply to Simen Kjaeraas from comment #33)
So no, this is not fixed. When it's fixed, all of the above should compile without issue.
I tried out a simple function prototyped as above; it works, except for the automatic casting of the .ptr property: https://run.dlang.io/is/Bu3k1G
And I actually think it's fine for the .ptr inference to fail, simply because by the time we get to the .ptr property, the compiler has to have decided what type the array return is, and it defaults to mutable.
But in any case, we have some "new" information, especially around the pure/unique situation which shows this is reasonable and possible. I don't know still if it's worth pursuing this enhancement request. The only thing I would say would be nicer is if we could somehow specify a unique array vs. just a mutable one. Because things like:
auto x = str ~ "";
causing x to be char[] and not string could cause bad problems. It would be nice if the result defaulted to something based on the arguments but allowed implicit casting if necessary because of the uniqueness.
Issue 12402 has been marked as a duplicate of this issue.
@nordlow updated dlang/dmd pull request #12341 "Fix 1654 - Array concatenation result should be implicitly castable between mutable and immutable if the elements support it." fixing this issue:
schveiguy (schveiguy) reported this on 2007-11-09T13:45:37Z
Transferred from https://issues.dlang.org/show_bug.cgi?id=1654
Description
One of the great aspects of D is the array concatenation operator. However, the current compiler sets the type of the result according to the operands. This prevents using invariant or const arrays to build a mutable array without doing extra work. For example, the toStringz function:
char *toStringz(const(char)[] s)
{
    auto copy = new char[s.length + 1];
    copy[0..s.length] = s;
    copy[s.length] = 0;
    return copy.ptr;
}
This could easily be written as: return (s ~ "\0").ptr;
Except that the result of the concatenation is a const(char)[].
But why does it need to be const? I'm guessing that the reason is that when dealing with functions that take invariant strings, it would be ugly to always have to cast to invariant when doing concatenation. And of course, functions that use invariant strings can be optimized differently than mutable or even const strings.
The issue I see is that the true result IS mutable, because it is freshly allocated on the heap. It's artificially cast to invariant to keep the type the same.
So here is a proposal that would allow efficiency, and utilization of the fact that concatenation always results in newly allocated data:
First, there should be two array concatenation operator internal functions. One that takes two invariant arrays and one that takes two const arrays. I'll call them icat and ccat respectively. Both will return mutable arrays. The icat function can be pure when pure functions are supported.
When the compiler encounters an array concatenation in code, if at least one argument is not invariant, then ccat will be used. If both arguments are invariant, then icat will be used.
Regardless of the method used, if the resulting rvalue is expected to be invariant, then an implicit invariant cast is allowed. This means that the following code does not need invariant casts:
char[] blah = "hello".dup;
string blah2 = blah ~ "world";
If the result is assigned to an lvalue that is either const or mutable, then no cast is needed (mutable array is implicitly castable to const).
This would allow us to use the full potential of array concatenation without having to worry about casting away invariant or writing several lines of code to get around this problem.
I'll further point out that the workaround for the current problem does not allow efficient concatenation. For example:
const(char)[] a1, a2;
// method 1, just dup it.
// the problem is that we make a needless temporary copy of the data
char[] cat1 = (a1 ~ a2).dup;
// method 2, build a temporary array
// the problem is that the initialization of the array is needless.
char[] cat2 = new char[a1.length + a2.length]; // needless init of memory
cat2[0..a1.length] = a1;
cat2[a1.length..$] = a2;
In addition, idup would no longer be needed for creating an invariant array out of the concatenation of two mutable or const arrays.
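For instance, under this proposal the following would compile, with no idup and no extra copy (illustrative only; without the proposal the last line needs an idup):

void main()
{
    char[] a = "foo".dup;
    const(char)[] b = "bar";
    // The freshly allocated result of ~ goes straight into an invariant string:
    string s = a ~ b;
}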