daokoder / dao

Dao Programming Language
http://daoscript.org

Maps should take into account the `==` operator if available instead of pointer comparison #308

Closed dumblob closed 9 years ago

dumblob commented 9 years ago
load time import time
m=(map<time.DateTime, any>){->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)
false
= none

load time import time
m=(map<int, any>){->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x.value] = 5
if (y.value in m)
  io.writeln(true)
else
 io.writeln(false)
true
= none

Also I'm not sure about blind pointer comparison for all non-primitive data. I'd probably disable the default fallback to pointer comparison for data which Dao doesn't know anything about (i.e. those, which don't have any routine for the == operator defined).
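The difference between the two snippets above can be modeled outside Dao. The following is a minimal Python sketch, not Dao code: the `DateTime` class is a hypothetical stand-in for a reference type that defines no equality or hashing of its own, so a map falls back to object identity.

```python
class DateTime:
    """Hypothetical stand-in for a reference type with no custom == or hash."""

    def __init__(self, year, month, day):
        self.value = (year, month, day)


x = DateTime(2014, 1, 1)
y = DateTime(2014, 1, 1)

by_object = {x: 5}        # keyed by the object itself (identity)
by_value = {x.value: 5}   # keyed by the extracted scalar-like value

found_by_object = y in by_object      # y is a different object than x
found_by_value = y.value in by_value  # the two tuples compare equal

print(found_by_object, found_by_value)  # False True
```

As in the Dao examples, only the map keyed by the extracted value treats `x` and `y` as the same key.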

Night-walker commented 9 years ago

The == operator is not sufficient for that, as it doesn't tell how to hash the given value or compare it for being lesser/greater than other values. Given the dual nature of Dao maps (somehow, dualism is a typical case for Dao), it is problematic to define requirements for a class/type to be compatible with map.
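To illustrate why equality alone is not enough, here is a hedged Python sketch (a model, not Dao): the hypothetical `Point` class defines only `==`. In Python a hash-based map then refuses it as a key, and an ordering-based structure (modeled here by `sorted`) cannot place it either.

```python
class Point:
    """Hypothetical type that overloads only equality."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


def try_hash_key(obj):
    """Try to use obj as a key in a hash map."""
    try:
        {obj: 1}
        return "ok"
    except TypeError:
        return "unhashable"


def try_sort(items):
    """Try to order items, as a tree-based map would have to."""
    try:
        sorted(items)
        return "ok"
    except TypeError:
        return "unorderable"


hash_result = try_hash_key(Point(1))
sort_result = try_sort([Point(2), Point(1)])
print(hash_result, sort_result)  # unhashable unorderable
```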

Comparison with < and <= in most cases doesn't make real sense (io::Stream? fs::Entry? xml::Element?). But if you don't define them, then you won't be able to use your class with any map (since it's impossible to distinguish hashes and trees by the type). And you yourself cannot implement comparison and hashing in terms of DaoValue pointers in pure Dao.

So disabling "fallback to pointer comparison" would essentially deny the use of almost all non-built-in types/classes with maps. It doesn't look like a bright prospect.

If you really want to store certain values instead of references, you'd better just serialize class instances into numeric or string data, just as you did in your second example. It should prove to be a far less troublesome approach.

dumblob commented 9 years ago

Yes, you're right, but roughly said, the first example (without serialization) should fail or warn the user about it in cases, where the output won't be what the user expects. Because I have no idea how to detect it, I just proposed disabling the fallback for such types while being aware of the edge cases in which one needs to use some weird data (i.e. those where < and <= don't make sense) for map keys.

dumblob commented 9 years ago

The current behavior is extremely error-prone (yes, I admit that it took me almost an hour to figure this out in code that shuffles such a map between different places).

Night-walker commented 9 years ago

Yes, you're right, but roughly said, the first example (without serialization) should fail or warn the user about it in cases, where the output won't be what the user expects.

Personally, I find the way it currently works very simple and clear. We're not on Java/.NET, after all.

I just proposed disabling the fallback for such types while being aware of the edge cases in which one needs to use some weird data

It's not an edge case. There are plenty of use cases for storing non-comparable objects (and tuples with these objects, and lists, and etc.) in maps. It is highly impractical to discard all that. The approach from languages with (more or less) unified type systems is arguably ill-suited here.

The current behavior is extremely error-prone (yes, I admit that it took me almost an hour to figure this out in code that shuffles such a map between different places).

I don't think so, not for Dao at least. A counter-example: a map reports that it contains a value you never actually put there because of an overloaded comparison implementation. Even better, what to do if inner data of the object changes? It should then imply a different place in a hash or tree! How do you propose to solve that?
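The mutation problem raised here can be sketched in Python (a model under stated assumptions, not Dao): the hypothetical `Key` class hashes and compares by a mutable field, and the field is changed after the object has been stored in a map.

```python
class Key:
    """Hypothetical key type that hashes and compares by mutable data."""

    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        return isinstance(other, Key) and self.data == other.data

    def __hash__(self):
        return hash(self.data)


k = Key(123)
table = {k: "payload"}

found_before = k in table          # True: stored under hash(123)

k.data = 456                       # mutate the key after insertion

found_by_old_value = Key(123) in table  # False: the stored key no longer equals Key(123)
entry_still_stored = len(table) == 1    # True: the entry exists but is stranded

print(found_before, found_by_old_value, entry_still_stored)  # True False True
```

The entry stays in the map under its old hash position, yet a lookup by the old value no longer matches it, which is the inconsistency the comment points at.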

dumblob commented 9 years ago

It's not an edge case. There are plenty of use cases for storing non-comparable objects (and tuples with these objects, and lists, and etc.) in maps. It is highly impractical to discard all that. The approach from languages with (more or less) unified type systems is arguably ill-suited here.

Are you really sure, that you'd like to store tuples with these objects, and lists, and etc. as keys in a map? Well, why not, but then requiring that these keys are invar makes perfect sense to me as the bunch of data represents a unique combination.

I don't think so, not for Dao at least. A counter-example: a map reports that it contains a value you never actually put there because of an overloaded comparison implementation. Even better, what to do if inner data of the object changes? It should then imply a different place in a hash or tree! How do you propose to solve that?

We have the magic invar, don't we? It could be forced for those as mentioned above.

Btw, the first example above was initially

load time import time
m={->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

but because of the recent bug with any in map keys, I've added the casting (map<time.DateTime, any>) which makes it explicit and let's say at least a bit clearer. I'm pretty sure though, that this piece of code should somehow be forbidden or work out-of-box like expected. I'd be perfectly fine with writing:

load time import time
m={->}
invar x=time.make(2014, 1, 1)
invar y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

Night-walker commented 9 years ago

Are you really sure, that you'd like to store tuples with these objects, and lists, and etc. as keys in a map?

Why not? I don't want to care about what can be put in a map, and what cannot. I may want to associate some data with some objects, and map is a natural choice regardless of what those objects are.

Well, why not, but then requiring that these keys are invar makes perfect sense to me as the bunch of data represents a unique combination.

It simply cannot be assured. There is no way to request an object which cannot be changed in any other context, no invar will do that.

We have the magic invar, don't we? It could be forced for those as mentioned above.

As I said, it can't. It just guarantees local immutability.

Also note that (as you like things running quickly :) ) using Dao-level functions for comparison will result in significant overhead during any operation on maps. Instead of calling C function which does switch and e.g. return x.data < y.data? -1 : (x.data > y.data? 1 : 0) there will be practically normal Dao routine call with lots of related manipulations plus execution of the code in the routine body. And generally this will happen multiple times, depending on map structure.
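A rough way to see how often a user-level comparison would run is to count the calls during one ordered lookup. This Python sketch models a tree lookup with a binary search over a sorted list; `CountedKey` is a hypothetical class, and the numbers say nothing about actual Dao VM costs.

```python
import bisect


class CountedKey:
    """Hypothetical key whose ordering is user-level code; counts each call."""

    calls = 0

    def __init__(self, data):
        self.data = data

    def __lt__(self, other):
        CountedKey.calls += 1
        return self.data < other.data


# A sorted list stands in for the ordered structure behind a tree map.
keys = [CountedKey(i) for i in range(1024)]

CountedKey.calls = 0
position = bisect.bisect_left(keys, CountedKey(700))
calls_per_lookup = CountedKey.calls

print(position, calls_per_lookup)
```

For 1024 keys a single lookup invokes the user-level comparison roughly log2(1024) = 10 times, and each of those would be a full routine call instead of a C comparison.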

dumblob commented 9 years ago

As I said, it can't. It just guarantees local immutability.

Also note that (as you like things running quickly :) ) using Dao-level functions for comparison will result in significant overhead during any operation on maps. Instead of calling C function which does switch and e.g. return x.data < y.data? -1 : (x.data > y.data? 1 : 0) there will be practically normal Dao routine call with lots of related manipulations plus execution of the code in the routine body. And generally this will happen multiple times, depending on map structure.

I was fully aware of all this when I proposed it. I never ever want to experience code like

load time import time
m={->}
x=something_returning_some_type(...)  # might be also a variant type
y=something_returning_some_type(...)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

to return false. It's absolutely opaque as you don't know which branch will be chosen without looking at the return type of something_returning_some_type() which might also be any. It's just nonsense that if (m[y]) ... succeeds without throwing any exception, but if (y in m) ... will evaluate to false (note that these two are very often used together). I always want to be absolutely sure that the in operator will behave the same, even without a deep knowledge about the type of the left operand. And I'm talking about deep knowledge as the type might be variant, might be a renamed/aliased primitive type etc.

Any ideas how to prevent this discrepancy?

Night-walker commented 9 years ago

I was fully aware of all this when I proposed it. I never ever want to experience code like

load time import time
m={->}
x=something_returning_some_type(...)  # might be also a variant type
y=something_returning_some_type(...)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

to return false.

It can't possibly be achieved universally. If you construct two values in the same way, it doesn't mean they are equal. You can't always rely on it even for time.make() (what if there is os.setenv('TZ=...')). Moreover, if objects are put in a map using some potentially non-unique field of those objects, you can't really store objects -- you store their unique field values. It may be just as confusing and error-prone.

It's just nonsense that if (m[y]) ... succeeds without throwing any exception, but if (y in m) ... will evaluate to false (note that these two are very often used together)

Not sure what you mean.

(dao) f = io::Stream()
= Stream[02437CD8]
(dao) m = {->}
= { -> }
(dao) f in m
= false
(dao) m[f]
[[Error::Key]] --- Invalid key:

In code snippet:
      1 :  GETVG       :     0 ,     1 ,     1 ;     1;   f
>>    2 :  GETI        :     0 ,     1 ,     2 ;     1;   m[f]
      3 :  RETURN      :     2 ,     1 ,     0 ;     1;   m[f]
Raised by:  __main__(), at instruction 2 in line 1 in file "interactive codes";

Everything's fine as far as I can see.

I always want to be absolutely sure that the in operator will behave the same, even without a deep knowledge about the type of the left operand.

With the current approach, everything is as plain and explicit as it can possibly be. For map operations, any data is treated by its value. For an object, its value is, naturally, the object itself. Not some hidden data within it, and not something returned by the method which you cannot check.

dumblob commented 9 years ago

Not sure what you mean.

I'm right now away from my computer, but IIRC, it was something with any:

f = io::Stream()
f2 = io::Stream()
m = {->}
f in m
f2 in m
m[f] = 5
f in m
f2 in m
m[f]
m[f2]

Anyway, the whole problem is exactly what you've described - how to distinguish by syntax, that we're dealing with object (i.e. pointer) or with a primitive type in the map key. It's similar to passing data to routines which is explicit/obvious (because you can't simply work with the reference directly in Dao, you have to use some object interface like methods or overloaded operators etc.), but with map keys it's hidden :(

Night-walker commented 9 years ago

I don't see any problem. There is no real necessity to distinguish primitive and non-primitive data here, as it is handled in a simple and uniform way. There are no hidden values or methods which act behind the scenes when interacting with a map, everything's clear and predictable.

You just have to keep in mind that DateTime, BigInt, etc. are not scalar values. That won't change simply because of some ad-hoc handling for maps etc. The only good option I see is value classes which mimic primitive (scalar) values. We already touched this topic, and such feature was deemed not worth the efforts required to provide it.

dumblob commented 9 years ago

We already touched this topic, and such feature was deemed not worth the efforts required to provide it.

Hm, I must have missed this discussion or I simply forgot it :( . Can you please point me there? I was rather thinking that this is actually quite a similar problem to what we were discussing in https://github.com/daokoder/dao/issues/263 about implicit/explicit references. Either way, it seems we should at least somehow unify an interface for serialization of non-primitive data (e.g. by writing it to documentation).

Btw the problem with any is really there (it's the one from https://github.com/daokoder/dao/issues/306):

(dao) f = io::Stream()
= Stream[0x203bb40]
(dao) f2 = io::Stream()
= Stream[0x2188e40]
(dao) m = {->}
= { -> }
(dao) f in m
= false
(dao) f2 in m
= false
(dao) m[f] = 5
= 5
(dao) f in m
= false                            # nonsense!
(dao) f2 in m
= false
(dao) m[f]
= 5                                # a proof that (f in m) returning false is a nonsense
(dao) m[f2]
[[Error::Key]] --- Invalid key:

In code snippet:
      1 :  GETVG       :     0 ,     3 ,     1 ;     1;   f2
>>    2 :  GETI        :     0 ,     1 ,     2 ;     1;   m[f2]
      3 :  RETURN      :     2 ,     1 ,     0 ;     1;   m[f2]
Raised by:  __main__(), at instruction 2 in line 1 in file "interactive codes";

Night-walker commented 9 years ago

From here.

I have considered this before (I even left some comments regarding this in source long time ago). But this is probably not a good idea for Dao instances. For C data types, it may be OK to support this, as you mentioned some of them are essentially scalars. I actually have been considering this for bigint. But there may be too many places that require changing, probably better not to do it (for now at least).

Btw the problem with any is really there

That's just a bug.

dumblob commented 9 years ago

From here.

Thank you. It seems, it's still an open question for the future. At least this issue I raised proves that the current state will be painful.

Night-walker commented 9 years ago

Well, I wouldn't call this a dire issue: when an object can be uniquely identified by a scalar value, you may (but don't have to) use that value instead of the object itself as a key.

I doubt using overloaded operators is a good idea here anyway. I suppose only a noticeable conceptual shift from reference-based objects to value-based ones could provide ground for the behavior you want. But that's an extra layer of complexity, so I'm not sure it's a good idea either.

dumblob commented 9 years ago

Considering that there'll be quite a large amount of wrappers and scalar-like objects (C data types), it doesn't sound that futile to me.

About the extra layer of complexity, it doesn't look that bad. Of course the devil is in the detail, but in general the problem is not that much about implementation, but rather about all the particular decisions where a reference should be used and where a value.

dumblob commented 9 years ago

I woke up today and was thinking about adding something like scalar<> wrapping type which would enforce the needed interface on objects when used. The usage could then look like:

i = BigInt(5)
m1: map<@K, @V> = {->}
m2: map<scalar<@K>, @V> = {->}
m1[i] = 'abc'
m2[i] = 'def'
i += 4
io.writeln(m1[i])  # { BigInt<0x...> -> "abc" }
io.writeln(m2[i])  # error, because the key is missing

This wouldn't change the simplicity and semantics of the current approach (each object is treated as a pointer), would retain seamless treatment for existing scalar types (as those would be compatible with scalar<> out-of-box), but would allow a very simple and transparent compile-time check for the value-like behavior. Also the implementation should be quite straightforward as we already have similar "wrap" types.
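The intended behavior of the proposed scalar<> wrapper can be approximated in Python. This is only a sketch of the idea: `Scalar`, `BigInt`, and the `scalar_value()` method are hypothetical names, and the interface check happens at run time here, whereas the proposal asks for a compile-time check.

```python
class BigInt:
    """Hypothetical reference type exposing a scalar interface."""

    def __init__(self, n):
        self.n = n

    def scalar_value(self):  # hypothetical interface required by Scalar
        return self.n

    def add(self, k):
        self.n += k


class Scalar:
    """Wrapper that snapshots the value and hashes/compares by that snapshot."""

    def __init__(self, obj):
        if not hasattr(obj, "scalar_value"):
            raise TypeError("type does not provide the scalar interface")
        self._snapshot = obj.scalar_value()

    def __eq__(self, other):
        return isinstance(other, Scalar) and self._snapshot == other._snapshot

    def __hash__(self):
        return hash(self._snapshot)


i = BigInt(5)
m1 = {i: "abc"}           # keyed by the object (pointer-like)
m2 = {Scalar(i): "def"}   # keyed by the value at insertion time

i.add(4)                  # i now holds 9

m1_hit = m1.get(i)                       # "abc": same object, value irrelevant
m2_hit = m2.get(Scalar(i))               # None: no entry for value 9
m2_old = m2.get(Scalar(BigInt(5)))       # "def": entry for value 5 is intact

print(m1_hit, m2_hit, m2_old)  # abc None def
```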

Night-walker commented 9 years ago

The thing is, it does not resolve this situation by itself. If e.g. BigInt does not define specific "map interface", this scalar will be of no use other than raising compile-time error. And if BigInt does provide some special identification/comparison means, scalar should simply be redundant, only making it all more complex and variational.

If any change is to take place (of which I am not certain), I think it should be on the side of the class/type in order to ensure simple and intuitive behavior in all cases.

dumblob commented 9 years ago

If e.g. BigInt does not define specific "map interface", this scalar will be of no use other than raising compile-time error.

That's the goal. Btw I wouldn't call it "map interface" as it will have more use cases not less important than map (serialization of such scalar-like objects is very common - in case of scalar<>, it would look like (scalar<@T>)my_bigint_number).

And if BigInt does provide some special identification/comparison means, scalar should simply be redundant, only making it all more complex and variational.

Why redundant? We want to work with pointers (as it's simple and fast), but in certain cases we want both - pointer and scalar-like handling (depending on the situation which is not known at the time the class/type is defined).

daokoder commented 9 years ago

I don't think so, not for Dao at least. A counter-example: a map reports that it contains a value you never actually put there because of an overloaded comparison implementation. Even better, what to do if inner data of the object changes? It should then imply a different place in a hash or tree! How do you propose to solve that?

Right, this is the real issue of supporting user defined comparison for map keys. As @Night-walker also pointed out, invar cannot solve the problem here. I agree with @Night-walker that, pointer comparison is the only correct way to do for map keys, unless the key objects are truly immutable.

I was fully aware of all this when I proposed it. I never ever want to experience code like

load time import time
m={->}
x=something_returning_some_type(...)  # might be also a variant type
y=something_returning_some_type(...)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

to return false. It's absolutely opaque as you don't know which branch will be chosen without looking at the return type of something_returning_some_type() which might also be any. It's just nonsense that if (m[y]) ... succeeds without throwing any exception, but if (y in m) ... will evaluate to false (note that these two are very often used together). I always want to be absolutely sure that the in operator will behave the same, even without a deep knowledge about the type of the left operand. And I'm talking about deep knowledge as the type might be variant, might be a renamed/aliased primitive type etc.

Any ideas how to prevent this discrepancy?

The only solution I can think of is to make the type that you want to behave as you described immutable, and make its objects unique with respect to their data. So for example, for DateTime, the type could be implemented such that each DateTime object corresponds to a unique time value. So,

x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)

will always return the same object. This way DateTime can be compared as pointers in map keys. If the type provides no method for users to modify its objects, user defined comparisons could also be supported. For user defined scalar-like C data types such as BigInt and DateTime, such comparison can be naturally implemented as C functions for efficiency.

It should be pointed out that, there is no way to do it similarly for Dao class types, as they cannot be made truly immutable (again invar cannot fully guarantee this). But I don't see any issue for not supporting it for Dao class types.
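The "one object per value" idea can be sketched as a caching factory. The following Python model is an assumption-laden illustration, not the `time` module's implementation: `make`, `_cache`, and this `DateTime` class are hypothetical, and the cache here grows without bound.

```python
_cache = {}  # hypothetical registry: value -> the unique object for that value


class DateTime:
    """Hypothetical immutable type: the value is set once and only read."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value


def make(year, month, day):
    """Return the single DateTime object that represents this date."""
    key = (year, month, day)
    if key not in _cache:
        _cache[key] = DateTime(key)
    return _cache[key]


x = make(2014, 1, 1)
y = make(2014, 1, 1)

same_object = x is y   # True: both names refer to the cached object
m = {x: 5}             # plain identity-based keying is now sufficient
found = y in m         # True

print(same_object, found)  # True True
```

Because object and value are in one-to-one correspondence, pointer comparison gives the value-like result without any user-defined comparison being called by the map.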

Night-walker commented 9 years ago

The only solution I can think of is to make the type that you want to behave as you described immutable, and make its objects unique with respect to their data. So for example, for DateTime, the type could be implemented such that each DateTime object corresponds to a unique time value.

That's a nice and simple solution, albeit it should still be implemented on the user's side, as always keeping a hash of possibly unlimited size behind the scene is probably unreasonable.

If the type provides no method for users to modify its objects, user defined comparisons could also be supported. For user defined scalar-like C data types such as BigInt and DateTime, such comparison can be naturally implemented as C functions for efficiency.

If objects are treated similar to scalar values when used in maps, it may make sense to extend this behavior onto other cases like assignment/passing to routine. Otherwise there will be an inconsistency. I think it is simpler to reason about the behavior of different data when you can draw a strict line between scalar-like values and reference-based objects. If not, it seems better to leave things simple.

daokoder commented 9 years ago

it should still be implemented on the user's side

Yes, this was what I had in mind.

If objects are treated similar to scalar values when used in maps, it may make sense to extend this behavior onto other cases like assignment/passing to routine.

This is not an issue, because the object/pointer and the data is one-to-one related, so there needs no data copying or any special treatment other than pointer assignment/copying.

dumblob commented 9 years ago

I think it is simpler to reason about the behavior of different data when you can draw a strict line between scalar-like values and reference-based objects. If not, it seems better to leave things simple.

I definitely agree, but those cases near this strict line should be covered by some mechanism like the proposed scalar<>. This way we would avoid changes in assignment/passing to routines etc. How it will be implemented is another issue (btw the idea of a hash table is a nice one and should scale pretty well).

Night-walker commented 9 years ago

One way or another, I am against ad-hoc mechanisms which break the conceptual meaning of data one operates on. That is, when behavior differs (conceptually) depending on the context.

If e.g. DateTime is treated as an (opaque) object when doing assignment/passing, and as a scalar time_t value when referring to a map<DateTime, ...>, that's inconsistent, confusing and error-prone. DateTime should always be treated either as an object or as a value, so that you can safely and easily abstract away from its technical side.

I definitely agree, but those cases near this strict line should be covered by some mechanism like the proposed scalar<>.

There should not be any edge cases, exceptions or magical transmutation wands like scalar<>. Either a type represents a scalar value, or it is an opaque object. That, I believe, is the only way of not making a mess of all this.

dumblob commented 9 years ago

There should not be any edge cases, exceptions or magical transmutation wands like scalar<>.

Why magical transmutation wands? It's explicit in all cases I can think of and doesn't mess anything up. It's like specifying an interface ScalarInterface (whose methods are private/accessible_only_using scalar<>) instead of scalar<MyScalarLikeType>.

Either a type represents a scalar value, or it is an opaque object.

That would make sense if we had a simple mechanism how to define both scalar and non-scalar types. Currently we can define only classes or compound types whereas both are always non-scalar (which is a sane default choice).

Night-walker commented 9 years ago

Why magical transmutation wands? It's explicit in all cases I can think of and doesn't mess anything up. It's like specifying an interface ScalarInterface (whose methods are private/accessible_only_using scalar<>) instead of scalar<MyScalarLikeType>.

It's magical because it de-facto turns an object into scalar value in certain local context. I consider this to be too hackish.

That would make sense if we had a simple mechanism how to define both scalar and non-scalar types. Currently we can define only classes or compound types whereas both are always non-scalar (which is a sane default choice).

If you want 1:1 correspondence of object reference and underlying value (like in the example with DateTime), just use a custom constructor routine providing you with flyweight/unique objects by using a hash. At least it's clearer and more predictable than ad-hoc hacks which make the whole meaning of object (class instance) vague and its behavior unclear.

dumblob commented 9 years ago

just use a custom constructor routine

That means also custom type which is not exactly what would one expect to do with scalar-like objects (especially those provided in official dao-modules).

daokoder commented 9 years ago

Supporting things like scalar<> would pull in several other things/issues about complications and overheads. For instance, there will be need for supporting customized copying, and such copying may need to be invoked every time an object is moved or assigned to a variable with type scalar<x>. The overhead associated with this would be unpredictable. And the use of scalar<x> on types that do not support clean copying (or copying not done right) would have unpredictable consequences.

I think the approach I proposed is cleaner and more predictable. Also it should make scalar<> redundant. It may even be unnecessary to create 1:1 correspondence between objects and values/data, if customized comparison is supported. In other words, creating fully immutable types with customized comparisons seems to be the right solution for this.

dumblob commented 9 years ago

In other words, creating fully immutable types with customized comparisons seems to be the right solution for this.

Yes, but I'm scared about classes. The approach you outlined in https://github.com/daokoder/dao/issues/308#issuecomment-61849427 assumes that one will never need to create a scalar-like object in Dao, but only in C. I agree, that it's the simplest, least-problematic and fastest solution, but I'm not sure if it's sufficient (considering the huge amount of classes I've seen in Java with implemented method .equals()). I only hope, that such scalar-like objects will be needed in most cases only as bindings or wrappers over existing libraries and thus there won't be a need to define them in Dao.

dumblob commented 9 years ago

It might not be indeed a big issue in Dao, but for the new system programming language there will be needed a more comprehensive solution as construction of scalar-like objects will need to be done directly in the language.

Night-walker commented 9 years ago

Yes, but I'm scared about classes. The approach you outlined in #308 (comment) assumes that one will never need to create a scalar-like object in Dao, but only in C. I agree, that it's the simplest, least-problematic and fastest solution, but I'm not sure if it's sufficient (considering the huge amount of classes I've seen in Java with implemented method .equals()). I only hope, that such scalar-like objects will be needed in most cases only as bindings or wrappers over existing libraries and thus there won't be a need to define them in Dao.

If there is need to create scalar-like objects in C, it exists for Dao as well. Simply because implementing everything in C is impractical. Dao should better provide sufficient and robust means for expanding its domain, which assures it can easily be extended by its users without knowing C and DaoVM API. It would be good to avoid further divergence of wrapped types and Dao classes.

daokoder commented 9 years ago

I have been considering a new approach. The idea is to associate a signature object/value with an object of a user-defined type. Such a signature object/value can be a number, string, tuple, list or any type Dao knows how to copy, compare and hash. The association will be created when the object is used as a map key (and in other situations as well, sorting for example). Such a signature object is not accessible by users, so it cannot be changed. The original object can be modified (not recommended though), but such modification will not mess with the map tree. Modification of the original object can be checked if necessary: for example, after you find a key object in a map, a new signature object can be created and compared with the previous one, and if it is different, an error can be raised.

Here is an example to demonstrate what I mean:

class DateTime
{
    var time = 0
    routine signature() {   # Return a signature value or object;
        return time
    }
}

var table: map<DateTime,int> = {->}
var dt1 = DateTime.{ 123 }
var dt2 = DateTime.{ 456 }

# Call DateTime.signature() and use the signature to hash and compare:
table[ dt1 ] = 1
table[ dt2 ] = 1

io.writeln( table.find( DateTime.{123} ) != none )  # true

dt1.time += 100
io.writeln( table.find( DateTime.{123} ) != none )  # error, key object has been changed!

One advantage of this approach over the previous one is that, signature() will only be called a few times, so it should introduce much less overhead. And it should be cleaner to implement as well.
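The signature approach can be modeled in Python roughly as follows. This is a sketch of the idea as described in this comment, not the eventual Dao implementation: `SignatureMap` and its `find` method are hypothetical, and the "key object has been changed" check is modeled with a `RuntimeError`.

```python
class DateTime:
    """Mirror of the Dao example class: one mutable field plus signature()."""

    def __init__(self, time=0):
        self.time = time

    def signature(self):
        # Return a value the map knows how to hash and compare.
        return self.time


class SignatureMap:
    """Hypothetical map that hashes keys by a signature snapshot."""

    def __init__(self):
        self._entries = {}  # signature snapshot -> (key object, value)

    def __setitem__(self, key, value):
        # signature() is called once here; the snapshot becomes the real key.
        self._entries[key.signature()] = (key, value)

    def find(self, key):
        snapshot = key.signature()
        entry = self._entries.get(snapshot)
        if entry is None:
            return None
        stored_key, value = entry
        # Re-create the stored key's signature and compare with the snapshot.
        if stored_key.signature() != snapshot:
            raise RuntimeError("key object has been changed")
        return stored_key, value


table = SignatureMap()
dt1 = DateTime(123)
dt2 = DateTime(456)
table[dt1] = 1
table[dt2] = 1

first_lookup = table.find(DateTime(123))   # found via signature 123

dt1.time += 100                            # modify a stored key object
try:
    table.find(DateTime(123))
    error_raised = False
except RuntimeError:
    error_raised = True

print(first_lookup is not None, error_raised)  # True True
```

The map structure itself never depends on the live object, only on the snapshot taken at insertion, so the modification is detected without corrupting the map.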

Night-walker commented 9 years ago

This is exactly what I have been considering for some time :)

I thought we could actually use an overloaded cast operator returning a value of compatible type which, apparently, always represents underlying value of an object or its serialized representation. Then any object which can be converted to e.g. string or int will be replaced by such value when doing a map-related operation. Obviously, there is also need for a compatible constructor to wrap/deserialize the value back to object..

The reason why I did not actually propose this is because there is a serious downside. Implicit calling of functions and substitution of types and values, which is not as obvious and restricted as e.g. in the case of decorators. If only there was way to make it more apparent...

daokoder commented 9 years ago

This is exactly what I have been considering for some time :)

Interesting convergence of thoughts.

I thought we could actually use an overloaded cast operator returning a value of compatible type which, apparently, always represents underlying value of an object or its serialized representation. Then any object which can be converted to e.g. string or int will be replaced by such value when doing a map-related operation.

I also considered to use something like serialize(), but casting may not be flexible enough. The current serialization support and the proposed signature() do not require the object to be serialized into string or int. The serialized or signature object can be anything that is composed of primitive types and their aggregations.

Obviously, there is also need for a compatible constructor to wrap/deserialize the value back to object..

For serialization, yes, but for what we are trying to do here, no need.

Implicit calling of functions and substitution of types and values, which is not as obvious and restricted as e.g. in the case of decorators.

Implicit calling of functions is inevitable, any approach would require this. There is no need for substitution of types and values, the signature values I mentioned are just "snapshots" of the real object, and they will only be used for comparison and hashing.

Night-walker commented 9 years ago

I also considered to use something like serialize(), but casting may not be flexible enough. The current serialization support and the proposed signature() do not require the object to be serialized into string or int. The serialized or signature object can be anything that is composed of primitive types and their aggregations.

In practice, casting should be sufficient given the meaning of signature here; more importantly, no special signature() is then required -- for instance, time::DateTime and fs::Entry already support conversion to string. Complex (tuple) signature can also be supported via casting overloading, so I don't see necessity for a reserved method.

Obviously, there is also need for a compatible constructor to wrap/deserialize the value back to object.. For serialization, yes, but for what we are trying to do here, no need.

Well, It's indeed not needed if signature is automatically bound to the corresponding object or map entry.

Implicit calling of functions is inevitable, any approach would require this.

Right now, Dao does not call anything declared in Dao code implicitly. Well, serialize() does, but it is expected. For map operations, it may be somewhat unapparent. Maybe if more uses of this signature can be found, its value will firmly outweigh this shortcoming. By the way, it seems reasonable to oblige the routine returning the signature of an object to be strictly invar to minimize the possibility of unexpected side effects.

dumblob commented 9 years ago

Well, signature() is quite similar to what I was talking about - a standardized interface for such objects with a private method returning a serialized/unique representation and called automatically in maps (and possibly other places like the sort you've mentioned) or explicitly using scalar<>. signature() does seem to be a generalization of this idea without heavy-weight serialization, though, and it suits me well.

In practice, casting should be sufficient given the meaning of signature here; more importantly, no special signature() is then required -- for instance, time::DateTime and fs::Entry already support conversion to string. Complex (tuple) signature can also be supported via casting overloading, so I don't see necessity for a reserved method.

Then casting has two shortcomings. The less serious one is that e.g. fs::Entry currently doesn't consider hardlinks when casting. That means it's error-prone to think of casting as a snapshot. But the more serious one is that casting would practically bypass type checking, as map<BigInt, @T> will become map<string, @T> (which is no longer a mapping of a big integer to something, but of an arbitrary string to something).

Night-walker commented 9 years ago

private method returning a serialized/unique representation

I don't see a reason why it has to be private.

(and possibly other places like the sort you've mentioned)

Good idea, sort() may naturally utilize object signatures, relieving the user from the use of sort(){}.

The less serious one is, that e.g. currently fs::Entry doesn't consider hardlinks when casting.

fs::Entry is not a file descriptor/inode/etc. It's just a plain path, so its signature won't be any different.

That means, it's error-prone to think of casting as a snapshot.

No magical method can be guaranteed to return a snapshot, so a casting routine is no worse than any other one.

But the more serious one is, that casting would practically bypass type checking as map<BigInt, @T> will become map<string, @T> (which is no more a mapping of big integer to something, but arbitrary string to something).

I don't understand what you mean. Casting does not bypass anything. It's just a matter of calling a (possibly existing) (someType)() instead of a special signature(). There is no other difference.

dumblob commented 9 years ago

I don't see a reason why it has to be private.

It doesn't. I was just describing the initial idea of mine.

Good idea, sort() may naturally utilize object signatures, relieving the user from the use of sort(){}.

Why? I use sort(){} for cases where special treatment is needed (which is actually 100% of the cases in my code that doesn't perform numerical computations).

No magical method can be guaranteed to return a snapshot, so a casting routine is no worse than any other one.

If we can, we don't want to confuse users. Thinking of casting as a snapshot would mean that casting always produces a 1:1 relationship between the result and the cast object, which doesn't hold.

I don't understand what do you mean. Casting does not bypass anything. It's just a matter of calling (possibly existing) (someType)() instead of special signature(). There is no other difference.

As I understood @daokoder, the implementation would "track" the notion of signature() in the scalar-like type when used e.g. in a map (i.e. assuming signature() returns a string, code such as m = map<BigInt, @T>); m = {"abc" -> 5} would produce a type error, as "abc" is not a signature of BigInt but rather a plain string). If it behaved the same for casting as well, then there is no difference.
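The "tracking" described above can be modelled roughly as follows. This is a minimal sketch in Python rather than Dao, and it does not describe how Dao implements anything: `BigInt`, `signature()` and `TypedSignatureMap` are hypothetical names chosen for illustration. The map stores entries under the key's signature, but still checks the declared key type, so a plain string is rejected even though the signatures themselves are strings.

```python
class BigInt:
    """Hypothetical stand-in for a big-integer class (not Dao's BigInt)."""

    def __init__(self, digits: str):
        self.digits = digits

    def signature(self) -> str:
        # Assumed method name: a value built from primitive data only.
        return self.digits


class TypedSignatureMap:
    """Stores values under key.signature(), but only accepts keys of key_type."""

    def __init__(self, key_type):
        self._key_type = key_type
        self._entries = {}

    def _check(self, key):
        if not isinstance(key, self._key_type):
            raise TypeError(f"key must be {self._key_type.__name__}, "
                            f"got {type(key).__name__}")

    def __setitem__(self, key, value):
        self._check(key)
        self._entries[key.signature()] = value

    def __getitem__(self, key):
        self._check(key)
        return self._entries[key.signature()]


m = TypedSignatureMap(BigInt)
m[BigInt("123")] = 5

# A second object with an equal signature finds the same entry.
same_entry = m[BigInt("123")]

# A plain string is not a BigInt, even though it equals the stored signature.
try:
    m["123"] = 6
    rejected = False
except TypeError:
    rejected = True
```

Under this model the key type of the map stays `BigInt`; the signature is only an internal lookup value.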

Night-walker commented 9 years ago

Why? I use sort(){} for cases where special treatment is needed (which is actually 100% of the cases in my code that doesn't perform numerical computations).

Because it can be done and such behavior will be expected if maps adopt the same semantics.

As I understood @daokoder, the implementation would "track" the notion of signature() in the scalar-like type when used e.g. in a map (i.e. assuming signature() returns a string, code such as m = map<BigInt, @T>); m = {"abc" -> 5} would produce a type error, as "abc" is not a signature of BigInt but rather a plain string). If it behaved the same for casting as well, then there is no difference.

There is no real difference whether the signature is obtained calling this or that method. Overloaded casting can simply be expected to already be available for types/classes of interest, so why not use it instead of declaring yet another, most likely duplicate, method.

daokoder commented 9 years ago

No magical method can be guaranteed to return a snapshot, so a casting routine is no worse than any other one.

There is no real difference whether the signature is obtained calling this or that method. Overloaded casting can simply be expected to already be available for types/classes of interest, so why not use it instead of declaring yet another, most likely duplicate, method.

It is not worse, but less convenient. With casting, it will be necessary to check casting to int, float, string, tuple and array at least. But this could be easily handled, I think.

daokoder commented 9 years ago

There is no real difference whether the signature is obtained calling this or that method. Overloaded casting can simply be expected to already be available for types/classes of interest, so why not use it instead of declaring yet another, most likely duplicate, method.

For overloaded casting, there is another issue: choosing which one to use. Between int and string, int is probably preferable, but between tuple and other types, tuple may be preferable, as it is probably closer to the original data. Such guessing seems arbitrary; maybe we should just choose the simplest or the most complicated type casting.

Then there is another issue: a program may behave differently if one adds a new casting method to a class. Such a change of behavior could simply be unexpected.

daokoder commented 9 years ago

There is yet another issue with casting: the result of casting may not be able to fully represent the original object. An example is BigInt, where casting to int is necessarily supported, but using such a cast for map keys would be problematic. The right cast to use for BigInt would be to a string (or an array), but how can this be known for arbitrary types? Choosing the more complicated cast seems like a safer bet, though it may be a less efficient one.
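The BigInt concern can be shown with a small sketch. It is written in Python rather than Dao, and `BigInt`, `as_int64()` and `as_string()` are hypothetical names, not part of any Dao module. A lossy cast to a fixed-width integer makes two different values collide as map keys, while a lossless cast to a string keeps them apart.

```python
class BigInt:
    """Hypothetical stand-in for a big-integer class (not Dao's BigInt)."""

    def __init__(self, digits: str):
        self.digits = digits  # decimal digits of arbitrary length

    def as_int64(self) -> int:
        # Lossy "cast": keep only the low 64 bits, as a fixed-width int would.
        return int(self.digits) & 0xFFFFFFFFFFFFFFFF

    def as_string(self) -> str:
        # Lossless "cast": the full decimal representation.
        return self.digits


a = BigInt("1")
b = BigInt(str(2**64 + 1))  # a different value sharing a's low 64 bits

by_int = {}
by_int[a.as_int64()] = "a"
by_int[b.as_int64()] = "b"   # same key as a: the first entry is overwritten

by_str = {}
by_str[a.as_string()] = "a"
by_str[b.as_string()] = "b"  # distinct key: both entries survive

print(len(by_int), len(by_str))
```

Nothing in the two cast routines themselves tells a map which of them is safe to use as a key, which is the difficulty raised above.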

Night-walker commented 9 years ago

You are right, the problem of multiple available overloaded cast operators demands a separate, specific method. Maybe it would be better to declare it as some kind of operator, to prevent the case where an unrelated method that accidentally has the same name leads to undesirable behavior.

Night-walker commented 9 years ago

By the way, it makes sense to utilize signature for any operation which involves comparison. For instance, x in alist. That probably implies it should be integrated into DaoValue_Compare() which should then be used for all @T value comparison.

dumblob commented 9 years ago

Because it can be done and such behavior will be expected if maps adopt the same semantics.

sort() and sort(){} are two different things and will remain so regardless of signature(). By special cases I meant e.g. sorting list<tuple<x:int, y:string, z:int>> only according to e.g. a combination of .y and .z, or whatever crazy criteria you'll come up with, including calling external unpredictable and/or non-linear routines etc.
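The distinction can be sketched in Python (not Dao; `Record` is a hypothetical type mirroring the tuple above). A default sort compares whole values, which is roughly what a signature-based sort() would do, while a custom criterion plays the role of sort(){} and can ignore fields entirely.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record mirroring tuple<x:int, y:string, z:int>."""
    x: int
    y: str
    z: int


records = [Record(3, "b", 1), Record(1, "b", 0), Record(2, "a", 5)]

# Default ordering: compares the whole value field by field,
# analogous to sorting by an object's full signature.
default_order = sorted(records)

# Custom criterion: order only by the combination of y and z, ignoring x.
custom_order = sorted(records, key=lambda r: (r.y, r.z))

print([r.x for r in default_order])
print([r.x for r in custom_order])
```

A signature gives a type one canonical ordering; the custom form stays necessary whenever the ordering depends on the use case rather than on the type.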

Regarding an operator, I'm not convinced it's necessary, as it would only emphasize a call to a certain method and would thus be redundant. It would, though, make one thing clear, namely what will happen if signature() is not defined.

daokoder commented 9 years ago

You are right, the problem of multiple available overloaded cast operators demands a separate, specific method. Maybe it would be better to declare it as some kind of operator, to prevent the case where an unrelated method that accidentally has the same name leads to undesirable behavior.

In the end, I decided to still use casting methods, but with additional parameters, something like:

routine (int)(invar self: DateTime, mode: enum<hashkey> )

The preliminary implementation is done. The examples here should work. You can also have a look at: https://github.com/daokoder/dao/blob/master/demo/user_map_key.dao

By the way, it makes sense to utilize signature for any operation which involves comparison. For instance, x in alist. That probably implies it should be integrated into DaoValue_Compare() which should then be used for all @T value comparison.

This can be done. But there will be a few issues. The main issue is still mutability. Because it will be difficult to know if an object is used as map key or not. So it is only safe to generate its signature once. This is not a problem for immutable types or objects used as immutable variables. But for other types and objects, it will be unusable for comparisons if the signature for each object is generated only once because the comparison may be based on outdated signatures.
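The outdated-signature problem can be reproduced with a short sketch. It is written in Python, not Dao, and `Point`, `signature()` and `SignatureMap` are hypothetical names; it models a map that takes the key's signature once, at insertion time.

```python
class Point:
    """Hypothetical mutable class; signature() is an assumed method name."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def signature(self):
        return (self.x, self.y)


class SignatureMap:
    """Keys each entry by a signature taken once, when the entry is inserted."""

    def __init__(self):
        self._entries = {}

    def __setitem__(self, key, value):
        self._entries[key.signature()] = value

    def __contains__(self, key):
        return key.signature() in self._entries


m = SignatureMap()
p = Point(1, 2)
m[p] = "first"

found_before = p in m          # stored signature (1, 2) matches
p.x = 10                       # mutate the object after insertion
found_after = p in m           # current signature (10, 2) no longer matches
stale_hit = Point(1, 2) in m   # the outdated signature still answers lookups

print(found_before, found_after, stale_hit)
```

Recomputing the signature on every comparison would avoid the stale entry, but then the position of a key inside the map could silently become wrong after a mutation.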

dumblob commented 9 years ago

You mean only one routine with a signature (oops, term collision :)) routine (any_of_built-in_scalar_types)( invar self: TypeOfSelf, mode: enum<hashkey> ) will be allowed for a class (considering also mixins)?

Night-walker commented 9 years ago

sort() and sort(){} are two different things and will remain so regardless of signature().

Certainly. But sort() implies comparison of objects which should naturally involve object signature just as for maps.

You mean only one routine with a signature (oops, term collision :)) routine (any_of_built-in_scalar_types)( invar self: TypeOfSelf, mode: enum<hashkey> ) will be allowed for a class (considering also mixins)?

Interface (virtual) methods could possibly be taken into account in this regard.

Night-walker commented 9 years ago

In the end, I decided to still use casting methods, but with additional parameters, something like:

routine (int)(invar self: DateTime, mode: enum<hashkey> )

I think hashkey does not properly describe what it is. This value is not just for hashes; tree maps can use it too, at least. And so can list.sort(), which has no relation to hashes at all.

Maybe we could use a bit different approach:

routine (int)(invar self: DateTime, signature = true)

signature = true could be included into an ordinary cast routine to specify that it should be regarded as signature producer.

The main issue is still mutability. Because it will be difficult to know if an object is used as map key or not. So it is only safe to generate its signature once.

It is an issue in any case anyway. It is simply inconsistent that "in" for maps and for lists will work in completely different manners -- then the whole idea becomes a hack with questionable benefit. As I already pointed out, if one and the same usage implies different semantics in different contexts, it's a mess. No way it can be simple and safe to work with. Without both technical and conceptual consistency, this idea is not viable in my opinion.

Night-walker commented 9 years ago

Mutability can indeed be a serious problem. I think it may make sense to require a scalar-like type to be fully immutable by ensuring that all of its methods are invar. BigInt does not contain non-invar methods (even though invar is not used), and DateTime can easily be modified to be immutable. fs::Entry cannot be altered this way, but I doubt it should be considered a scalar value anyway.

dumblob commented 9 years ago

Certainly. But sort() implies comparison of objects which should naturally involve object signature just as for maps.

Sure. I initially understood relieving the user from the use of sort(){} as something negative, but you meant it in a positive way :)

fs::Entry cannot be altered this way, but I doubt it should be considered a scalar value anyway.

In my opinion it's not scalar, as it represents a node in a tree which is in turn meant to be modified, so one should expect such nodes to be non-scalar. A helpful question for deciding whether a particular type is scalar could be "Is the type a standalone piece of data, or is it part of some bigger structure?". But e.g. tree leaf nodes (or any other "border" structures) can be questionable, as they can be scalars without any notion of their parent. In the case of fs::File, there is a high similarity and "connection" to its parent, and as such I wouldn't consider it scalar either.