Dicts and strings - Githubissues

chenhsi commented 10 years ago

Two current bugs, as demonstrated by the (as of posting) latest update to the "10 dicts.ambr" test file::

1: Dicts don't seem to be able to using strings as keys correctly, as there should be no reason for the file to error before the last line. However, I can't tell what in the code is causing this problem, so this might be a problem with the test file? 2: When printed out, that strings do not have quotes around them make them hard to distinguish from other non-string keys, e.g. both the int 1 and the string "1" look the same.

In addition, it seems strange to use "length" to describe a dictionary. I know Python uses this terminology, but perhaps we could use "size" instead for both lists/dicts, as per Java Collections?

edgao commented 10 years ago

The issue with using Strings as keys is that StringValues didn't override equals(), so they checked for identity. I'm about to commit a fix for that. I also added a repr() method for Values (reminiscent of Python's repr() method), since I didn't want to just make all Strings print out with quotes. Not sure if we want to implement that differently. For now, the default repr() just returns toString(), and it's overridden in StringValue.

I don't particularly care about the naming - it's mostly out of habit (from String.length() and array.length). That's been changed to size now.

One rather big problem I'm wondering about is how we want to deal with mutable objects as keys for dicts. For example:

key1 = [1, 2, 3]
key2 = [4, 5, 6]
dict = {key1: "asdf", key2: "qwer"}
key1[0] = 4
key1[1] = 5
key1[2] = 6

I don't know how Java deals with this, and Python just sidesteps the issue by not allowing lists to be used as keys. I might open a new issue for this later; what do you think?

edgao commented 10 years ago

Ugh. Just realized that the issue was hashCode, not equals. Fixed in StringValue; probably going to have to manually add hashcodes to every value class.

chenhsi commented 10 years ago

I believe that the way Java maps deal with mutable objects is equals (and hashcode), which for arrays is the == operator. Thus, your example would have a map with both arrays (with identical contents) mapping to the same value. That said, online comments seem to agree on warning against using mutable keys that change hashcode values, as such key-value pairs might be lost in the map/become unaccesible.

I can't think of any instances where I've encountered this problem in Java, so I'm not exactly certain when this bug appears. That said, I can definitely think of instances when I've used mutable keys (e.g. Lists and Maps from Java Collections), and as I'm not sure how we're dealing with equals/hashcode and such currently, I'm not particularly fond of the idea of pushing this potential danger from Java onto our language.

Perhaps one alternative solution is to simply clone any list or any user-defined object before placing into the map as a key, ensuring that the map's keys aren't mutable. I can't recall at the moment how I generally use mutable keys, so I'm uncertain as to how much this would affect my usual coding practices. I'm also uncertain as to how much this would affect runtime, so I'm not sure how reasonable this solution is anyway.

edgao commented 10 years ago

Cloning the key seems like a reasonable solution - I don't think there are going to be situations where we actually want to use ridiculously large keys, so it (probably) won't be a major problem. Although I guess for large dictionaries it would become quite significant.

Java's behavior seems to be that the key with the smaller (?) hashcode overwrites the value of the key with the larger hashcode, although I haven't confirmed that. Obviously, that isn't exactly acceptable, given the nature of hashcodes.

I don't think I really use mutable objects as keys, so I'm fairly ambivalent on this one. For now, I'll just make dictionaries clone the keys, and we can change it later.

chenhsi commented 10 years ago

My understanding of the Java problem was that if the key's hashcode changes, the internal hash bucket that the key maps to changes, and the map will search through the wrong bucket for the value, and not find it, concluding that the object was never a key to the map. Meanwhile, the old value is in the old bucket, and would never get accessed again.

I was thinking that for a compiler optimization, it would be reasonable to automatically determine what variables would never get changed, for example. If an object is detected as immutable (all fields are constant), the map could skip the cloning, and most of the inefficiency would be avoided, if indeed we don't use immutable keys. That said, we're nowhere close to that stage of the project, and the interpreter probably won't be able to do such an optimization.

edgao commented 10 years ago

It doesn't make sense for an interpreter to see if an object is immutable, since it's always possible that the user will later add code that modifies it. Obviously, a compiler could probably manage it.

Also, I just realized that cloning objects/lists/whatever is (probably) going to require deep-copies if we really want to avoid undefined / unreliable / weird behavior (to avoid situations where the user modifies e.g. an object inside the list). Which means that I need to deal with silly things like circular references. Ugh.

chenhsi commented 10 years ago

I don't really expect to be doing that (circular references) frequently, being unused to creating new objects with already existing fields. Hmm, how would constructors work? Maybe some syntactically sugary constructor syntax that helps in syntactically sugary class declarations.

Anyway, I would imagine that dealing with circular references could be done by having a map of old objects to new objects, such that within a single object cloning along with its recursive fields being cloned, any references to the same old object appearing twice would be replaced with the same new object.

edgao commented 10 years ago

It's not a major issue (the only time I really use circular references is when I make GUIs, which I hopefully will never do again), but I don't want to leave this kind of issue undealt-with. The problem I'm facing right now is how exactly I'm supposed to create the clones. Does it make sense for Value to implement Cloneable?

chenhsi commented 10 years ago

I suspect that you deal with circular references more frequently than you think, e.g. 61B involves you implementing LinkedLists. Or maybe I'm the only one who directly does that sort of stuff frequently.

I feel that Cloneable is a reasonable interface to be using. Whether or not Value should be Cloneable would depend on how we want to deal with cloning functions, if at all, tying this back to how methods interplay with classes having mutable function declarations. Otherwise, only ObjectValues need to be Cloneable, while FunctionDeclarations can simply be copied.

Speaking of which, FunctionDeclarations is somewhat of an unwieldy name, and if we aren't having actual class declarations, then I might return FunctionDeclaration to being called FunctionValue.

Anyway, I think that your current strategy/implementation based on your code/comments seems fine; I have only to add my previously mentioned suggestion of using a Map instead of a Set, allowing for easier retrieval.

edgao commented 10 years ago

Well, anything that can contain other Values is going to need to be Cloneable (I think right now that would be List, Dict, and Object). For now, I'm going to have only those three implement the interface; if we decide that needs to change, it won't be too difficult.

And the Map instead of a Set makes perfect sense - thanks for pointing it out.

edgao commented 10 years ago

Ew. clone() is protected. Don't want to deal with casting; Value is going to be Cloneable. I might go into specific Value subclasses later and make their clone() methods simply return themselves.

Also, reflection is making my head explode. I'll figure this out after getting some sleep.

edgao commented 10 years ago

Well, my eyes are practically bleeding, but it works. I swear, every time I use reflection I'm reminded of why I don't use reflection. And I'm willing to bet there are a million bugs waiting to be found. Leaving the issue open for a bit, until we're pretty sure it works for most cases.

As mentioned in the commit message, go to objenesis.org/download.html, download the distribution (first link), and extract objenesis-2.1.jar to your project. It was included in the commit anyway, but I'll be excluding it from future commits (not sure what proper practice on including libraries in repos is for GitHub).

EDIT: Whoa, GitHub actually auto-linked the commit. I approve so hard.

chenhsi commented 10 years ago

Um. I'm fairly confused as to why reflection is necessary. I suppose I'l look into it tomorrow.

Edit: Oh, if the reason is regarding cloning fields, why don't you just implement the clone method within each individual Value class? You do know that you can "overwrite" the protected nature of clone() to be public, right?

edgao commented 10 years ago

I tried doing that, but I couldn't find a way to pass the map of "already-cloned stuff" to clone().

chenhsi commented 10 years ago

Hmm. Write a clone(Map) function and ignore the one in Object?

chenhsi / Ambroscum

Dicts and strings #18