Team-CC-Corp / JVML-JIT

A continuation of JVML which JITs Java bytecode to Lua bytecode rather than interpreting.
MIT License
30 stars 4 forks source link

Implemented HashMap #58

Closed Turakar closed 9 years ago

Turakar commented 9 years ago

I recently implemented a HashMap with native lua tables. To make the map as fast as possible, I'm using kind of a mix of lua and java. The Java objects are stored in buckets in a lua table indexed by the hashes. Every bucket contains a list (numerically-indexed) with the key-value pairs. Two objects are considered equal, if Object.equals returns true.

ghost commented 9 years ago

The reason I didn't do this is because of long, float, and double. Last time I tried getting this in, Yevano said he wouldn't do it without long, double, and float. I hope someone implements those so we can have HashMap because I don't know how to implement them and I really need maps...

ElvishJerricco commented 9 years ago

Few things:

Sorry. I hate posting these demoralizing kinds of things. But this is a big feature you've gotten yourself into, and it demands an advanced solution. That's why we haven't gotten into it yet. While we'd love to tackle it, it would take a lot of effort that Yevano and I have wanted to spend focusing on the VM itself.

We look forward to pull requests like this. It's always nice to see someone implementing something we can't allocate efforts for. But we do still need it to be correct.

ElvishJerricco commented 9 years ago

Whoops. Comment about qualifying as a hash map was wrong. Sorry.

Turakar commented 9 years ago

Your tabs are out of whack. Always use 4 space tabs for this repo

Will change that.

Maybe I'm misunderstanding. Why are buckets needed?

  1. You can't store the objects simply in the table indexing by the keys, as the equals() method wouldn't be respected in that case.
  2. So I used hashes for indexing.
  3. A hash for two unequal objects may be the same (actually all hashes may be the same, but this would reduce performance), so if I would only reference by the hash, I might be overwriting some pairs.
  4. So you need a bucket, where you can store conflicting pairs. For code simplicity, even non-colliding pairs a stored in their own bucket, as they don't have to be moved later.

What about all the rest of the Map API?

This will be probably implemented in the future by me (in the next weeks I think) or smb else who likes to do that. If you first want to merge once it's done, it's ok for me.

Throwing an exception requires return nil, exception not return exception.

Well I didn't know or tested that (as my equal methods never threw exceptions).

Your tables variable shouldn't be global.

right, that should be changed to local.

Your tables method is also going to leak memory, as it never releases the maps.

Is there some kind of deconstructor or similar like in C++? As I can't bind the table to the this object AFAIK.

Regarding the problems with long, double and float I don't see a real big problem, as these are primitves not often used for HashMap's keys. I did actually really wrote a implementation for long, but it doesn't work because of the shift operators not implemented. Refer to #57 for this.

Yevano commented 9 years ago

Is there some kind of deconstructor or similar like in C++? As I can't bind the table to the this object AFAIK.

I'm thinking a weak table might remedy this issue. In theory, it should allow the objects to be collected even though the reference is still in the table as long as no references exist elsewhere.

It would be nice if we could have destructors, but unfortunately the __gc metamethod doesn't work for tables in Lua 5.1. I wonder if a hack could be done to hook onto destruction using a weak table.

ElvishJerricco commented 9 years ago

Weak tables are disabled in CC I'm pretty sure

ElvishJerricco commented 9 years ago

Also weak only works for keys, not values.

Yevano commented 9 years ago

It works for both keys and values. You just set the mode in the metatable. Also, do you know for a fact that they are disabled? I don't see why they would be, but I don't really have a (simple) way of testing since collectgarbage is not included.

ElvishJerricco commented 9 years ago

Maybe I'm wrong. I remember having a bunch of problems with weak tables but that was a loooong time ago

Yevano commented 9 years ago

I guess that's this?

ElvishJerricco commented 9 years ago

Hm yea. It seems I was under several misconceptions at that time. My bad.

Turakar commented 9 years ago

Weak references don't seem to work, although I can't says this for sure, as there is no way you can force a garbage collection in CC.

I tried to make the tables table with weak references in key mode. So as soon as there is no refrence to the HashMap anymore, the table should be collected. However, this doesn't seem to work. I tried to force a garbage collection by just filling an table with useless values to a specific amount (this was an try'n'error-thingie).

Yevano commented 9 years ago

I tried that too. Did you look at the memory usage of the Java application? When I tried it, the memory usage never went down, and I couldn't be bothered running it for long enough to get anywhere close to the limit. It may actually work, but the gc just won't run unless it really needs to.

Turakar commented 9 years ago

I tried to overflood the memory in lua (implemented a native mthod for this) by just filling an useless table. Anyway it didn't work.

I think a destructor is somehow necessary...

ElvishJerricco commented 9 years ago

This code demonstrates weak tables working for me. Remember that the GC can take up to a few minutes to kick in.

local t = setmetatable({}, {__mode="kv"})
while true do
    t[#t + 1] = {}
    print(#t)
    sleep(.2)
end

Every ~7 seconds or so, the table got flushed for me.

ElvishJerricco commented 9 years ago

Although I think I've ignored a rather pressing issue. Why on earth does this need a native solution?

Here's a solution I've not tested beyond making sure it compiles

// Map.java
package java.util;

public interface Map<K, V> {
    interface Entry<K, V> {
        K getKey();

        V getValue();

        V setValue(V value);
    }

    V put(K key, V value);
    V get(K key);
    V remove(K key);
}
// HashMap.java
package java.util;

import java.util.ArrayList;

public class HashMap<K, V> implements Map<K, V> {
    private ArrayList<Bucket> buckets = new ArrayList<>();

    public V put(K key, V value) {
        int hash = key.hashCode();
        Bucket bucket = getBucket(hash);
        if (bucket == null) {
            bucket = new Bucket(hash);
            buckets.add(bucket);
        }

        Pair pair = bucket.getPair(key);
        if (pair == null) {
            pair = new Pair(key, value);
            bucket.pairs.add(pair);
            return null;
        } else {
            V old = pair.getValue();

            if (value == null) {
                bucket.pairs.remove(pair);
                if (bucket.pairs.size() == 0) {
                    buckets.remove(bucket);
                }
            } else {
                pair.setValue(value);
            }

            return old;
        }
    }

    public V get(K key) {
        Bucket bucket = getBucket(key.hashCode());
        if (bucket == null) {
            return null;
        }

        Pair pair = bucket.getPair(key);
        if (pair == null) {
            return null;
        }

        V value = pair.getValue();
        return value;
    }

    public V remove(K key) {
        Bucket bucket = getBucket(key.hashCode());
        if (bucket == null) {
            return null;
        }

        Pair pair = bucket.getPair(key);
        if (pair == null) {
            return null;
        }

        bucket.pairs.remove(pair);
        if (bucket.pairs.size() == 0) {
            buckets.remove(bucket);
        }

        return pair.getValue();
    }

    private Bucket getBucket(int hash) {
        for (Bucket bucket : buckets) {
            if (bucket.hash == hash) {
                return bucket;
            }
        }
        return null;
    }

    private final class Bucket {
        final int hash;

        ArrayList<Pair> pairs = new ArrayList<>();

        Bucket(int hash) {
            this.hash = hash;
        }

        Pair getPair(K key) {
            for (Pair pair : pairs) {
                if (pair.getKey().equals(key)) {
                    return pair;
                }
            }
            return null;
        }
    }

    private final class Pair implements Map.Entry<K, V> {
        K key;
        V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public V setValue(V value) {
            this.value = value;
            return value;
        }
    }
}
Turakar commented 9 years ago
  1. I didn't mean weak tables aren't working in general (they actually work for me in a simple test). What I meant was that the suggested solution with the tables table to be weak didn't work for me.
  2. In your example you have to loop through the table every time you want to get a bucket. In the native solution I have implemented the native performance gain got from using the lua tables is used. If you are already looping through the table, you don't have to use hashes. Instead you may call the equals method directly.
Turakar commented 9 years ago

Ok, I found a way to do what we need. It is actually what I tried to do before, but this time I'm using the GC-"Force"-code from ElvishJericco (an adapted version). This is how my tryGC() method looks like:

write("trying to cause gc... ")
local t = setmetatable({}, {__mode="kv"})
local count = 0
repeat
    t[#t + 1] = {}
    count = #t
    sleep(0.2)
until #t == 0
sleep(1)
print("did " .. count .. " cycles")

My test was to create a function which creates a HashMap, does some stuff and exits after that. I called this function twice and after that I printed wtih the help of a debugging function the tables in memory. 0.

PS: There's something which wonders me for a time: How am I supposed to add 'jvml' to the shell path on a CC pc as recommended by grin?

ElvishJerricco commented 9 years ago

I'm not sure that looping is significantly slower than what Lua does behind the scenes when you index a table that way. @Yevano what are your thoughts on that?

Turakar commented 9 years ago

I would guess that CC (or LuaJ or whatever is used behind the scenes) uses a HashMap. HashMaps are normally using arrays, although they are doing this in a somehow with hashes optimised way (Have a look at the entrys array in the debugger in normal java code, it is actually quite interesting). However in short this means that we'll probably not be able to implement such a good and optimised HashMap as in the normal Java Runtime Library. So by using this table thingie, I want to use this fast HashMap as good as possible.

This is also interesting: http://java.dzone.com/articles/hashmap-internal

Yevano commented 9 years ago

Yes, like Turakar said, using Lua tables should be much faster than writing our own search algorithm since the underlying implementation is written in the host JVM. I think whenever we're going for speed, we should always try to take advantage of the native algorithms before implementing it in Java.

ElvishJerricco commented 9 years ago

Hm. Maybe we should make some kind of benchmark for these things? Test if ArrayList and hasmap really need native solutions?

Yevano commented 9 years ago

I suppose we could. I don't think it's necessary though. The JVM host solution should almost always be faster I think.

ElvishJerricco commented 9 years ago

I just don't feel like the hashes are necessary for a native implementation. It's superfluous when we could just have a table that literally uses the keys as keys. And in that case, it's hardly a hash map.

oldmud0 commented 9 years ago

I'm sorry, but I keep reading "native implementation" as "naive implementation". :laughing:

Turakar commented 9 years ago

But if you don't use hashes, you don't have an identifier for an object. Thus you can't use the native speedup and always have to loop through a array (which is btw a looot of work if you talk about more elements. Up to a 100 is ok, but think about 100000).

ElvishJerricco commented 9 years ago

What? I think you misunderstood. If we have a native implementation, where we can use just Lua stuff, why use a hash key to find a bucket to look through when we can literally just use the key object as a key? Basically replace

local bucket = tables[this][hash] -- then search the bucket for the equal key
...                               -- lots of work
return value

with a single line

return tables[this][key]
ElvishJerricco commented 9 years ago

Oh I realized immediately after posting that. The reason is that we don't index by key itself, but by finding an object that .equals(key), which can be different. My bad.

Turakar commented 9 years ago

Ok, so I've added the clear() and entryArray() methods. Please note that the entryArray() method is different from the entrySet() method in the normal java HashMap by returning an array and not a Set.

Yevano commented 9 years ago

Any reason not to return a set? In the end, we want to it to be close to the original Java library anyways.

Turakar commented 9 years ago

As there is no Set.get() method, we would have to implement Iterators.

ElvishJerricco commented 9 years ago

I mean isn't Set supposed to implement Collection and therefore Iteratable? I see no reason not to do this.

Turakar commented 9 years ago

Ok, got the entrySet() working. Btw, I fixed the clear() method which wasn't working correctly.

ElvishJerricco commented 9 years ago

Looking into merging this after pulling emit-dev into master, which is coming very close.

Turakar commented 9 years ago

I'm looking forward for both things! Will there still be some bytecode commands missing in the new framework? And do you consider writing a (small) doc for it?

ElvishJerricco commented 9 years ago

The changes made in emit-dev should be completely compatible with the old emitter. We do need to write docs for everything though...

ElvishJerricco commented 9 years ago

Well, compatible was the wrong word. But it has all the same functionality. The way you write it is different though.

Turakar commented 9 years ago

Ok. Currently it is hard to get into this whole ASM Framework and JIT stuff, as you don't really know how it works. A page on the wiki would be the best, but as this is your free time, it's up to you.

Turakar commented 9 years ago

I think the next step for me after merging the HashMap is improving the peripheral api with the ability to support lua tables. Do you think we should implement an explicit casting function from HashMap to Lua Table or not? It could make things easier, but it could also make people using tables where they should use objects instead.

ElvishJerricco commented 9 years ago

I think the l2jType function might want to implement the table → hashmap conversion

Turakar commented 9 years ago

Wouldn't this lead to implicit conversion (I'm not good in that LASM stuff)? However, what do you think about merging this PR?

ElvishJerricco commented 9 years ago

Gave this a final look. I think it's fine to pull.