apache / fury

A blazingly fast multi-language serialization framework powered by JIT and zero-copy.
https://fury.apache.org/
Apache License 2.0
2.99k stars 220 forks source link

[LICENSE] update Fury LICENSE to list all the licenses for source files where we've borrowed non-Fury code #1274

Open pjfanning opened 8 months ago

pjfanning commented 8 months ago

https://github.com/apache/incubator-fury/blob/e0ba03b2f7462a63ee102d42fafce83790a7a392/licenserc.toml seems to have a list of source files that have contain code that is not originally from Fury.

These licenses will need to be acknowledged in the Fury license. And the Fury NOTICE may also need to be updated if some of the code comes from ASL licensed projects that have their own NOTICE files. We still need to do this even if the other code is in ASF projects (eg Arrow and Flink).

We may also need to update some source file headers.

chaokunyang commented 8 months ago

@pjfanning We removed such headers and license acknowledgement in #783 , maybe we need to add them back and do more checks

pjfanning commented 8 months ago

@chaokunyang @tisonkun the LICENSE makes no mention of any borrowed code. There appears to be lots of borrowed code in Fury and it should all be mentioned in our LICENSE. This issue has nothing to do with #783.

tisonkun commented 8 months ago

In LICENSE or NOTICE should be OK. In NOTICE like Kvrocks:

In LICENSE like Flink:

I'd prefer to bundle in NOTICE and leave the LICENSE identical to the template.

pjfanning commented 8 months ago

But the Fury NOTICE also makes zero mention of the borrowed code.

https://github.com/apache/incubator-fury/blob/main/NOTICE

With Pekko, we were forced by Incubator team to mention all the borrowed code in our LICENSEs. I will need to track down the references and the rules that forced us to do this. The problem with ASF projects is that not all ASF projects are compliant with the licensing requirements.

In the mean time, we can start updating the NOTICE.

tommyettinger commented 6 months ago

Speaking only for myself, I was quite pleased to see my name in an @author Tommy Ettinger tag in FuryObjectMap. That was part of Kryo initially; some later enhancements to that class were applied to a different repo (jdkgdxds, in its ObjectObjectMap class), but they might not improve performance here, or in Kryo. If you want to pull code from jdkgdxds, be my guest; the nextPowerOfTwo(), place(), and resize() methods are different, for instance.

Noting where code came from can be a hassle, especially in a large project, but I think it's important to do.

Fury is doing very well in my benchmarks, and it's definitely been a nice experience using it. I look forward to seeing it advance in the future!

chaokunyang commented 6 months ago

Hi @tommyettinger , the linear probe based ObjectMap you implement are awesome, thanks for this great work. Tha hash lookup is a time-consuming operations. Especially after we applied codegen speedup, the cost of hash look for class dispatch or reference tracking is considerable. So we used this map to reduce cost. We also tried swisstable, but it seems that we can't implement it efficiently in java. Is there any ideas you know to speed this map further?

tommyettinger commented 6 months ago

Hi @chaokunyang , very nice work on Fury! There's a handful of things I can think of that might speed up maps in your case.

An open-addressing, linear-probing table like this needs a low enough collision rate to be as fast as it can get. If your hashCode() results are known to be fairly-high-quality random numbers, such as for identity hash codes, there's nothing that the map needs to do to mix the hashCode() at all, and place() can use item.hashCode() & mask instead of needing to do the long multiplication and shift. That's only a tiny speedup, but place() may need to be called often. Counterintuitively, a place where a hash table of Class objects might have trouble is how Classes are hashed -- they use Object.hashCode(), so they use the identity hash code. Identity hash codes have been a strangely slow area in some of my code, and I try to avoid them, but that's harder to do with Class. This article was interesting about System.identityHashCode(); it looks like having any calls to it interferes with "biased locking." It also seems like the locking can be an issue even in single-threaded situations. I'm wondering if slowdown in the map could be because the first call to hashCode() on each Class, or other type that uses hashCode() from Object, has to do thread-related cautionary work, and not actually from the map code itself.

As an aside, the mixing in place() might be able to be simpler because Fury doesn't target as many platforms as this did in its original form (in libGDX). Some ideas for mixing without needing 64-bit multiplication might include final int hash = item.hashCode(); return (hash >>> shift) ^ (hash & mask);, or some reversible bitwise operation followed by masking. That could be final int hash = item.hashCode(); return (hash ^ (hash << 5 | hash >>> 27) ^ (hash << 24 | hash >>> 8)) & mask;, or anything else that's fast enough and gets suitable mixing for your key type. Identity hash codes probably only need masking, no mixing needed.

Alternate Map implementations could absolutely do better. You're right that SwissTable doesn't have a good implementation in Java; maybe one isn't possible. I think Robin Hood hashing might be something to look into; Cuckoo hashing is what my code replaced in Kryo and before that in libGDX, and it's somewhat of a precursor to Robin Hood and Hopscotch hashing. I wouldn't recommend Cuckoo hashing unless you're certain you've tested all the corner cases -- the papers that describe it gloss over that it can "fail" without describing just how bad a failure can be... In libGDX, we had a proof of concept that sent about 40 4-character Strings into a Cuckoo-based map, and that forced that implementation to attempt to use more table entries than Java can actually fit in one array (it would have used an array with more than 2 to the 30 items if it could). There are some possible workarounds for Cuckoo hashing, but they would also need to be tested very thoroughly for their worst-case behavior. Robin Hood Hashing is what Rust used before it switched to hashbrown, which I think is a SwissTable variant, though I'm not that familiar with Rust. Robin Hood Hashing is also in most cases very fast and offers good expected-case performance, but one operation was a disaster in at least some versions of Rust; it was very slow to copy one map into another. This blog post discusses that issue in great detail; it's a good read. I think that bug might not surface if a Robin Hood hash table didn't change which member of a hash family it uses between instances. FuryObjectMap doesn't actually uses a hash family currently, so that might be an interesting avenue.

Lastly, the simplest option: adjusting the default load factor might make a big difference. One of my several subclasses of ObjectObjectMap in different places used a specific type of key every time, Coord, which is a 2D point on an int grid. I found that using the Cantor Pairing Function as a hashCode() actually worked very well in that case, but only when the load factor was 0.5f or lower (the default is higher than that). The reason for that probably has to do with the data I was hashing; it was mostly square grids where I wanted to track a value with each Coord and do so in an order, and Cantor won't produce any duplicate results for points in a right isosceles triangle shape, with the size of the triangle determined by how many bits of the hashCode() were used. A load factor of 0.5f meant an extra bit that wouldn't have been used otherwise, and that essentially doubled the side lengths and made the whole square area covered without any collisions. A lower load factor speeds up several operations by lowering the likelihood of a collision, but it also slows down iteration, since empty table slots need to be iterated past more often.

I hope I didn't get too off-track here... I'd try the load factor first, though it may require some tinkering.

chaokunyang commented 6 months ago

Hi @tommyettinger , thanks very much for taking time to share such detailed suggestion, this really helps, and laied a good fundation for future performance optimization. I created an issue for this in #1409 , let's continue the discussion there.