alphadose / haxmap

Fastest and most memory efficient golang concurrent hashmap
MIT License
892 stars 45 forks source link

Major Bug #8

Closed gitperson1980 closed 1 year ago

gitperson1980 commented 1 year ago

I upgraded from v0.1.0 to v0.3.1 and it seems to hang in the set command. The CPU stays stuck at 100% and the application does not run but haxmap internals are the only things running. I did a profiler in this condition and here is the image. When I downgraded v0.1.0, all was ok. Problem appears to exist for anything above v.0.1.0

image

alphadose commented 1 year ago

@gitperson1980 can you also share your benchmarking code so I can debug on my own system ?

gitperson1980 commented 1 year ago

I was not benchmarking but using it in an actual application. I had about 9 thousand entries of string keys with pointer to struct values. I used a combination of sets and gets. However I suspect that this maybe happening when using the haxmap range iterator function even though the above profile shows only set happening. Perhaps a contention when using set with range in a another go routine. I do use concurrent access to haxmap from multiple go routines. When I drop to v0.1.0 all is fine. Anything above that and I experience hangs with CPU utilization at 100%. Almost seems using set and range together from 2 or more concurrent go routines at certain times causes this contention. My suspicion.

alphadose commented 1 year ago

@gitperson1980 understood, one of the major differences between v0.1.0 and v0.3.1 is the map growing asynchronously (in 0.1.0) and synchronously (in 0.3.1) as highlighted by this issue https://github.com/alphadose/haxmap/issues/2

So to narrow down the issues more specifically, in version 0.3.1 itself can you try editing this line directly in your codebase https://github.com/alphadose/haxmap/blob/main/map.go#L158 and change it to go m.grow(0, true) ?

This will tell us whether the bottleneck is happening due to grow operations or something else.

If the problem doesn't lie in the grow operations then I shall test this with 9k+ long string keys in order to replicate the issue.

However I suspect that this maybe happening when using the haxmap range iterator function even though the above profile shows only set happening. Perhaps a contention when using set with range in a another go routine. I do use concurrent access to haxmap from multiple go routines.

I doubt that because range and set have no mutual locks hence they should operate independently. If you look at the code of for_each/range you can see its a simple list traversal calling next().

Although my suspicion is the issue occurs because the next() function calls itself recursively (https://github.com/alphadose/haxmap/blob/main/list.go#L33), too much recursion will definitely slow down the overall program. Let me remove the recursion and publish a new release.

gitperson1980 commented 1 year ago

I will test this out Tuesday after labor day as I will be busy this weekend. I will let you know.

gitperson1980 commented 1 year ago

I found that if I set 9000 items after pre-allocating 10000 items as haxmap.New[string, *mystruct](10000), all is fine even in v0.3.1. However if I only pre-allocate 1 as haxmap.New[string, *mystruct](1) and then rapidly load 9000 items then it will get stuck as above. Didn't have this issue in v0.1.0. In my case I have 8 processors and I am rapidly doing a .set from 10 go routines concurrently.

I did modify v0.3.1 with go m.grow(0, true) before testing as suggested.

Here is the profile:

image

alphadose commented 1 year ago

@gitperson1980 Thanks a lot for the info, its insightful, plus I have already come up with a possible fix for the rapid growth choke issue, let me test that out by replicating your conditions and then get back to you

gitperson1980 commented 1 year ago

@gitperson1980 Thanks a lot for the info, its insightful, plus I have already come up with a possible fix for the rapid growth choke issue, let me test that out by replicating your conditions and then get back to you

One thing I didn't check is .. even if I pre-allocate 10000 and load 9000 and I will delete a chunk at times and load new ones does haxmap keep the initial 10000 pre-allocated spaces if I delete 1000 items? Does it de-allocate them at delete and be forced to re-allocate again if more items are added, thereafter?

alphadose commented 1 year ago

@gitperson1980 on deletion the map deletes the specified block in the linked list but those extra spaces in the index array are just set to nil

the index array only grows in size and doesn't shrink

gitperson1980 commented 1 year ago

One feature that would be nice is to have a growth number or factor specified with the pre-allocation parameter. haxmap.New[string, *mystruct](10000, 100) where 100 is an extra 100 and hence, 101 total, that is allocated if it runs out of spaces at the next .set. This will cut down on additional allocations if done 1x1. Users who know the growth rate can set this to minimize allocations on growth and fine tune performance.

NyaaaWhatsUpDoc commented 1 year ago

Just adding to this, I am also experiencing a lock-up on calling .Set(). This isn't using a huge dataset either.

Usecase is here (near enough, my code un-checked-out code is a bit different): https://codeberg.org/gruf/go-mangler/src/commit/061f658def2247021aa84b5e52148613b199db84/mangle.go#L17

NyaaaWhatsUpDoc commented 1 year ago

@alphadose is there not a possible bug in .Set() due to the resize state not being checked at the beginning? i.e. it loads the current datamap without checking state, but if a grow op is happening concurrently then the datamap it is acting on could be stale?

alphadose commented 1 year ago

@NyaaaWhatsUpDoc

it loads the current datamap without checking state, but if a grow op is happening concurrently then the datamap it is acting on could be stale?

the map can be used safely even during a grow operation hence this is not an issue

alphadose commented 1 year ago

@gitperson1980 @NyaaaWhatsUpDoc @nightwolfz I have fixed this issue with https://github.com/alphadose/haxmap/commit/c46b1873d4de1f280f2ed47b19f8227a72b90752

Can you run the tests on your own application with the latest main branch and check if your application is still freezing up or not ?

NyaaaWhatsUpDoc commented 1 year ago

@NyaaaWhatsUpDoc

it loads the current datamap without checking state, but if a grow op is happening concurrently then the datamap it is acting on could be stale?

the map can be used safely even during a grow operation hence this is not an issue

No what i mean, is saying during a grow operation it allocates a new DataMap, then updates the atomic ptr to point to this new DataMap, but while this is going on the instance of .Set() could have been acting on the previous instance of DataMap. Or does DataMap take account for this under the hood?

alphadose commented 1 year ago

No what i mean, is saying during a grow operation it allocates a new DataMap, then updates the atomic ptr to point to this new DataMap, but while this is going on the instance of .Set() could have been acting on the previous instance of DataMap. Or does DataMap take account for this under the hood?

Yes it does. Both instances of DataMap (old and new) will give the same result. Ultimately all data is stored in a single linked-list.

NyaaaWhatsUpDoc commented 1 year ago

No what i mean, is saying during a grow operation it allocates a new DataMap, then updates the atomic ptr to point to this new DataMap, but while this is going on the instance of .Set() could have been acting on the previous instance of DataMap. Or does DataMap take account for this under the hood?

Yes it does. Both instances of DataMap (old and new) will give the same result. Ultimately all data is stored in a single linked-list.

Ah understood. I might do a dig through the code and properly understand it all at some point.

Will also update my previously-mentioned usecase and report back.

alphadose commented 1 year ago

@gitperson1980 @NyaaaWhatsUpDoc I have published a new release which should solve all your problems https://github.com/alphadose/haxmap/releases/tag/v1.0.0

Tests like this one provided by @nightwolfz which were failing earlier, are now working https://github.com/alphadose/haxmap/blob/main/e2e_test.go#L276

Re-open this issue in case the problem still persists on your end