mschoch opened 2 years ago
cc @hengfeiyang
@hengfeiyang I think the issue is that Segments are shared, so when you mutate storedFieldChunkUncompressed
we have potential data races:
https://github.com/blugelabs/ice/blob/d830a812e60591ce0955fdeabd483f0ebf537ebd/read.go#L46-L47
If we must do it this way, you'll have to protect access to it with a mutex, like the fieldFSTs:
https://github.com/blugelabs/ice/blob/master/segment.go#L52-L53
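A minimal sketch of that mutex pattern, using invented names (`segment`, `chunkData`, `loadedChunk`) that mirror the discussion rather than the actual ice code — the real decompression step is simulated here:

```go
package main

import (
	"fmt"
	"sync"
)

// A single mutex serializes all access to the shared uncompressed-chunk
// buffer, the same pattern ice uses for fieldFSTs.
type segment struct {
	m                            sync.Mutex
	storedFieldChunkUncompressed []byte // guarded by m
	loadedChunk                  int    // guarded by m; -1 means "none loaded"
}

// chunkData "decompresses" chunk n into the shared buffer and returns a
// private copy, so callers never alias the guarded buffer.
func (s *segment) chunkData(n int) []byte {
	s.m.Lock()
	defer s.m.Unlock()
	if s.loadedChunk != n {
		// stand-in for the real decompression of chunk n
		s.storedFieldChunkUncompressed = []byte(fmt.Sprintf("chunk-%d", n))
		s.loadedChunk = n
	}
	out := make([]byte, len(s.storedFieldChunkUncompressed))
	copy(out, s.storedFieldChunkUncompressed)
	return out
}

func main() {
	s := &segment{loadedChunk: -1}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			_ = s.chunkData(n % 4) // safe under the race detector
		}(i)
	}
	wg.Wait()
	fmt.Println(string(s.chunkData(2))) // prints "chunk-2"
}
```

Note the copy on return: if callers held a slice into the shared buffer, a later decompression of a different chunk would overwrite their data even with the lock held correctly.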
Alternatively I wonder, can't we arrange this so that we decompress once, and then just reuse it? I'm not sure exactly how that code looks to be safe from races, but it doesn't make sense that we'd ever intentionally decompress the same compressed bytes again right?
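One way to get decompress-once semantics is a `sync.Once` per chunk; this is only an illustration (the names and the idea of caching every chunk are assumptions, and it trades memory for skipping repeat work):

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical decompress-once cache: each chunk is decompressed at most
// once, and the result is shared read-only afterwards.
type chunkCache struct {
	once []sync.Once
	data [][]byte
}

func newChunkCache(numChunks int) *chunkCache {
	return &chunkCache{
		once: make([]sync.Once, numChunks),
		data: make([][]byte, numChunks),
	}
}

// get runs decompress for chunk i exactly once, even under concurrency;
// later calls return the already-decompressed bytes.
func (c *chunkCache) get(i int, decompress func() []byte) []byte {
	c.once[i].Do(func() { c.data[i] = decompress() })
	return c.data[i]
}

func main() {
	c := newChunkCache(4)
	var mu sync.Mutex
	calls := 0
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.get(1, func() []byte {
				mu.Lock()
				calls++
				mu.Unlock()
				return []byte("chunk-1")
			})
		}()
	}
	wg.Wait()
	fmt.Println(calls) // prints 1: decompress ran exactly once
}
```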
Seems like there are 2 choices: protect the shared buffer with a mutex, or restructure things so each chunk is decompressed once and then reused. You could also uncompress every time without saving anything, but I'm pretty sure that isn't a useful option.
Oh I see, it's actually a bigger problem. You only uncompress one chunk at a time, so the contents of that buffer actually changes depending on what documents you try to access.
In that case I think we can't cache/reuse the buffer inside the segment. They are intended to be heavily shared (you could have a hundred queries all hitting that segment at one time), so I don't think it makes sense to share that buffer and coordinate access with a lock.
And that makes it seem like the current API isn't going to work very well with the stored docs compressed in chunks. I believe that was the reason we compressed docs individually in the past, even though it results in much lower compression. So, I think if you want to go with this storage format, you should also propose an API to access it efficiently.
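One possible shape for such an API, in the spirit of the prototype linked below: each caller opens its own reader that owns a private scratch buffer, so the hot path needs no locking on the shared segment. All names here (`storedReader`, `document`, the 10-docs-per-chunk layout) are invented for illustration, not the actual ice or prototype API:

```go
package main

import "fmt"

type segment struct{ numChunks int }

// storedReader holds per-caller decompression state; because nothing here
// is shared between goroutines, no mutex is needed.
type storedReader struct {
	seg      *segment
	buf      []byte // private to this reader
	curChunk int
}

func (s *segment) storedReader() *storedReader {
	return &storedReader{seg: s, curChunk: -1}
}

// document returns the stored fields of doc num, decompressing the
// containing chunk into this reader's own buffer on demand.
func (r *storedReader) document(num int) []byte {
	chunk := num / 10 // assume 10 docs per chunk for the sketch
	if chunk != r.curChunk {
		// stand-in for decompressing the chunk into r.buf
		r.buf = []byte(fmt.Sprintf("chunk-%d", chunk))
		r.curChunk = chunk
	}
	return r.buf
}

func main() {
	seg := &segment{numChunks: 4}
	// two concurrent queries each get an independent reader
	r1, r2 := seg.storedReader(), seg.storedReader()
	fmt.Println(string(r1.document(5)), string(r2.document(25)))
}
```

A reader like this also makes sequential access cheap: visiting all documents in one chunk decompresses it once, which the shared-buffer design can only match by holding a lock across the whole iteration.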
I prototyped one idea here: https://github.com/blugelabs/ice/pull/15 But I don't love it.
I changed the `Mutex` to an `RWMutex`. The overhead of this will be slight, but it can still cause problems when loading many, many segments; we need a mechanism to release unused cache.
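A sketch of the RWMutex variant plus a crude size cap, since a cached chunk per segment adds up when many segments are loaded. The cap and the drop-everything eviction policy here are placeholders, not a proposal for the real release mechanism:

```go
package main

import (
	"fmt"
	"sync"
)

type chunkCache struct {
	mu      sync.RWMutex
	cache   map[int][]byte
	maxKeys int
}

func (c *chunkCache) get(n int) ([]byte, bool) {
	c.mu.RLock() // concurrent readers proceed in parallel
	b, ok := c.cache[n]
	c.mu.RUnlock()
	return b, ok
}

func (c *chunkCache) put(n int, b []byte) {
	c.mu.Lock()
	if len(c.cache) >= c.maxKeys {
		c.cache = make(map[int][]byte) // crude release of unused cache
	}
	c.cache[n] = b
	c.mu.Unlock()
}

func main() {
	c := &chunkCache{cache: map[int][]byte{}, maxKeys: 2}
	c.put(0, []byte("a"))
	c.put(1, []byte("b"))
	c.put(2, []byte("c")) // triggers the crude eviction
	_, ok := c.get(0)
	fmt.Println(ok) // prints false: chunk 0 was evicted
}
```

One caveat with the read-then-write pattern: two goroutines that both miss in `get` will both decompress and both `put`, so the RWMutex reduces contention but does not by itself give decompress-once behavior.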
Started seeing a bunch of races detected now as well: