Closed nelio2k closed 5 years ago
Hey @nelio2k , Can you elaborate a bit more on the problem this is solving? I was looking at the example code and what not, but am having trouble seeing what it enables that was not previously possible. Cheers, Brett
This is more of a specific use case of doing an op on the key of an array or an object in a JSON, without referring to a specific JSON field underneath the key.
Given the following JSON:
{ "objKey": { "key":"value" } }
Before the PR, the code does the following:
keyString
:
and subsequently {
key
and value
.Normally that's fine if users do ops on key
within objKey
, but if the "objKey" itself has ops, they don't get executed, and becomes lost as gojsonsm essentially "steps over" the "objKey" and the operations in its corresponding ExecNode.
The example that I found was if say an expression of EXISTS(objKey)
.
Hope that makes sense?
Hi @brett19. Any thoughts on this?
A nil after token is acceptable, it is used to indicate that no after blocks exist. Nil is faster to detect than any other type of value.
Cheers, Brett
On Thu., May 9, 2019, 3:54 p.m. ysui6888, notifications@github.com wrote:
@ysui6888 commented on this pull request.
In fastMatcher.go https://github.com/couchbaselabs/gojsonsm/pull/105#discussion_r282695321 :
@@ -625,6 +626,33 @@ func (m *FastMatcher) matchExec(token tokenType, tokenData []byte, tokenDataLen return nil }
+// If any operation is done strictly on the key of an obj or array, this method takes care of it +func (m FastMatcher) matchKeyOfObj(token tokenType, tokenData []byte, node ExecNode) (error, bool) {
- lastArrOrMapKeyByteLen := len(m.lastArrOrMapKeyBytes)
- if lastArrOrMapKeyByteLen == 0 {
- return nil, false
- }
- var savedKey []byte = m.lastArrOrMapKeyBytes
- m.lastArrOrMapKeyBytes = nil
- if len(node.Ops) == 0 || (token != tknObjectStart && token != tknArrayStart) {
- return nil, false
- }
- // Since this matchExec should only execute on the specific key
- // temporarily don't do any afters, since they are out of scope
i take it that we should not set node.after to nil, then?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/couchbaselabs/gojsonsm/pull/105#discussion_r282695321, or mute the thread https://github.com/notifications/unsubscribe-auth/AAML46ZJWSMCASAPVQTK5VTPUSTSBANCNFSM4HKOTNPA .
I also don't like the idea of having the After temporarily rewritten. It also convolutes the logic of processing, and causes jsonsm to need to iterate over a key numerous times. We should be able to simply perform the Ops in the array and object branches of matchExec after its already processed it as an array/object. This will also have the benefit of likely already knowing the bounds of the underlying JSON value for parsing, and in cases such as 'EXISTS', it doesn't even need to parse anyways.
I've updated it by removing the afternode modification, and added comparison on the Execnode of the object or array itself.
@brett19 I couldn't find your latest comment on this PR. I did forget to run the benchmark test.
With regards to your recommendation of checking for ops to process... interestingly though, when I added the following check:
The performance actually and consistently got worse.
Current PR code without the len() check above:
neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8 20000000 70.6 ns/op
BenchmarkParseString-8 100000000 10.8 ns/op
BenchmarkParseBigString-8 100000000 10.9 ns/op
BenchmarkParseEscString-8 30000000 52.5 ns/op
BenchmarkParseBigEscString-8 10000000 172 ns/op
BenchmarkParseInteger-8 100000000 12.3 ns/op
BenchmarkParseNumber-8 30000000 47.7 ns/op
BenchmarkParseNull-8 200000000 7.06 ns/op
BenchmarkParseTrue-8 200000000 7.01 ns/op
BenchmarkParseFalse-8 200000000 8.41 ns/op
BenchmarkTokenize-8 30000 44433 ns/op 351.67 MB/s
PASS
ok github.com/couchbase/gojsonsm 18.548s
With checking len(node.Ops) before creating Fastval:
neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8 20000000 74.5 ns/op
BenchmarkParseString-8 100000000 11.5 ns/op
BenchmarkParseBigString-8 100000000 11.5 ns/op
BenchmarkParseEscString-8 30000000 55.3 ns/op
BenchmarkParseBigEscString-8 10000000 183 ns/op
BenchmarkParseInteger-8 100000000 13.7 ns/op
BenchmarkParseNumber-8 30000000 53.0 ns/op
BenchmarkParseNull-8 200000000 7.74 ns/op
BenchmarkParseTrue-8 200000000 7.77 ns/op
BenchmarkParseFalse-8 200000000 9.00 ns/op
BenchmarkTokenize-8 30000 52628 ns/op 296.91 MB/s
PASS
ok github.com/couchbase/gojsonsm 20.218s
I'm trying to think of reasons why... the only things I can think of are:
Do you or @ysui6888 have any ideas on why as is it's better performant? Anything else that concerns you?
Why not try checking if node.Ops == nil
?
True. I thought the slice references would be made at ExecNode creation. Nil check seems to perform better than the len check, but still lags behind no-check.
Nil check:
neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8 20000000 67.0 ns/op
BenchmarkParseString-8 100000000 10.5 ns/op
BenchmarkParseBigString-8 100000000 10.4 ns/op
BenchmarkParseEscString-8 30000000 49.6 ns/op
BenchmarkParseBigEscString-8 10000000 163 ns/op
BenchmarkParseInteger-8 100000000 11.1 ns/op
BenchmarkParseNumber-8 30000000 47.0 ns/op
BenchmarkParseNull-8 200000000 7.14 ns/op
BenchmarkParseTrue-8 200000000 7.08 ns/op
BenchmarkParseFalse-8 200000000 8.09 ns/op
BenchmarkTokenize-8 30000 47012 ns/op 332.38 MB/s
PASS
ok github.com/couchbase/gojsonsm 18.131s
No check:
neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8 20000000 68.5 ns/op
BenchmarkParseString-8 100000000 10.8 ns/op
BenchmarkParseBigString-8 100000000 10.7 ns/op
BenchmarkParseEscString-8 20000000 51.4 ns/op
BenchmarkParseBigEscString-8 10000000 167 ns/op
BenchmarkParseInteger-8 100000000 11.7 ns/op
BenchmarkParseNumber-8 30000000 45.8 ns/op
BenchmarkParseNull-8 200000000 7.00 ns/op
BenchmarkParseTrue-8 200000000 6.99 ns/op
BenchmarkParseFalse-8 200000000 8.09 ns/op
BenchmarkTokenize-8 30000 42739 ns/op 365.61 MB/s
PASS
ok github.com/couchbase/gojsonsm 17.589s
It's very odd... can't think of anything that causes this...
Very odd indeed... Can you show me the code?
All I did was wrap the code in question around
if node.Ops != nil
neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ git diff
diff --git a/fastMatcher.go b/fastMatcher.go
index 9feeff4..6e22e5e 100644
--- a/fastMatcher.go
+++ b/fastMatcher.go
@@ -568,15 +568,17 @@ func (m *FastMatcher) matchExec(token tokenType, tokenData []byte, tokenDataLen
}
objEndPos := m.tokens.Position()
- objFastVal := NewObjectFastVal(m.tokens.data[objStartPos:objEndPos])
- for _, op := range node.Ops {
- err := m.matchOp(&op, &objFastVal)
- if err != nil {
- return err
- }
+ if node.Ops != nil {
+ objFastVal := NewObjectFastVal(m.tokens.data[objStartPos:objEndPos])
+ for _, op := range node.Ops {
+ err := m.matchOp(&op, &objFastVal)
+ if err != nil {
+ return err
+ }
- if m.buckets.IsResolved(0) {
- return nil
+ if m.buckets.IsResolved(0) {
+ return nil
+ }
}
}
} else if token == tknArrayStart {
@@ -620,15 +622,17 @@ func (m *FastMatcher) matchExec(token tokenType, tokenData []byte, tokenDataLen
}
arrayEndPos := m.tokens.Position()
- arrayFastVal := NewArrayFastVal(m.tokens.data[arrayStartPos:arrayEndPos])
- for _, op := range node.Ops {
- err := m.matchOp(&op, &arrayFastVal)
- if err != nil {
- return err
- }
+ if node.Ops != nil {
+ arrayFastVal := NewArrayFastVal(m.tokens.data[arrayStartPos:arrayEndPos])
+ for _, op := range node.Ops {
+ err := m.matchOp(&op, &arrayFastVal)
+ if err != nil {
+ return err
+ }
- if m.buckets.IsResolved(0) {
- return nil
+ if m.buckets.IsResolved(0) {
+ return nil
+ }
}
}
} else {
ping @brett19
The problem is that when we're iterating through, we lose track of the key of an array or key of an object, so any distinct op nodes on that key were never executed. The proposed solution address this specific scenario and leaves the others untouched (I believe).