couchbaselabs / gojsonsm

Go implementation of my JSONSM algorithm.
9 stars 7 forks source link

Fixes #90: Perform operations on the key of an object or array #105

Closed nelio2k closed 5 years ago

nelio2k commented 5 years ago

The problem is that when we're iterating through, we lose track of the key of an array or key of an object, so any distinct op nodes on that key were never executed. The proposed solution address this specific scenario and leaves the others untouched (I believe).

brett19 commented 5 years ago

Hey @nelio2k , Can you elaborate a bit more on the problem this is solving? I was looking at the example code and what not, but am having trouble seeing what it enables that was not previously possible. Cheers, Brett

nelio2k commented 5 years ago

This is more of a specific use case of doing an op on the key of an array or an object in a JSON, without referring to a specific JSON field underneath the key. Given the following JSON: { "objKey": { "key":"value" } } Before the PR, the code does the following:

  1. Sees the {
  2. Goes into matchObjAndArr
  3. Gets "objKey", stores it as keyString
  4. Gets : and subsequently {
  5. Execute the matchOp on node.Elem["objKey"], which recursively goes and perform on key and value.

Normally that's fine if users do ops on key within objKey, but if the "objKey" itself has ops, they don't get executed, and becomes lost as gojsonsm essentially "steps over" the "objKey" and the operations in its corresponding ExecNode.

The example that I found was if say an expression of EXISTS(objKey). Hope that makes sense?

nelio2k commented 5 years ago

Hi @brett19. Any thoughts on this?

brett19 commented 5 years ago

A nil after token is acceptable, it is used to indicate that no after blocks exist. Nil is faster to detect than any other type of value.

Cheers, Brett

On Thu., May 9, 2019, 3:54 p.m. ysui6888, notifications@github.com wrote:

@ysui6888 commented on this pull request.

In fastMatcher.go https://github.com/couchbaselabs/gojsonsm/pull/105#discussion_r282695321 :

@@ -625,6 +626,33 @@ func (m *FastMatcher) matchExec(token tokenType, tokenData []byte, tokenDataLen return nil }

+// If any operation is done strictly on the key of an obj or array, this method takes care of it +func (m FastMatcher) matchKeyOfObj(token tokenType, tokenData []byte, node ExecNode) (error, bool) {

  • lastArrOrMapKeyByteLen := len(m.lastArrOrMapKeyBytes)
  • if lastArrOrMapKeyByteLen == 0 {
  • return nil, false
  • }
  • var savedKey []byte = m.lastArrOrMapKeyBytes
  • m.lastArrOrMapKeyBytes = nil
  • if len(node.Ops) == 0 || (token != tknObjectStart && token != tknArrayStart) {
  • return nil, false
  • }
  • // Since this matchExec should only execute on the specific key
  • // temporarily don't do any afters, since they are out of scope

i take it that we should not set node.after to nil, then?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/couchbaselabs/gojsonsm/pull/105#discussion_r282695321, or mute the thread https://github.com/notifications/unsubscribe-auth/AAML46ZJWSMCASAPVQTK5VTPUSTSBANCNFSM4HKOTNPA .

brett19 commented 5 years ago

I also don't like the idea of having the After temporarily rewritten. It also convolutes the logic of processing, and causes jsonsm to need to iterate over a key numerous times. We should be able to simply perform the Ops in the array and object branches of matchExec after its already processed it as an array/object. This will also have the benefit of likely already knowing the bounds of the underlying JSON value for parsing, and in cases such as 'EXISTS', it doesn't even need to parse anyways.

nelio2k commented 5 years ago

I've updated it by removing the afternode modification, and added comparison on the Execnode of the object or array itself.

nelio2k commented 5 years ago

@brett19 I couldn't find your latest comment on this PR. I did forget to run the benchmark test.

With regards to your recommendation of checking for ops to process... interestingly though, when I added the following check:

The performance actually and consistently got worse.

Current PR code without the len() check above:

neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8              20000000            70.6 ns/op
BenchmarkParseString-8          100000000           10.8 ns/op
BenchmarkParseBigString-8       100000000           10.9 ns/op
BenchmarkParseEscString-8       30000000            52.5 ns/op
BenchmarkParseBigEscString-8    10000000           172 ns/op
BenchmarkParseInteger-8         100000000           12.3 ns/op
BenchmarkParseNumber-8          30000000            47.7 ns/op
BenchmarkParseNull-8            200000000            7.06 ns/op
BenchmarkParseTrue-8            200000000            7.01 ns/op
BenchmarkParseFalse-8           200000000            8.41 ns/op
BenchmarkTokenize-8                30000         44433 ns/op     351.67 MB/s
PASS
ok      github.com/couchbase/gojsonsm   18.548s

With checking len(node.Ops) before creating Fastval:

neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8              20000000            74.5 ns/op
BenchmarkParseString-8          100000000           11.5 ns/op
BenchmarkParseBigString-8       100000000           11.5 ns/op
BenchmarkParseEscString-8       30000000            55.3 ns/op
BenchmarkParseBigEscString-8    10000000           183 ns/op
BenchmarkParseInteger-8         100000000           13.7 ns/op
BenchmarkParseNumber-8          30000000            53.0 ns/op
BenchmarkParseNull-8            200000000            7.74 ns/op
BenchmarkParseTrue-8            200000000            7.77 ns/op
BenchmarkParseFalse-8           200000000            9.00 ns/op
BenchmarkTokenize-8                30000         52628 ns/op     296.91 MB/s
PASS
ok      github.com/couchbase/gojsonsm   20.218s

I'm trying to think of reasons why... the only things I can think of are:

  1. Creating a NewFastValObject is cheap ... we're really just creating slice references to existing memory and perhaps golang has got it optimized.
  2. Perhaps the len() call really is as expensive as we have thought, and thus why we tried to remove the len() calls previously in gojsonsm and goxdcr.

Do you or @ysui6888 have any ideas on why as is it's better performant? Anything else that concerns you?

brett19 commented 5 years ago

Why not try checking if node.Ops == nil?

nelio2k commented 5 years ago

True. I thought the slice references would be made at ExecNode creation. Nil check seems to perform better than the len check, but still lags behind no-check.

Nil check:

neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8              20000000            67.0 ns/op
BenchmarkParseString-8          100000000           10.5 ns/op
BenchmarkParseBigString-8       100000000           10.4 ns/op
BenchmarkParseEscString-8       30000000            49.6 ns/op
BenchmarkParseBigEscString-8    10000000           163 ns/op
BenchmarkParseInteger-8         100000000           11.1 ns/op
BenchmarkParseNumber-8          30000000            47.0 ns/op
BenchmarkParseNull-8            200000000            7.14 ns/op
BenchmarkParseTrue-8            200000000            7.08 ns/op
BenchmarkParseFalse-8           200000000            8.09 ns/op
BenchmarkTokenize-8                30000         47012 ns/op     332.38 MB/s
PASS
ok      github.com/couchbase/gojsonsm   18.131s

No check:

neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/couchbase/gojsonsm
BenchmarkBinTree-8              20000000            68.5 ns/op
BenchmarkParseString-8          100000000           10.8 ns/op
BenchmarkParseBigString-8       100000000           10.7 ns/op
BenchmarkParseEscString-8       20000000            51.4 ns/op
BenchmarkParseBigEscString-8    10000000           167 ns/op
BenchmarkParseInteger-8         100000000           11.7 ns/op
BenchmarkParseNumber-8          30000000            45.8 ns/op
BenchmarkParseNull-8            200000000            7.00 ns/op
BenchmarkParseTrue-8            200000000            6.99 ns/op
BenchmarkParseFalse-8           200000000            8.09 ns/op
BenchmarkTokenize-8                30000         42739 ns/op     365.61 MB/s
PASS
ok      github.com/couchbase/gojsonsm   17.589s

It's very odd... can't think of anything that causes this...

brett19 commented 5 years ago

Very odd indeed... Can you show me the code?

nelio2k commented 5 years ago

All I did was wrap the code in question around if node.Ops != nil

neil.huang@NeilsMacbookPro:~/source/couchbase/godeps/src/github.com/couchbase/gojsonsm$ git diff
diff --git a/fastMatcher.go b/fastMatcher.go
index 9feeff4..6e22e5e 100644
--- a/fastMatcher.go
+++ b/fastMatcher.go
@@ -568,15 +568,17 @@ func (m *FastMatcher) matchExec(token tokenType, tokenData []byte, tokenDataLen
                }
                objEndPos := m.tokens.Position()

-               objFastVal := NewObjectFastVal(m.tokens.data[objStartPos:objEndPos])
-               for _, op := range node.Ops {
-                       err := m.matchOp(&op, &objFastVal)
-                       if err != nil {
-                               return err
-                       }
+               if node.Ops != nil {
+                       objFastVal := NewObjectFastVal(m.tokens.data[objStartPos:objEndPos])
+                       for _, op := range node.Ops {
+                               err := m.matchOp(&op, &objFastVal)
+                               if err != nil {
+                                       return err
+                               }

-                       if m.buckets.IsResolved(0) {
-                               return nil
+                               if m.buckets.IsResolved(0) {
+                                       return nil
+                               }
                        }
                }
        } else if token == tknArrayStart {
@@ -620,15 +622,17 @@ func (m *FastMatcher) matchExec(token tokenType, tokenData []byte, tokenDataLen
                }
                arrayEndPos := m.tokens.Position()

-               arrayFastVal := NewArrayFastVal(m.tokens.data[arrayStartPos:arrayEndPos])
-               for _, op := range node.Ops {
-                       err := m.matchOp(&op, &arrayFastVal)
-                       if err != nil {
-                               return err
-                       }
+               if node.Ops != nil {
+                       arrayFastVal := NewArrayFastVal(m.tokens.data[arrayStartPos:arrayEndPos])
+                       for _, op := range node.Ops {
+                               err := m.matchOp(&op, &arrayFastVal)
+                               if err != nil {
+                                       return err
+                               }

-                       if m.buckets.IsResolved(0) {
-                               return nil
+                               if m.buckets.IsResolved(0) {
+                                       return nil
+                               }
                        }
                }
        } else {
nelio2k commented 5 years ago

ping @brett19