couchbaselabs / cbft

*THIS PROJECT HAS MOVED* from couchbaselabs TO: https://github.com/couchbase/cbft -- no further development will be done here on couchbaselabs/cbft

0.2.0 cbft full failed to index document with ID "SimpleKeyREP0REP0REP0REP0REP0REP0" #155

Closed weilliu closed 9 years ago

weilliu commented 9 years ago

I am using the dataset from our SDK tests, which injects a batch of docs whose IDs have the form "SimpleKeyREPxxx". A sample doc looks like this:

SimpleKeyREP0REP0REP0REP0REP0REP0
{
  "Json": "SimpleValueREP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0REP0"
}

I noticed a couple of errors from cbft-full (0.2.0) when indexing those documents:

[root@centos-58 ~]# curl -X POST http://localhost:8095/api/index/test-index/query --data '{"query": {"query": "SimpleKey" , "boost" : 1}}' --header 'Content-Type: application/json' --header 'Accept: application/json'
{"request":{"query":{"query":"SimpleKey","boost":1},"size":0,"from":0,"highlight":null,"fields":null,"facets":null,"explain":false},"hits":[],"total_hits":0,"max_score":0,"took":4308962,"facets":{}}
[root@centos-58 ~]# curl -X GET http://localhost:8095/api/index/default-index/count --header 'Content-Type: application/json' --header 'Accept: application/json'
{"status":"ok","count":514}
[root@centos-58 ~]# ssh root@172.23.107.174
Last login: Thu Jul 16 16:12:50 2015 from 10.17.2.130
[root@master ~]# cd /opt/couchbase/
[root@master couchbase]# bin/cbq
Couchbase query shell connected to http://localhost:8093/ . Type Ctrl-D to exit.
cbq> select count(*) from default;
{
    "requestID": "9ef0041f-5d00-4bcd-9bd1-f8a666adbee1",
    "signature": {
        "$1": "number"
    },
    "results": [
        {
            "$1": 1001
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "20.842947ms",
        "executionTime": "20.723657ms",
        "resultCount": 1,
        "resultSize": 34
    }
}

The error in the log:

2015/07/16 15:42:42 bleve: json.Unmarshal, partition: 920, key: "SimpleKeyREP472REP472REP472REP472", seq: 11, val: "SimpleValueREP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472", err: invalid character 'S' looking for beginning of value

steveyen commented 9 years ago

cbft is trying to parse the value as a JSON document, where the value looks like...

SimpleValueREP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472REP472

But that isn't valid JSON, so cbft (correctly) logs an error and skips it.
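
To illustrate the failure, here is a minimal Go sketch. It assumes the stored value reaches `json.Unmarshal` from Go's `encoding/json` as raw bytes, which is the call named in the log line. `parseValue` is a hypothetical helper for this example, not cbft or bleve code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseValue tries to decode a stored value as JSON, roughly as an indexer
// would before analyzing its fields. Hypothetical helper, not cbft code.
func parseValue(val []byte) (interface{}, error) {
	var doc interface{}
	if err := json.Unmarshal(val, &doc); err != nil {
		return nil, err
	}
	return doc, nil
}

func main() {
	values := [][]byte{
		[]byte(`{"Json": "SimpleValueREP0"}`), // JSON object: parses
		[]byte(`"SimpleValueREP472"`),         // quoted JSON string: parses
		[]byte(`SimpleValueREP472`),           // bare text: fails
	}
	for _, v := range values {
		if _, err := parseValue(v); err != nil {
			fmt.Printf("%s -> error: %v\n", v, err)
			continue
		}
		fmt.Printf("%s -> ok\n", v)
	}
}
```

The bare text fails on its first byte, which matches the `invalid character 'S' looking for beginning of value` message in the log. The same text wrapped in double quotes would be a valid JSON string.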

As a feature idea, it might help users if cbft tracked an additional stat or counter for how many of these JSON parsing errors it encounters. (I opened a new issue, https://github.com/couchbaselabs/cbft/issues/156, to track this feature idea, and will be closing this one.)

Or perhaps cbft could instead offer a different, optional feature that treats the non-JSON value as a plain string and indexes it anyway, but that feels like an incorrect solution for a JSON document database.
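
The two ideas above (a parse-error counter and an optional treat-as-string mode) could look roughly like the following Go sketch. Everything here is an assumption for illustration: the `stats` type, the `prepare` function, and the `treatAsString` flag are hypothetical names and do not exist in cbft.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stats is a hypothetical pair of counters, not a cbft type.
type stats struct {
	Indexed     int
	ParseErrors int
}

// prepare returns the value to index and whether it should be indexed.
// When treatAsString is true, non-JSON bytes are kept as a plain string
// instead of being skipped. Parse errors are counted in both modes.
func prepare(val []byte, treatAsString bool, s *stats) (interface{}, bool) {
	var doc interface{}
	if err := json.Unmarshal(val, &doc); err != nil {
		s.ParseErrors++
		if !treatAsString {
			return nil, false // skip the document
		}
		doc = string(val) // fall back to the raw text
	}
	s.Indexed++
	return doc, true
}

func main() {
	vals := [][]byte{
		[]byte(`{"Json": "SimpleValueREP0"}`),
		[]byte(`SimpleValueREP472`),
	}
	for _, mode := range []bool{false, true} {
		var s stats
		for _, v := range vals {
			prepare(v, mode, &s)
		}
		fmt.Printf("treatAsString=%v indexed=%d parse_errors=%d\n",
			mode, s.Indexed, s.ParseErrors)
	}
}
```

With a counter like this exposed as a stat, a gap between the index count and the bucket count could be checked against the number of skipped documents without reading the logs.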