elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

String values are stored in float field #3414

Closed shaoweite closed 10 years ago

shaoweite commented 11 years ago

(Version 0.90.0) I have a float field defined in a mapping, and I was able to put string values into the float field as long as the string value can be parsed by java.lang.Float.parseFloat(). Sorting on this field later results in an exception like this:

        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:573)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:484)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:469)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:462)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:234)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:141)
        at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [latitude]: value_field [loc_lat] isn't a number field, but a string
        at org.elasticsearch.search.facet.termsstats.TermsStatsFacetParser.parse(TermsStatsFacetParser.java:127)
        at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:92)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:561)
        ... 12 more
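
To illustrate the observation above that a string is accepted whenever java.lang.Float.parseFloat() can parse it, here is a minimal standalone sketch. It only exercises the JDK method named in the report; it is not Elasticsearch's indexing code, and the class name FloatCoercionSketch and the sample strings are assumptions chosen for illustration.

```java
public class FloatCoercionSketch {

    // Returns true when java.lang.Float.parseFloat accepts the string,
    // false when it throws NumberFormatException.
    static boolean parsesAsFloat(String raw) {
        try {
            Float.parseFloat(raw);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative samples: numeric-looking strings and plain text.
        String[] samples = {"1235", "2e04", "116.39631301781112", "beijing", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" parses as float: " + parsesAsFloat(s));
        }
    }
}
```

Strings such as "1235" and "2e04" parse, while "beijing" and the empty string do not, which matches the report that only numeric-looking strings ended up in the float field.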
spinscale commented 11 years ago

Hey,

can you also show us the mapping of your index and a sample document you indexed to trigger this, so we get a complete picture of this problem? Does this also happen with 0.90.2?

Thanks!

spinscale commented 11 years ago

I took a closer look. This was my test, and it worked:

curl -X DELETE localhost:9200/foo
curl -X PUT localhost:9200/foo

curl -X PUT localhost:9200/foo/bar/_mapping -d '{
    "bar" : { "properties" : { "value" : { "type":"float" } } }
}'

curl -X PUT localhost:9200/foo/bar/1 -d '{ "value" : "2e04" }'
curl -X PUT localhost:9200/foo/bar/2 -d '{ "value" : "2e05" }'
curl -X PUT localhost:9200/foo/bar/3 -d '{ "value" : "2e06" }'
curl -X PUT localhost:9200/foo/bar/4 -d '{ "value" : "2e07" }'

curl -X GET localhost:9200/foo/_refresh
curl -X GET localhost:9200/foo/bar/_mapping

curl -X POST localhost:9200/foo/bar/_search -d '{ "query" : { "match_all" : {} }, "sort" : [  { "value" : { "order":"desc"} } ]}'

curl -X POST localhost:9200/foo/bar/_search -d '{ "query" : { "match_all" : {} }, "sort" : [  { "value" : { "order":"asc"} } ]}'

curl -X POST localhost:9200/foo/bar/_search -d '{ "query" : { "match_all" : {} }, "sort" : [  "value" ]}'

However, looking at your exception above, it actually says that you are trying to run a facet. Can you provide the facet query as well so that I can reproduce your issue? The error message suggests that the value_field is mapped as a string, but I need more information to be sure.

shaoweite commented 11 years ago

Thanks for the quick response. Here is the mapping:

{ "dev_bg_20130729232025_a" : { "combinedlog" : { "properties" : { "app_id" : { "type" : "string" }, "app_name" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "company_id" : { "type" : "long" }, "company_name" : { "type" : "string" }, "device" : { "type" : "string" }, "doctype" : { "type" : "string" }, "email_cc" : { "type" : "string" }, "email_date" : { "type" : "string" }, "email_from" : { "type" : "string" }, "email_subject" : { "type" : "string" }, "email_to" : { "type" : "string" }, "file_name" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "ip_src" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "keyphrases" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "loc_acc" : { "type" : "string" }, "loc_city" : { "type" : "string" }, "loc_country" : { "type" : "string" }, "loc_lat" : { "type" : "float" }, "loc_lon" : { "type" : "float" }, "loc_region" : { "type" : "string" }, "log_type" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "mime" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "page_title" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "referer" : { "type" : "string" }, "req_channel" : { "type" : "string" }, "req_id" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "req_id2" : { "type" : "string" }, "req_ts" : { "type" : "string" }, "request" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "request_ts" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "response_bytes" : { "type" : "string" }, "response_time" : { "type" : "string" }, "ts" : { "type" : "long" }, "url" 
: { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "user_agent" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "user_email" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "user_id" : { "type" : "string" }, "user_name" : { "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs" }, "xact" : { "type" : "string" } } } } }

The field I had this issue with is 'loc_lon'. Here are two sample docs, one with a float value for 'loc_lon' and the other with a string:

{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 10, "successful" : 10, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : null, "hits" : [ { "_index" : "dev_bg_20130730173313_a", "_type" : "combinedlog", "_id" : "UfGYbH8AAQEAAEhHB7IAAADE", "_score" : null, "_source" : {"loc_city": "beijing", "app_name": "Google Apps", "referer": "", "file_name": "", "loc_region": "beijing", "app_id": "1", "company_id": "", "xact": "UfGYbH8AAQEAAEhHB7IAAADE", "page_title": "", "keywords": "", "email_date": "2013-07-25T21%3a28%3a10.093Z", "loc_country": "cn", "file_ext": "", "ip_src": "10.1.0.157", "user_id": "", "response_bytes": "3151", "ts": "20130725212812", "company_name": "", "email_cc": "qa@acme.com qa@acme.com", "user_name": "", "loc_lon": 116.39631301781112, "action": "", "loc_acc": "", "email_to": "John Willson jwilson@acme.com", "device": "", "log_type": "access", "response_time": "284057", "url": "https%3a//m.google.com/Microsoft-Server-ActiveSync?User=qa@acme.com&DeviceId=ApplC8QF8GP4DDP9&DeviceType=iPhone&Cmd=Sync", "request": "POST /Microsoft-Server-ActiveSync/?User=qa@acme.com&DeviceId=ApplC8QF8GP4DDP9&DeviceType=iPhone&Cmd=Sync HTTP/1.1", "loc_lat": 39.90631301781111, "user_agent": "Apple-iPhone3C3/1002.329", "email_subject": "Test email \u6e2c\u8a66\u90f5\u4ef6", "mime": "", "email_from": "QAqa@acme.com", "user_email": ""}, "sort" : [ 20130725212812 ] }, { "_index" : "dev_bg_20130730173313_a", "_type" : "combinedlog", "_id" : "UdMTYH8AAAEAAFeKGfoAAACG00", "_score" : null, "_source" : {"loc_city":"new_york","req_id2":"0","req_channel":"0","file_name":"Hello World.docx","loc_region":"new_york","ts":"20130702175232","loc_lon":"1235","company_id":1,"loc_lat":"1235","company_name":"acme.com","req_id":"UdMTYH8AAAEAAFeKGfoAAACG","doctype":"docx","user_email":"qa@acme.com","user_id":"166","loc_country":"us","keyphrases":"","prev_tags":"","log_type":"beacon","ip_src":"10.1.0.132","user_name":"QA Test"}, "sort" : [ 20130702175232 ] } ]

The facet query is:

[2013-07-29 23:19:22,390][DEBUG][action.search.type ] [Eshu] [dev][4], node[kt-02Pi7Q7ifvVj_ynxdXQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6bc71ff5] org.elasticsearch.search.SearchParseException: [dev_bg_a][4]: from[-1],size[-1],sort[<custom:"ts": org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource@4fd2667e>!]: Parse Failure [Failed to parse source [{"sort": [{"ts": {"order": "desc"}}], "filter": {"and": [{"range": {"ts": [{"to": "20130729231921", "from": "20130722231921"}]}}, {"term": {"company_id": 1}}]}, "facets": {"aggregate": {"facet_filter": {"and": [{"range": {"ts": [{"to": "20130729231921", "from": "20130722231921"}]}}, {"term": {"company_id": 1}}]}, "terms": {"field": "loc_country", "size": 30}}, "latitude": {"facet_filter": {"and": [{"range": {"ts": [{"to": "20130729231921", "from": "20130722231921"}]}}, {"term": {"company_id": 1}}]}, "terms_stats": {"value_field": "loc_lat", "key_field": "loc_country", "size": 30}}, "longitude": {"facet_filter": {"and": [{"range": {"ts": [{"to": "20130729231921", "from": "20130722231921"}]}}, {"term": {"company_id": 1}}]}, "terms_stats": {"value_field": "loc_lon", "key_field": "loc_country", "size": 30}}}, "size": 100}]]

Thanks for helping.

Regards, Wei

shaoweite commented 11 years ago

Forgot to reply wrt 0.90.2. Actually, even on 0.90.0 I can no longer reproduce the exception from the facet query. The query now works without the exception, even though the data still contains a mix of string and float values. I will keep investigating and provide more feedback if possible.

Thanks, Wei

spinscale commented 11 years ago

I just saw this commit https://github.com/elasticsearch/elasticsearch/commit/31fd7764e782384e3a278815dbd2a7c3cf065ed5

Judging from that, I suppose that your terms stats query might have gone over several indices, and in one of the indices the field for stats was not yet mapped.

Maybe this explains why you are no longer able to reproduce it, as all your indices now contain data.
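
The hypothesis above (one of several queried indices not yet having the field mapped) can be checked by hand by comparing each index's mapping. Below is a toy sketch of that check over plain in-memory maps. It is not Elasticsearch code and makes no claim about how the facet parser resolves types internally; the class name, the index names index_a and index_b, and the list of numeric type names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnmappedFieldSketch {

    // Assumed set of numeric mapping type names, for illustration only.
    static final Set<String> NUMERIC_TYPES = new HashSet<>(
            Arrays.asList("float", "double", "long", "integer", "short", "byte"));

    // mappings: index name -> (field name -> mapped type).
    // Returns the indices where the field is absent or mapped to a non-numeric type.
    static List<String> indicesWithoutNumericMapping(
            Map<String, Map<String, String>> mappings, String field) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : mappings.entrySet()) {
            String type = entry.getValue().get(field);
            if (type == null || !NUMERIC_TYPES.contains(type)) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Hypothetical indices: one has loc_lat mapped as float, one has no mapping for it yet.
        Map<String, Map<String, String>> mappings = new LinkedHashMap<>();
        Map<String, String> mapped = new LinkedHashMap<>();
        mapped.put("loc_lat", "float");
        mapped.put("loc_country", "string");
        Map<String, String> unmapped = new LinkedHashMap<>();
        unmapped.put("loc_country", "string");
        mappings.put("index_a", mapped);
        mappings.put("index_b", unmapped);

        System.out.println(indicesWithoutNumericMapping(mappings, "loc_lat"));
    }
}
```

In practice the per-index field types would come from the output of the _mapping endpoint for each index behind the queried name.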

spinscale commented 10 years ago

Closing for now due to lack of new information. Feel free to reopen if you run into this again.