Closed. rlvoyer closed this issue 9 years ago.
I think you need one of the following: the source enabled, the field stored, or term vectors stored for the field. But I agree we should document that!
Thanks for raising this... What is your mapping for those fields?
{
"document": {
"_source" : {
"enabled" : false
},
"term_vector": "yes",
"dynamic": false,
"properties": {
"_id": {
"type": "long",
"index": "not_analyzed"
},
"cs": {
"type": "string",
"analyzer": "keyword",
"store": "no"
},
"ks": {
"type": "string",
"analyzer": "keyword",
"store": "no"
},
"tpcs": {
"type": "string",
"analyzer": "keyword",
"store": "no"
}
}
}
}
Ah, I see. You should put term_vector next to store for each field you want to store term vectors for. Can you try that?
Like this:
{
"type" : "string",
"store" : "no",
"term_vector" : "yes"
}
simon
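Applied to the mapping posted above, the suggestion amounts to adding term_vector to each MLT field's own definition. The sketch below does that on a trimmed copy of the mapping (the _id entry is omitted) using a hypothetical helper, add_field_term_vectors; it only rewrites the JSON and leaves the type-level term_vector entry untouched.

```python
import copy
import json

# Trimmed copy of the mapping posted in this issue.
original = {
    "document": {
        "_source": {"enabled": False},
        "term_vector": "yes",  # type-level setting from the original mapping
        "dynamic": False,
        "properties": {
            "cs": {"type": "string", "analyzer": "keyword", "store": "no"},
            "ks": {"type": "string", "analyzer": "keyword", "store": "no"},
            "tpcs": {"type": "string", "analyzer": "keyword", "store": "no"},
        },
    }
}


def add_field_term_vectors(mapping, type_name, fields):
    """Hypothetical helper: return a copy with term_vector set on each field."""
    fixed = copy.deepcopy(mapping)  # leave the input mapping unchanged
    for name in fields:
        fixed[type_name]["properties"][name]["term_vector"] = "yes"
    return fixed


fixed = add_field_term_vectors(original, "document", ["cs", "ks", "tpcs"])
print(json.dumps(fixed["document"]["properties"], indent=2))
```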
I pushed a fix to the documentation: https://github.com/elasticsearch/elasticsearch.github.com/commit/25614ced9513e24dc3ad99b976b00e8c384ff9f2
Thanks -- I'll make that fix. What is the effect (if any) of enabling term_vector storage at the top-level as I have done here?
Hmm, it seems that this only works if the field is stored or you enabled the source. We should be able to support this if term vectors (TV) are stored for the fields as well... reopening.
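The rule described here can be written down as a small check against a type's mapping: MLT finds text for a field only if _source is enabled (the default) or the field itself is stored, while term vectors are not used for now. mlt_text_available is a hypothetical helper for illustration, not part of any Elasticsearch client. Applied to the mapping from this issue, it reports that cs has no usable text, which is consistent with the "No fields found to fetch the 'likeText' from" error.

```python
def mlt_text_available(type_mapping, field):
    """Hypothetical check: can MLT fetch text for `field` under this rule?

    Term vectors are deliberately ignored: per this thread, MLT does not
    (yet) use them.
    """
    # _source is enabled unless the mapping explicitly disables it.
    source_enabled = type_mapping.get("_source", {}).get("enabled", True)
    field_mapping = type_mapping.get("properties", {}).get(field, {})
    stored = field_mapping.get("store") in ("yes", True, "true")
    return source_enabled or stored


user_mapping = {
    "_source": {"enabled": False},
    "properties": {
        "cs": {"type": "string", "store": "no", "term_vector": "yes"},
    },
}
print(mlt_text_available(user_mapping, "cs"))  # False
```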
Hey @s1monw -- have you had an opportunity to look into this issue?
I am not a fan of supporting it for term vectors without store, because then we would need to get that info (the TV) from the document on its specific shard and then send it to all the shards to run the MLT based on it. Just store the source and run MLT based on that. By the way, you can also always use the MLT query as part of a search request and provide the text there externally.
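The last suggestion, running the MLT query inside a normal search request with the text supplied externally, might look roughly like this. The parameter names fields, like_text, min_term_freq and min_doc_freq follow the more_like_this query reference of this era; later releases changed how the text is passed, so treat them as an assumption to check against your version. mlt_query is a hypothetical helper; the resulting body would be sent to the index's _search endpoint.

```python
import json


def mlt_query(fields, like_text, min_term_freq=1, min_doc_freq=1):
    """Hypothetical helper: a search body using more_like_this with external text."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,            # fields to compare against
                "like_text": like_text,      # text supplied by the client
                "min_term_freq": min_term_freq,
                "min_doc_freq": min_doc_freq,
            }
        }
    }


body = mlt_query(["cs", "ks", "tpcs"], "text copied from the source document")
print(json.dumps(body, indent=2))
```

Because the client supplies the text, this works even when _source is disabled, as long as the application keeps its own copy of the original field values.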
@kimchy can you explain how storing the source alleviates the problem of distributing the term vector to all the shards for the MLT computation?
Because with the source text to run MLT on, you don't need the term vectors.
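To see why the raw text is enough, here is a simplified sketch of the term selection that Lucene's MoreLikeThis performs on it: analyze the text, count term frequencies, drop terms below min_term_freq or min_doc_freq, score the rest by tf * idf and keep the top max_query_terms. This is an approximation, not the actual implementation: the regex tokenizer stands in for the field's analyzer, the default thresholds are assumed from Lucene's MoreLikeThis defaults, select_mlt_terms is hypothetical, and the document frequencies in the usage example are made-up illustration values.

```python
import math
import re
from collections import Counter


def select_mlt_terms(text, doc_freqs, num_docs,
                     min_term_freq=2, min_doc_freq=5, max_query_terms=25):
    """Hypothetical, simplified MoreLikeThis-style term selection."""
    # A lowercase regex tokenizer stands in for the field's analyzer.
    tf = Counter(re.findall(r"\w+", text.lower()))
    scored = []
    for term, freq in tf.items():
        df = doc_freqs.get(term, 0)
        if freq < min_term_freq or df < min_doc_freq:
            continue  # too rare in this text or in the index
        idf = math.log(num_docs / (df + 1)) + 1.0  # classic Lucene-style idf
        scored.append((freq * idf, term))
    scored.sort(reverse=True)  # highest tf * idf first
    return [term for _, term in scored[:max_query_terms]]


# Made-up document frequencies for illustration only.
doc_freqs = {"alarm": 12, "clock": 7, "gtk": 30, "for": 400}
text = "Alarm clock. Alarm for GTK. GTK clock."
print(select_mlt_terms(text, doc_freqs, num_docs=1000))  # ['clock', 'alarm', 'gtk']
```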
I agree this seems odd... isn't the TV just a different representation of a field?
@kimchy @s1monw so why store the term vectors at all? (I was only storing them because of the following doc: http://www.elasticsearch.org/guide/reference/api/more-like-this/) If MLT doesn't need them when it has the source text, does it then recompute term vectors given the source text?
I agree this should also work with TV, though. At this point it doesn't, so you might want to get rid of the TV if you don't need them.
@kimchy @s1monw I'd like to try to write a plugin similar to more-like-this that does exactly what I want. Can you suggest any plugins that access term vectors that I might use as references? Any tips / documentation are much appreciated.
Hey, we only recently added TermVector support. This issue is on our list to make use of that feature. Can you wait for it?
@s1monw Unfortunately, my company has a rapidly narrowing window for determining whether elasticsearch is right for the problem we're trying to solve. Given that the current built-in functionality doesn't seem to handle our use-case, a plugin seems like our only option in the short-term.
Excuse me, but I'm currently trying to use the MLT feature. I read http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-more-like-this.html#search-more-like-this and either my English is really bad or I don't have the remotest idea what it is supposed to mean:
"Note: In order to use the mlt feature a mlt_field needs to be either be stored, store term_vector or source needs to be enabled."
What is "stored"? Which "source"? I've been searching the internet for two hours now and can't find any example of how to use MLT successfully. And to be honest, this issue report doesn't help me either. Could anyone shed some light on it and fix the documentation, please?
In Elasticsearch you can either store the entire document (the JSON you send to ES when you index), a.k.a. the source, or you can mark a field as stored: true, in which case we only store the value of that particular field. By default the source is stored (or enabled), but you can also disable it via the mapping. Term vectors don't work with MLT yet, hence this issue.
Hope that helps.
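The two options described above can be expressed as mapping variants. mlt_ready_mapping is a hypothetical helper that either keeps _source enabled, or disables it and marks each field "store": "yes" instead; the type name document and the string type with "store": "yes"/"no" values are taken from the mappings earlier in this thread.

```python
import json


def mlt_ready_mapping(fields, keep_source=True):
    """Hypothetical helper: a mapping from which MLT can read the text.

    keep_source=True leaves _source enabled (the default); otherwise
    _source is disabled and each field is stored on its own.
    """
    field_store = "no" if keep_source else "yes"
    return {
        "document": {
            "_source": {"enabled": keep_source},
            "properties": {
                name: {"type": "string", "store": field_store} for name in fields
            },
        }
    }


print(json.dumps(mlt_ready_mapping(["cs", "ks"], keep_source=False), indent=2))
```

Storing the source keeps the whole document around; storing individual fields keeps only those values, which is smaller when just a few fields are needed for MLT.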
@s1monw Thanks for the reply. So to rephrase: any field I'm using as "mlt_fields=..." needs to
Okay. In my case the documents contain two fields. Example:
{ _index: "debshots", _type: "jdbc", _id: "396", _version: 35, exists: true, _source: { description: "Alarm Clock for GTK Environments", name: "alarm-clock" } }
But when I'm GETting http://localhost:9200/debshots/jdbc/396/_mlt Elasticsearch returns zero results:
{ took: 3, timed_out: false, _shards: { total: 1, successful: 1, failed: 0 }, hits: { total: 0, max_score: null, hits: [ ] } }
There are many other documents with a description like "Alarm curl plugin for uWSGI", so I had assumed that at least "Alarm" would be a shared term that makes them similar in the more-like-this sense.
I'd welcome a hint about what is going wrong here. Thanks.
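One possible explanation for the empty result, assuming the documented more_like_this defaults (min_term_freq of 2, min_doc_freq of 5) also apply to the _mlt API: in a short description every term occurs only once, so none passes the term-frequency threshold and no query terms are produced. The sketch checks that for the description above; the lowercase regex tokenizer is an assumed stand-in for the real analyzer.

```python
import re
from collections import Counter

# Assumed default, taken from the more_like_this query documentation.
DEFAULT_MIN_TERM_FREQ = 2

description = "Alarm Clock for GTK Environments"
tf = Counter(re.findall(r"\w+", description.lower()))  # term -> count
too_rare = [term for term, count in tf.items() if count < DEFAULT_MIN_TERM_FREQ]
print(tf["alarm"], len(too_rare) == len(tf))  # 1 True
```

If that is the cause, adding min_term_freq=1 (and possibly a lower min_doc_freq) to the _mlt request would be the first thing to try.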
I would also welcome a rewrite of that quoted sentence in the documentation, because it's ungrammatical and hard to understand. (I still don't understand it.)
Can you please take this to the mailing list? This is only for development issues.
Thanks
@s1monw Will do. Please still consider rewriting this sentence in the documentation to make it understandable.
This issue is now outdated, closing.
I'm trying to use the more_like_this handler in almost the exact same way it's used in the documentation here:
http://www.elasticsearch.org/guide/reference/api/more-like-this/
curl -XGET "http://localhost:9200/foo/document/1008534/_mlt?mlt_fields=cs,ks,tpcs&min_doc_freq=2"
{"error":"ElasticSearchException[No fields found to fetch the 'likeText' from]","status":500}
I'm guessing this bug stems from the fact that source is disabled, but I'm not really sure. If it is the case that source is required for MLT, you should document that fact.
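For scripting the request from this issue, the same URL can be built programmatically. mlt_url is a hypothetical helper; it only assembles the GET URL shown above (the _mlt endpoint path and the mlt_fields and min_doc_freq parameters as used in this issue) and does not contact a server.

```python
from urllib.parse import urlencode


def mlt_url(host, index, doc_type, doc_id, fields, **params):
    """Hypothetical helper: build a GET URL for the _mlt endpoint."""
    query = {"mlt_fields": ",".join(fields), **params}
    # safe="," keeps the comma-separated field list readable.
    return f"{host}/{index}/{doc_type}/{doc_id}/_mlt?{urlencode(query, safe=',')}"


print(mlt_url("http://localhost:9200", "foo", "document", 1008534,
              ["cs", "ks", "tpcs"], min_doc_freq=2))
```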