JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

textContains with label constraint doesn't work #2190

Open ChenZhaobin opened 4 years ago

ChenZhaobin commented 4 years ago

When I use g.V().has('name', textContains('a')) or g.V().hasLabel('company'), each works well, but if I combine the two conditions, I always get no results. For example, g.V().hasLabel('company').has('name', textContains('a')) returns a count of zero, although all my vertex labels are company. Could someone help me out, please? This is really important for me.

FlorianHockmann commented 4 years ago

Are you using partitioned labels? In that case this looks like a duplicate of #1842 where you also already commented.

ChenZhaobin commented 4 years ago

Actually, I am using the docker-compose-cql-es.yml configuration, the same as here: https://github.com/JanusGraph/janusgraph-docker

FlorianHockmann commented 4 years ago

My question was whether you have created the vertex label company like this:

mgmt.makeVertexLabel('company').partition().make()

as #1842 mentions a problem with vertex label constraints if the label is partitioned.

Apart from that, is the problem specific to text predicates or does it also occur if you use a simple has step? So something like this:

g.V().has('company', 'name', 'test-company')

ChenZhaobin commented 4 years ago

@FlorianHockmann I created the vertex label company through Gremlin.Net like this: var company = g.AddV("company").Property("name", "IBM").Property("code", "010101").Next(). Combining hasLabel with a has filter works as expected, and so does textContainsPrefix('I'). Only textContains fails: textContains('I') and textContains('IB') return nothing, while textContains('IBM') finds the one vertex named IBM, even though my database also contains companies named 'IBMan' and 'IBManufacture'.

rngcntr commented 4 years ago

Hi @ChenZhaobin, that's intended behavior. Have a look at the docs, which say:

textContains: is true if (at least) one word inside the text string matches the query string

So keep in mind textContains matches full words, not arbitrary substrings. That's why 'IBM' is found but 'IBMan' is not found. If you had an entry like 'IBM Manufacture', it would be found.
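
As a rough illustration of the difference, here is a minimal plain-Python sketch. It is not the JanusGraph API: the functions `tokenize`, `text_contains` and `text_contains_prefix` are hypothetical names, and the regex-based tokenizer is only an assumed stand-in for whatever analyzer the index backend actually applies.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (assumed stand-in for a real analyzer)."""
    return re.findall(r"\w+", text.lower())

def text_contains(text, query):
    """Model of word-level matching: true if at least one whole word equals the query."""
    return query.lower() in tokenize(text)

def text_contains_prefix(text, query):
    """Model of prefix matching: true if at least one word starts with the query."""
    q = query.lower()
    return any(token.startswith(q) for token in tokenize(text))

names = ["IBM", "IBMan", "IBManufacture", "IBM Manufacture"]

# Whole-word match: only entries that contain the word "IBM" itself.
whole_word = [n for n in names if text_contains(n, "IBM")]

# Prefix match: every entry has a word starting with "IB".
prefix = [n for n in names if text_contains_prefix(n, "IB")]

# "IB" is not a complete word in any entry, so nothing matches.
partial = [n for n in names if text_contains(n, "IB")]
```

In this model, `whole_word` keeps 'IBM' and 'IBM Manufacture', `prefix` keeps all four names, and `partial` is empty, which mirrors the behavior described in this thread.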

ChenZhaobin commented 4 years ago

@rngcntr No, maybe the above was not a good example. My field actually consists of multiple Chinese characters, and each word in it is tokenized by the IK analyzer, which is installed as a plugin in Elasticsearch. g.V().has('name', textContains('one or more chinese character')) works as expected, but g.V().hasLabel('company').has('name', textContains('one or more chinese character')), which combines the indexed field with the label filter, returns an empty list.

ChenZhaobin commented 4 years ago

@rngcntr @FlorianHockmann I finally solved this issue with the query below: g.V().hasLabel('company').filter{it.get().property('name').value().contains('one or more chinese character')}
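
For readers comparing the two approaches: this workaround swaps the tokenized predicate for a raw substring check applied to each vertex. The plain-Python sketch below models only that logic. The `vertices` list and the `filter_by_label_and_substring` helper are hypothetical stand-ins, not JanusGraph objects or API calls.

```python
# Hypothetical in-memory stand-ins for vertices (not JanusGraph objects).
vertices = [
    {"label": "company", "name": "IBM"},
    {"label": "company", "name": "IBManufacture"},
    {"label": "person", "name": "IBMan"},
]

def filter_by_label_and_substring(vertices, label, needle):
    """Keep vertices with the given label whose name contains `needle` as a raw substring."""
    return [v for v in vertices if v["label"] == label and needle in v["name"]]

# Substring match: no tokenization, so "IB" matches inside longer words.
matches = filter_by_label_and_substring(vertices, "company", "IB")
matched_names = [v["name"] for v in matches]
```

Here `matched_names` contains 'IBM' and 'IBManufacture'. Note that a lambda filter like the one in the query is presumably evaluated per vertex rather than answered from the Elasticsearch index, so it may be slow on large graphs.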

rngcntr commented 4 years ago

Nice to see your solution @ChenZhaobin! But I think the issue should stay open because the use of hasLabel should not impact the functionality of textContains.

ChenZhaobin commented 4 years ago

@rngcntr Reopened it. I guess this issue is related to a mixed index that uses Elasticsearch with a non-default analyzer.

ChenZhaobin commented 3 years ago

This issue has nothing to do with the custom analyzer; it is the same as the tickets below: https://github.com/JanusGraph/janusgraph/issues/1788 https://github.com/JanusGraph/janusgraph/issues/1379

@FlorianHockmann @porunov @pluradj Do we have a solution or a plan for this?

Zonkodonko commented 1 year ago

The problem is that textContains does not (as the name implies) search for a substring; instead, it searches for a word! What does that mean? The value gets tokenized, and the search term is then matched as if it had a space before and after it. This is very poorly documented. And worst of all: there is no alternative for searching for a substring.

FlorianHockmann commented 1 year ago

@Zonkodonko: Yes, textContains searches for words. That was however already described above.

How is that poorly documented? The docs state first that:

Text search predicates which match against the individual words inside a text string after it has been tokenized

and then for textContains:

is true if (at least) one word inside the text string matches the query string

(emphasis added)

If you think that the docs could be improved on this, then please open a new issue.

This issue is about the problem that:

the use of hasLabel should not impact the functionality of textContains.

which is also described in the issue description itself.

Zonkodonko commented 1 year ago

@FlorianHockmann You are right. Somehow I didn't catch this while reading the documentation. It's just that the wording of the method and of the documentation is really confusing, especially if you read the Gremlin documentation first and think you know how it is supposed to work. Sorry for that.