dice-group / LIMES

Link Discovery Framework for Metric Spaces.
https://limes.demos.dice-research.org/
GNU Affero General Public License v3.0

Please add a "less than" < string operator #256

KonradHoeffner closed this issue 2 years ago

KonradHoeffner commented 2 years ago

I could use it to remove duplicates when source and target are the same, see https://github.com/dice-group/LIMES/issues/255. What this effectively does is remove one triangle and the diagonal from the comparison matrix, leaving only the n*(n-1)/2 unique pairs.
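
To make the pair count concrete, here is a minimal Python sketch. It is not LIMES code: the identifiers and the `unique_pairs` helper are made up for illustration, and it assumes the compared keys are distinct strings. It keeps a pair only when the first key is strictly less than the second.

```python
from itertools import product

def unique_pairs(keys):
    """Keep (a, b) only when a < b as strings.

    This drops the diagonal (a == b) and one of the two mirrored triangles.
    """
    return [(a, b) for a, b in product(keys, repeat=2) if a < b]

keys = ["p1", "p2", "p3", "p4"]  # hypothetical resource identifiers
pairs = unique_pairs(keys)
n = len(keys)

# Of the n*n cells in the full comparison matrix, n*(n-1)/2 remain.
assert len(pairs) == n * (n - 1) // 2
```

With four keys, the full matrix has 16 cells and 6 pairs survive the filter, so each unordered pair is reported exactly once and no resource is matched with itself.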

Hypothetical Example

Matching people who have multiple citizenships and are modelled separately for each country.

<SOURCE>
    <ID>c1</ID>
    <ENDPOINT>countries.ttl</ENDPOINT>
    <VAR>?person1</VAR>
    <PAGESIZE>-1</PAGESIZE>
    <RESTRICTION>?person1 a :Person</RESTRICTION>
    <PROPERTY>rdfs:label AS nolang->lowercase->regularalphabet RENAME label</PROPERTY>
    <PROPERTY>ex:country AS country</PROPERTY>
    <TYPE>TURTLE</TYPE>
</SOURCE>

<TARGET>
    <ID>c2</ID>
    <ENDPOINT>countries.ttl</ENDPOINT>
    <VAR>?person2</VAR>
    <PAGESIZE>-1</PAGESIZE>
    <RESTRICTION>?person2 a :Person</RESTRICTION>
    <PROPERTY>rdfs:label AS nolang->lowercase->regularalphabet RENAME label</PROPERTY>
    <PROPERTY>ex:country AS country</PROPERTY>
    <TYPE>TURTLE</TYPE>
</TARGET>

<METRIC>AND(TRIGRAMS(c1.label,c2.label)|0.8,LESS_THAN(c1.country,c2.country)|1)</METRIC>
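
As a rough illustration of what I would expect this metric to do, here is a Python sketch. Everything in it is an assumption: `trigram_sim` is a simplified Dice coefficient over character trigrams without padding (LIMES' own TRIGRAMS measure may differ), `less_than` is the proposed operator as I imagine it, and `metric` assumes that AND keeps a link only if both thresholds are met and then takes the minimum of the two scores.

```python
def trigram_sim(a, b):
    """Simplified trigram similarity (Dice coefficient, no padding)."""
    ta = {a[i:i + 3] for i in range(len(a) - 2)}
    tb = {b[i:i + 3] for i in range(len(b) - 2)}
    if not ta or not tb:
        # Strings shorter than three characters have no trigrams.
        return 1.0 if a == b else 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))

def less_than(a, b):
    """Hypothetical LESS_THAN: 1 if a sorts strictly before b, else 0."""
    return 1.0 if a < b else 0.0

def metric(p1, p2):
    """Assumed AND semantics: both thresholds must hold, result is the minimum."""
    t = trigram_sim(p1["label"], p2["label"])
    lt = less_than(p1["country"], p2["country"])
    return min(t, lt) if t >= 0.8 and lt >= 1.0 else 0.0

# Hypothetical data: the same person modelled once per country.
alice_de = {"label": "alice smith", "country": "de"}
alice_fr = {"label": "alice smith", "country": "fr"}

assert metric(alice_de, alice_fr) == 1.0  # kept: "de" < "fr"
assert metric(alice_fr, alice_de) == 0.0  # mirrored pair is dropped
assert metric(alice_de, alice_de) == 0.0  # self-match is dropped
```

The point is that the mirrored link and the self-link both score 0, so only one link per pair of records ends up in the output.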

abdullahfathi commented 2 years ago

See the branch https://github.com/dice-group/LIMES/tree/feature/greaterThanStrSimilarity