AskNowQA / AutoSPARQL

Warning: Not working at the moment. Maintainer on parental leave. AutoSPARQL lets you create SPARQL queries over RDF knowledge bases from natural language with low effort.
http://aksw.org/Projects/AutoSPARQL.html
GNU General Public License v3.0

duplicated results with different descriptions #3

Open diadem opened 12 years ago

diadem commented 12 years ago

The query "houses in Summertown" returns the following two properties several times:

Water Eaton Road, Summertown OX2 £399,950.00 Street: Water Eaton Road, Summertown OX2

bedrooms: 2

bathrooms: 1

Divinity Road, Cowley OX4 £399,950.00 Street: Water Eaton Road, Summertown OX2

bedrooms: 2

bathrooms: 1

The problem may lie in the extracted data itself or in the visualization.

LorenzBuehmann commented 12 years ago

This is a problem in the extracted data, where the same URI is used for different entries. A SPARQL query then returns several solutions for that URI; they appear as duplicates in the UI but differ in, for instance, their descriptions or images.
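To illustrate how a shared URI multiplies results, here is a minimal sketch in plain Python. It is not AutoSPARQL code: the triples, the `ex:` property names and the function `select_label_and_street` are all hypothetical, and the function only emulates a basic SPARQL pattern of the form `?s ex:label ?label ; ex:street ?street`.

```python
from collections import defaultdict

# Hypothetical extracted data: two listings were assigned the same URI,
# so that URI carries two labels but only one street value.
TRIPLES = [
    ("ex:prop1", "ex:street", "Water Eaton Road, Summertown OX2"),
    ("ex:prop1", "ex:label", "Water Eaton Road, Summertown OX2"),
    ("ex:prop1", "ex:label", "Divinity Road, Cowley OX4"),
]


def select_label_and_street(triples):
    """Emulate SELECT ?s ?label ?street WHERE { ?s ex:label ?label ; ex:street ?street }."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_subject[s][p].append(o)

    rows = []
    for s, props in by_subject.items():
        # Every combination of label and street is a separate solution.
        for label in props["ex:label"]:
            for street in props["ex:street"]:
                rows.append((s, label, street))
    return rows


rows = select_label_and_street(TRIPLES)
for row in rows:
    print(row)
```

Under these assumptions the query yields two solutions for a single URI, one of which pairs the "Divinity Road" label with the "Water Eaton Road" street, which is the kind of mixed entry reported above.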

timfu commented 12 years ago

Ok, great. That means if we fix the issue with the shared URIs, this should go away?

I am not 100% convinced. For example, right now queries such as "houses in headington" say "using fallback" and then return

Horton Hill, Horton Cum Studley, OX33 The proposed development comprises the construction of a 3-storey extension to the rear of the hotel to accommodate an additional 20 bedrooms and ancillary accommodation, 4 detached houses and garages and a shop to the front of the hotel. Planning Statement: Although the houses/hotel extension can now be built in phases, a condition attached to the Planning Permission for the houses requires that the hotel extension shall be built concurrently with the houses and that the houses may not be occupied until the hotel extension is complete and rea... £1,600,000.00

x 7

then

Land For Sale Portland Road, Milcombe, Banbury, OX15 Situated in Portland Road, Milcombe is this residential Building Land with permission for 5 houses situated in quiet village location adjoining open...

x 6

And so on. That seems like more than the URI overlap could explain.

LorenzBuehmann commented 12 years ago

Yes, there are also duplicates in the Lucene index, which is used as the fallback. I have to check why this happens.

LorenzBuehmann commented 12 years ago

Ok, the duplicates in the fallback Lucene index come from the duplicates in the extracted data. I now avoid this by indexing only one document per distinct URI, though this does lower recall.
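The trade-off can be sketched with a toy inverted index in plain Python. This is not the Lucene API and not the actual fix; `build_index`, `search`, the `one_per_uri` flag and the sample documents are assumptions made for illustration only.

```python
def build_index(docs, one_per_uri=False):
    """Build a toy inverted index over (uri, text) pairs.

    With one_per_uri=True, only the first document seen for each URI is kept.
    Returns (kept_docs, index), where index maps a term to document positions.
    """
    kept = []
    seen = set()
    for uri, text in docs:
        if one_per_uri and uri in seen:
            continue  # skip later documents for an already indexed URI
        seen.add(uri)
        kept.append((uri, text))

    index = {}
    for doc_id, (_uri, text) in enumerate(kept):
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(doc_id)
    return kept, index


def search(kept, index, term):
    """Return the URI of every indexed document containing the term."""
    return [kept[doc_id][0] for doc_id in index.get(term.lower(), [])]


# Hypothetical data: two different listings extracted under the same URI.
DOCS = [
    ("ex:prop1", "Water Eaton Road Summertown"),
    ("ex:prop1", "Divinity Road Cowley"),
]

full = build_index(DOCS)
dedup = build_index(DOCS, one_per_uri=True)

print(search(*full, "road"))     # the same URI is returned twice
print(search(*dedup, "road"))    # the URI is returned once
print(search(*dedup, "cowley"))  # empty: the second document was never indexed
```

In this sketch the deduplicated index no longer returns the same URI twice, but a search for "cowley" finds nothing because the only document containing that term was dropped. That is the recall loss mentioned above.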