Use LOV statistics for ranking

There are three parts to this:

1. How to get the numbers from the LOV dump
2. How to get them stored in the index
3. How to configure the index or queries to make use of the information

This comment focuses on the second point.

How to include LOV numbers in the indexed documents

The key is the LOVWrapper class, a wrapper around a VocabularyTermExtractor. The VocabularyTermExtractor iterates over the class and property descriptions extracted from an RDFS/OWL Model, and the LOVWrapper then modifies these descriptions with LOV-specific information. For example, it adds a “vocabulary” field to the JSON with information about the vocabulary that defines the term. Scoring information could be added at the same point. The best way to do that is probably:

1. Add a new Describer (similar to TermDescriber) that adds scores for a given class or property, perhaps called TermLOVScoreDescriber or similar.
2. Instantiate that Describer in the LOVWrapper constructor, and invoke it in modifyDocument().
3. To instantiate the Describer, pass the SPARQLRunner from LOVExtractor to LOVWrapper, so that the Describer has access to the full LOV dataset, including the scoring information in named graphs.
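
The steps above could look roughly like the sketch below. It is only an illustration of the wiring, not the actual Vocidex code: `ScoreSource` is a hypothetical stand-in for the part of SPARQLRunner that would look up a score, `LOVWrapperSketch` is a simplified stand-in for LOVWrapper, the JSON document is modelled as a plain `Map`, and the field name `lovScore` is an assumption. The real Describer interface and the modifyDocument() signature may well differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the part of SPARQLRunner the describer needs:
// looking up a LOV score for a term URI. The real class would run a SPARQL
// query against the named graphs that hold the scoring information.
interface ScoreSource {
    Optional<Double> scoreFor(String termUri);
}

// Step 1: a describer that adds score information for a class/property.
// The real Describer interface in Vocidex may have a different shape.
class TermLOVScoreDescriber {
    private final ScoreSource source;

    TermLOVScoreDescriber(ScoreSource source) {
        this.source = source;
    }

    // Adds a "lovScore" field (assumed name) only when a score is known.
    void describe(String termUri, Map<String, Object> document) {
        source.scoreFor(termUri)
              .ifPresent(score -> document.put("lovScore", score));
    }
}

// Steps 2 and 3: the wrapper receives the score source in its constructor,
// instantiates the describer there, and invokes it in modifyDocument().
class LOVWrapperSketch {
    private final TermLOVScoreDescriber scoreDescriber;

    LOVWrapperSketch(ScoreSource source) {
        this.scoreDescriber = new TermLOVScoreDescriber(source);
    }

    // Returns a modified copy so the incoming description is left untouched.
    Map<String, Object> modifyDocument(String termUri, Map<String, Object> document) {
        Map<String, Object> modified = new LinkedHashMap<>(document);
        scoreDescriber.describe(termUri, modified);
        return modified;
    }
}
```

Terms without a known score are passed through unchanged here; whether a missing score should instead be indexed as an explicit default is a separate decision that affects how ranking can use the field later.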
