gbv / coli-ana

API to analyze DDC numbers
https://coli-conc.gbv.de/coli-ana/app/
MIT License

Explore and document how to use coli-ana for retrieval #24

Open nichtich opened 3 years ago

nichtich commented 3 years ago

Explore and document methods to use decomposed DDC numbers for indexing and/or query expansion in retrieval systems such as Solr/Elasticsearch/...

For example, a document with DDC number 700.90440747471 should be indexed with "Modern arts", "1940-1949", "Museums, collections, exhibits", and "New York Metropolitan Area" (plus synonyms for each of these classes) and, probably ranked lower, with the labels of all classes in the hierarchy.

The use case could be split into two steps:

  1. Analyze DDC numbers and split them into their main components
  2. Expand index and/or query for each component based on class labels and hierarchy

Only the first step is the task of coli-ana, but the use case should still be documented as part of coli-ana.
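
A minimal sketch of the second step, assuming the decomposition is already available as a list of component labels. The `Component` class, the `index_terms` function, and the boost values are hypothetical and not part of coli-ana; the synonym and ancestor entries are illustrative placeholders, not real DDC data. The component labels come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """One component of a decomposed DDC number (hypothetical structure)."""
    label: str
    synonyms: List[str] = field(default_factory=list)
    ancestors: List[str] = field(default_factory=list)  # labels of broader classes


def index_terms(components: List[Component],
                component_boost: float = 1.0,
                synonym_boost: float = 0.8,
                ancestor_boost: float = 0.3) -> Dict[str, float]:
    """Map each term to a boost; a term seen twice keeps its highest boost."""
    terms: Dict[str, float] = {}

    def add(term: str, boost: float) -> None:
        terms[term] = max(boost, terms.get(term, 0.0))

    for component in components:
        add(component.label, component_boost)
        for synonym in component.synonyms:
            add(synonym, synonym_boost)
        for ancestor in component.ancestors:
            add(ancestor, ancestor_boost)  # hierarchy labels rank lower
    return terms


# Component labels from the 700.90440747471 example; the synonym and
# ancestor values are placeholders for illustration only.
components = [
    Component("Modern arts", synonyms=["placeholder synonym"],
              ancestors=["placeholder ancestor"]),
    Component("1940-1949"),
    Component("Museums, collections, exhibits"),
    Component("New York Metropolitan Area"),
]
terms = index_terms(components)
```

The resulting mapping could be written to boosted fields in Solr or Elasticsearch at index time, or used to build a weighted query for query expansion. Which of the two works better is what this issue is meant to explore.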

nichtich commented 2 years ago

Some notations with titles in K10plus (found by iterating related notations)

It should be possible to jump from 641.509 - historic cooking to

This requires a database with the full DDC hierarchy, all decompositions and the number of titles for each class.
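
A rough sketch of what such a database could look like, using an in-memory SQLite database. The table and column names are assumptions, the split of 641.509 into `641.5` and `T1--09` is assumed rather than taken from coli-ana output, and `REL-A` / `REL-B` with their title counts are made-up placeholders, not K10plus data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- full DDC hierarchy: one row per class, linked to its parent
    CREATE TABLE ddc_class (notation TEXT PRIMARY KEY, label TEXT, parent TEXT);
    -- decompositions: one row per (built number, component) pair
    CREATE TABLE decomposition (notation TEXT, component TEXT);
    -- number of titles indexed with each notation
    CREATE TABLE title_count (notation TEXT PRIMARY KEY, titles INTEGER);
""")

conn.execute("INSERT INTO ddc_class VALUES ('641.509', 'historic cooking', NULL)")
conn.executemany(
    "INSERT INTO decomposition VALUES (?, ?)",
    [
        ("641.509", "641.5"),    # assumed component
        ("641.509", "T1--09"),   # assumed component
        ("REL-A", "641.5"),      # placeholder related notation
        ("REL-B", "T1--09"),     # placeholder related notation
    ],
)
conn.executemany(
    "INSERT INTO title_count VALUES (?, ?)",
    [("641.509", 3), ("REL-A", 5), ("REL-B", 0)],  # made-up counts
)


def related_with_titles(notation):
    """Notations sharing a component with `notation` that have at least one title."""
    return conn.execute(
        """
        SELECT DISTINCT d2.notation, t.titles
        FROM decomposition d1
        JOIN decomposition d2
          ON d2.component = d1.component AND d2.notation != d1.notation
        JOIN title_count t ON t.notation = d2.notation
        WHERE d1.notation = ? AND t.titles > 0
        ORDER BY t.titles DESC
        """,
        (notation,),
    ).fetchall()


related = related_with_titles("641.509")
```

With the placeholder rows above, only `REL-A` is returned, because `REL-B` has no titles. A real implementation would presumably also walk the hierarchy in `ddc_class` to offer broader and narrower classes as jump targets.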