gbv / coli-ana

API to analyze DDC numbers
https://coli-conc.gbv.de/coli-ana/app/
MIT License

Fix DDC issues #8

Closed stefandesu closed 3 years ago

stefandesu commented 3 years ago

There are some issues with our version of DDC German (in https://coli-conc.gbv.de/api/) and our tools that need to be addressed before coli-ana can work properly:

nichtich commented 3 years ago

I created an issue at jskos-tools (https://github.com/gbv/jskos-tools/issues/28), but I think the best workaround for notations starting with T is to remove the T on import, so T1--0 becomes 1--0. Number spans in tables also need to be modified, e.g. T1--0901-T1--0905:07 => 1--0901-0905. In summary (Perl syntax):

s/:\d+$//; # remove colon suffix
if ($_ =~ /^T(\d[ABC]?)--/) {
  $table = $1;
  $_ =~ s/T\d[ABC]?--//g; # remove all T...
  $_ = $table . "--" . $_; # add back removed table number
}

This matches the notation pattern already used in existing DDC data in RDF.

nichtich commented 3 years ago

To clarify classes with colon suffixes: T1--0901-T1--0905:07 is actually T1--0901-0905 with multiple notations:

stefandesu commented 3 years ago

To clarify classes with colon suffixes: T1--0901-T1--0905:07 is actually T1--0901-0905 with multiple notations:

  • 1--0901-0905 (internally/fallback)
  • T1--0901-0905 (display, retrieved from backend database with full DDC data)
  • --0.901--------, --0.902--------... (coli-ana, depending on example)

Could you clarify this further? I still don't understand this part. Referring to WebDewey Deutsch, they have different labels:

In the decomposition of 700.90440747471, both T1--0901-T1--0905 and T1--0901-T1--0905:07 are listed. The former, however, for some reason has the suffix :0904, which confuses me even further (it seems to refer to the next line, T1--0904).

stefandesu commented 3 years ago

In summary (Perl syntax):

Apart from removing the colon suffix (see my previous comment), this looks good. I could try to port this to JavaScript and add it to coli-ana first (and if it works as expected, we could add it to jskos-tools).
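A minimal sketch of what such a JavaScript port of the Perl snippet might look like. The function name normalizeTableNotation and the stripColonSuffix option are hypothetical and not part of coli-ana or jskos-tools; the option exists only because it is still open in this thread whether the colon suffix should be removed.

```javascript
// Hypothetical helper, not an existing coli-ana or jskos-tools function.
// Mirrors the Perl snippet: "T1--0901-T1--0905:07" => "1--0901-0905".
function normalizeTableNotation(notation, { stripColonSuffix = true } = {}) {
  let result = notation
  if (stripColonSuffix) {
    // Remove a trailing colon suffix such as ":07"
    result = result.replace(/:\d+$/, "")
  }
  // Only notations that start with a table prefix (T1, T2, T3A, ...) are changed
  const match = result.match(/^T(\d[ABC]?)--/)
  if (match) {
    const table = match[1]
    // Remove every "T<table>--" prefix, including the one inside a span
    result = result.replace(/T\d[ABC]?--/g, "")
    // Add the table number back once, without the leading "T"
    result = `${table}--${result}`
  }
  return result
}

console.log(normalizeTableNotation("T1--0"))
console.log(normalizeTableNotation("T1--0901-T1--0905:07"))
```

Notations without a leading table prefix pass through unchanged apart from the optional suffix removal, so the helper could be applied to every notation on import.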

nichtich commented 3 years ago

The workaround is implemented but

stefandesu commented 3 years ago
  • we should add the URI directly in convert.js instead of using uriFromNotation, so we only have two notations (one to display normally, one for the decomposition table) and look up classes via their URI.

Yeah, I didn't add URIs there in order to save space in the database, but we can add the URIs during conversion.
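Roughly, adding the URI during conversion and looking up classes by URI could look like the sketch below. Everything in it is an assumption for illustration: the namespace is a placeholder (the real DDC URI pattern is not given in this thread), and convertRecord and indexByUri are hypothetical names, not the actual contents of convert.js.

```javascript
// Placeholder namespace for illustration only, not the real DDC URI pattern.
const NAMESPACE = "http://example.org/ddc/"

// Hypothetical conversion step: store the URI together with the two notations
// (one for normal display, one for the decomposition table).
function convertRecord({ displayNotation, tableNotation }) {
  return {
    uri: NAMESPACE + encodeURIComponent(displayNotation),
    notation: [displayNotation, tableNotation],
  }
}

// Index converted records by URI so lookups do not depend on notation variants.
function indexByUri(records) {
  return new Map(records.map((record) => [record.uri, record]))
}

const records = [
  convertRecord({ displayNotation: "T1--0901-0905", tableNotation: "--0.901--------" }),
]
const index = indexByUri(records)
console.log(index.get(NAMESPACE + "T1--0901-0905").notation)
```

Storing the URI costs some space per record, but it would remove the need to derive URIs from notations at lookup time.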

nichtich commented 3 years ago

Closed in favor of more specific #13.