gbv / coli-ana

API to analyze DDC numbers
https://coli-conc.gbv.de/coli-ana/app/
MIT License

Include information about DDC rules that led to a composed number #22

Open nichtich opened 3 years ago

nichtich commented 3 years ago

Requires

nichtich commented 3 years ago

Each rule has a short text (derived from MARC). There should be more details (name, description, examples...), but we can start with this short string for each rule:

{
  "r1": { "short": "Unless it is redundant, add to base number<|[Aa]dd to base number" },
  "r2": { "short": "the numbers following" },
  ...
}

The list of rules can be served directly as a static JSON file, e.g. at /rules.json. The "place to display all rules at the web interface" is secondary; maybe we should postpone it until we have more understandable information about each rule.
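A minimal sketch of producing that static file, assuming the rule list is kept as a Python dict and written into a directory that a web server exposes as its root. The function `write_rules_json` and the directory name are hypothetical and not taken from coli-ana:

```python
import json
from pathlib import Path

# Rule short texts keyed by rule id; only the two rules quoted above are
# included here, the remaining entries are elided as in the proposal.
RULES = {
    "r1": {"short": "Unless it is redundant, add to base number<|[Aa]dd to base number"},
    "r2": {"short": "the numbers following"},
}


def write_rules_json(static_dir: Path, rules: dict = RULES) -> Path:
    """Write the rule list to <static_dir>/rules.json for a static file server."""
    static_dir.mkdir(parents=True, exist_ok=True)
    target = static_dir / "rules.json"
    # ensure_ascii=False keeps any non-ASCII rule text human-readable
    target.write_text(json.dumps(rules, indent=2, ensure_ascii=False), encoding="utf-8")
    return target
```

For example, `write_rules_json(Path("public"))` would create `public/rules.json`. If `public` is the web root, which is an assumption, the file is then available at /rules.json without any API code.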

Encoding of rules in JSKOS: I would not make rules part of standard JSKOS (unless we have experience with other faceted classifications that use rules of their own). So each member of memberList gets an optional field RULE with an array of strings, each identifying a rule, e.g. "RULE": ["p9"] or "RULE": ["p20","p5"]. The prefix p is used because we already use it and because numerical identifiers have the disadvantage that you cannot add rules in between if needed (e.g. p20a). We can later switch to URIs as identifiers, but that should be coordinated with OCLC/Pansoft.
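As a rough illustration of this proposal, the sketch below adds the non-standard RULE field to a memberList entry and checks the identifier format. It assumes identifiers are "p", a number, and an optional lowercase letter (as in p20a). Both the helper `with_rules` and that regex are hypothetical:

```python
import re
from typing import List

# Hypothetical check for the proposed identifier format: prefix "p", a number,
# and an optional lowercase letter for rules inserted in between (e.g. "p20a").
RULE_ID = re.compile(r"p\d+[a-z]?")


def with_rules(member: dict, rule_ids: List[str]) -> dict:
    """Return a copy of a memberList entry with the non-standard RULE field added."""
    for rule_id in rule_ids:
        if RULE_ID.fullmatch(rule_id) is None:
            raise ValueError(f"unexpected rule identifier: {rule_id!r}")
    return {**member, "RULE": list(rule_ids)}


member = {"notation": ["789.57"]}
print(with_rules(member, ["p20", "p5"]))
# {'notation': ['789.57'], 'RULE': ['p20', 'p5']}
```

The function returns a copy rather than mutating the member, so standard JSKOS data stays untouched unless the caller chooses to use the extended version.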

stefandesu commented 3 years ago

Just to clarify:

Edit regarding the last point:

p1_2_3_17 is a rule pattern (hence "p"): the rules (or rule parts) r1 -> r2 -> r3 -> r17 follow one after another.

So "p" stands for pattern and "r" for rule. But if we split up the pattern, I think we can use the r prefix.
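Splitting a pattern into its rules is a simple string operation. The sketch assumes pattern ids always look like "p" followed by rule numbers joined with underscores, as in p1_2_3_17. The function name `split_pattern` is hypothetical:

```python
import re
from typing import List

# A rule pattern id: "p" followed by rule numbers joined with underscores.
PATTERN_ID = re.compile(r"p(\d+(?:_\d+)*)")


def split_pattern(pattern_id: str) -> List[str]:
    """Split a rule pattern such as "p1_2_3_17" into the rules it applies, in order."""
    m = PATTERN_ID.fullmatch(pattern_id)
    if m is None:
        raise ValueError(f"not a rule pattern: {pattern_id!r}")
    return ["r" + number for number in m.group(1).split("_")]


print(split_pattern("p1_2_3_17"))  # ['r1', 'r2', 'r3', 'r17']
print(split_pattern("p20_2_7"))    # ['r20', 'r2', 'r7']
```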

nichtich commented 3 years ago

To handle additional parameters of rule application, such as p9->781.2-781.8, we need another format. For example, there is this rule pattern:

"p9": { "short": "Add as instructed under" }

The full "rule" (sorry, our terminology is muddy and needs to be clarified) could, for instance, be:

Add as instructed under 781.2-781.8

So it is better to keep the prefix p and use a more complex format for the key RULE, e.g.:

"RULE":[{"pattern":"p20"},{"pattern":"p2"},{"pattern":"p7"}]
"RULE":[{"pattern":"p9","parameter": "781.2-781.8"}]
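One way to handle this object format in code is a small record type that leaves out "parameter" when there is none, which matches the two examples above. The sketch assumes at most one parameter per entry. The class `RuleApplication` is hypothetical:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleApplication:
    pattern: str                      # e.g. "p9"
    parameter: Optional[str] = None   # e.g. "781.2-781.8"; absent for most entries

    def to_json(self) -> dict:
        # omit "parameter" when there is none, as in {"pattern":"p20"}
        data = {"pattern": self.pattern}
        if self.parameter is not None:
            data["parameter"] = self.parameter
        return data

    @classmethod
    def from_json(cls, data: dict) -> "RuleApplication":
        return cls(pattern=data["pattern"], parameter=data.get("parameter"))


rule = [RuleApplication("p9", "781.2-781.8")]
print(json.dumps({"RULE": [r.to_json() for r in rule]}))
# {"RULE": [{"pattern": "p9", "parameter": "781.2-781.8"}]}
```

Rule patterns with multiple parameters would need `parameter` to become a list, which is left open here.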

I am sure there are also rule patterns with multiple parameters. If we know which rules have parameters, it is better to store the rule short text like this:

"p9": { "short": "Add as instructed under %s" }

stefandesu commented 3 years ago

As far as I understand it, pattern p20_2_7 does not consist of patterns p20, p2, and p7 (none of those exist), but rather of rules r20, r2, and r7 applied in sequence. So technically we should use the r prefix which would also make it easier to match them to the rule definitions, as stated above.

Also, as far as I can see, patterns can have parameters but don't have to (and I am not sure whether every pattern can have parameters). It seems like p9 can also be used without a parameter, so substitutions like the one you are suggesting might not work so well.
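One possible way to reconcile the "%s" proposal with parameterless use of p9 is to drop the placeholder when no parameter is given. This is only a sketch. The `SHORT` data and the function `render` are hypothetical, and multiple parameters are not handled:

```python
from typing import Optional

# Hypothetical short text with a "%s" slot for the parameter, as proposed above.
SHORT = {"p9": "Add as instructed under %s"}


def render(pattern: str, parameter: Optional[str] = None) -> str:
    """Render a rule's short text, with or without a parameter."""
    short = SHORT[pattern]
    if parameter is None:
        # No parameter: drop the placeholder (and the space before it) so that
        # a parameterless application does not show a literal "%s"
        return short.replace(" %s", "").replace("%s", "")
    # str.replace avoids %-formatting errors if a short text contains a literal "%"
    return short.replace("%s", parameter, 1)


print(render("p9", "781.2-781.8"))  # Add as instructed under 781.2-781.8
print(render("p9"))                 # Add as instructed under
```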

nichtich commented 2 years ago

Initial information about rules is included in the rules branch. Each rule has a regular expression pattern to match subfield values in MARC 21 classification 6XX and/or 7XX fields.
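A minimal sketch of how such per-rule regular expressions might be applied to an instruction subfield value. It assumes the first rule whose regex matches at the start of the (trimmed) value wins. The excerpt `RULE_REGEX` reuses the two short texts quoted earlier and is not the actual contents of the rules branch:

```python
import re
from typing import Optional

# Hypothetical excerpt: rule id -> regex matching an instruction subfield value.
RULE_REGEX = {
    "r1": re.compile(r"Unless it is redundant, add to base number<|[Aa]dd to base number"),
    "r2": re.compile(r"the numbers following"),
}


def match_rule(subfield_value: str) -> Optional[str]:
    """Return the id of the first rule whose regex matches at the start of the value."""
    text = subfield_value.strip()
    for rule_id, regex in RULE_REGEX.items():
        if regex.match(text):
            return rule_id
    return None


print(match_rule("Add to base number"))     # r1
print(match_rule("the numbers following"))  # r2
print(match_rule("in"))                     # None
```

Whether the real rules need prefix matching, a full match, or a search anywhere in the value depends on how the regexes in the rules branch are written.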

An example: p20_2_7 (r20, r2, r7) is used in DDC class 789.57 Hybrid styles with the following MARC 761 field (examples excluded to simplify the data):

761  0 $i Add to $b 789.57 $i the numbers following $r 781.6 $i in $d 781.62 $c 781.69

Or with reference to the list of rule elements (highly normalized):

r20, 789.57, r2, 781.6, r7, 781.62-781.69.
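The normalized element list can be derived from the 761 field by walking its subfields in order. The sketch makes several assumptions. $i values are looked up in a hypothetical table that maps "Add to", "the numbers following", and "in" to r20, r2, and r7; this mapping is read off this single example only, while the real rules use regexes. $b, $r, and $d carry notations. $c closes the span opened by the preceding $d:

```python
import re
from typing import List, Tuple

# Hypothetical mapping, inferred only from aligning p20_2_7 with the field above.
INSTRUCTION_RULES = {"Add to": "r20", "the numbers following": "r2", "in": "r7"}


def parse_subfields(field: str) -> List[Tuple[str, str]]:
    """Split a field body like '$i Add to $b 789.57' into (code, value) pairs."""
    return [(m.group(1), m.group(2).strip())
            for m in re.finditer(r"\$(\w)\s*([^$]*)", field)]


def to_elements(field: str) -> List[str]:
    elements: List[str] = []
    for code, value in parse_subfields(field):
        if code == "i":
            # instruction text -> rule id
            elements.append(INSTRUCTION_RULES[value])
        elif code == "c" and elements:
            # $c ends a span opened by the preceding $d, e.g. 781.62-781.69
            elements[-1] = f"{elements[-1]}-{value}"
        else:
            # $b, $r and $d carry notations
            elements.append(value)
    return elements


field = "$i Add to $b 789.57 $i the numbers following $r 781.6 $i in $d 781.62 $c 781.69"
print(to_elements(field))
# ['r20', '789.57', 'r2', '781.6', 'r7', '781.62-781.69']
```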

Or as textual building instruction:

Add to 789.57 the numbers following 781.6 in 781.62-781.69

The latter is most likely what we should show in the user interface, but we are not allowed to do so for license reasons (not even in German). I am not sure whether this form would be acceptable:

Add to ... the numbers following ... in ...

The textual instruction could be passed as a plain string in JSKOS. We might also include the list of rule elements in RULE, e.g. with numbers for rule patterns and strings for notations:

{
  "RULE": {
    "elements": [ 20, "789.57", 2, "781.6", 7, "781.62-781.69" ],
    "pattern": "Add to ... the numbers following ... in ...",
    "instruction": "Add to 789.57 the numbers following 781.6 in 781.62-781.69"
  }
}
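Both the masked pattern and the full instruction can be generated from the element list, so only the elements need to be stored. The sketch assumes integers are rule numbers and strings are notations, as proposed. The text table `RULE_TEXT` is hypothetical: r2's text is quoted earlier in this issue, while r20 and r7 are read off the 761 example and may not match the real rule list:

```python
import json
from typing import List, Union

# Hypothetical rule texts keyed by rule number.
RULE_TEXT = {20: "Add to", 2: "the numbers following", 7: "in"}


def build_rule(elements: List[Union[int, str]]) -> dict:
    """Build the proposed RULE object from a mixed list of rule numbers and notations."""
    pattern_parts, instruction_parts = [], []
    for element in elements:
        if isinstance(element, int):
            text = RULE_TEXT[element]
            pattern_parts.append(text)
            instruction_parts.append(text)
        else:
            pattern_parts.append("...")        # notation hidden in the pattern
            instruction_parts.append(element)  # notation shown in the instruction
    return {
        "elements": list(elements),
        "pattern": " ".join(pattern_parts),
        "instruction": " ".join(instruction_parts),
    }


rule = build_rule([20, "789.57", 2, "781.6", 7, "781.62-781.69"])
print(json.dumps({"RULE": rule}, indent=2))
```

Generating both strings on demand also leaves room to decide later which of them may be shown in the user interface for license reasons.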