gbv / coli-ana

API to analyze DDC numbers
https://coli-conc.gbv.de/coli-ana/app/
MIT License

Replace notation-within-number-span with plain notation #41

Closed: nichtich closed this issue 3 years ago

nichtich commented 3 years ago

E.g. analysis of 700.90440747471 includes

T1--0901-0905:074 Museen, Sammlungen, Ausstellungen; Sammeln von Objekten (museums, collections, exhibitions; collecting objects)

which should become T1--074 instead. I marked it as a bug because the notation T1--0901-0905:074 does not exist in the schedules, but T1--074 does. The full notation with ":" could be useful for later analysis, but it is not needed for now or for the conversion to PICA.

The change can directly be done in https://github.com/gbv/coli-ana/blob/dev/lib/parseInputStream.js.
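A minimal sketch of the proposed reduction, written in JavaScript because the issue points at lib/parseInputStream.js. The function name reduceToPlainNotation and its regular expression are hypothetical and not taken from coli-ana; the pattern assumes the span form seen in this thread (a table prefix, the start and end of a number span, then ":" and the plain notation), including the variant where the end of the span repeats the prefix.

```javascript
// Hypothetical helper (not part of coli-ana): reduce a "notation within a
// number span" such as "T1--0901-0905:074" to the plain notation "T1--074".
// Assumed shape: <table prefix><span start>-<span end>:<digits>, where the
// span end may repeat the prefix, as in "T1--0901-T1--0905:074".
const SPAN_PATTERN = /^(T\d+[A-Z]?--)\S+-\S+:(\d+)$/

function reduceToPlainNotation(notation) {
  const match = SPAN_PATTERN.exec(notation)
  // Keep the table prefix and the part after ":"; anything that does not
  // have the span-and-colon form is returned unchanged.
  return match ? match[1] + match[2] : notation
}

console.log(reduceToPlainNotation("T1--0901-0905:074"))     // T1--074
console.log(reduceToPlainNotation("T1--0901-T1--0905:074")) // T1--074
console.log(reduceToPlainNotation("T1--074"))               // T1--074
```

Returning non-matching notations unchanged means the helper could be applied to every element of an analysis without first checking which ones contain a span.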

stefandesu commented 3 years ago

notation T1--0901-0905:074 does not exist in the schedules

But it does, doesn't it? https://deweyde.pansoft.de/webdewey/index_11.html?recordId=int%3aT1--0901-0905%3b1%3b074

ulsw commented 3 years ago

Stefan is right! Don't change it!

nichtich commented 3 years ago

Well, it's an internal notation. Should we reduce it to T1--074 only in the list of basic semantic elements as encoded in PICA?

nichtich commented 3 years ago

I've included the change in the conversion to PICA, so 700.90440747471 results in:

045H/10 $eDDC23ger$a700.90440747471$c7$f09044$f074$g7471$Acoli-ana

which consists of DDC notations

This makes it possible to find all publications about T1--074 with a single search.
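To illustrate why the PICA result supports a single search for T1--074, here is a minimal sketch that splits the field above into subfields and looks for the value 074. The function names are hypothetical. Reading $f as "Table 1 notation without its T1-- prefix" is an assumption inferred only from this one example, and the parser ignores the "$$" escaping PICA Plain uses for literal dollar signs.

```javascript
// Hypothetical sketch: split a PICA Plain field into its tag and
// [code, value] subfield pairs. Assumes values contain no literal "$".
function parseField(line) {
  const space = line.indexOf(" ")
  const tag = line.slice(0, space)
  const subfields = line
    .slice(space + 1)
    .split("$")
    .filter(part => part.length > 0) // drop the empty string before the first "$"
    .map(part => [part[0], part.slice(1)])
  return { tag, subfields }
}

// True if any subfield with the given code has exactly the given value.
// Assumption: T1--074 is stored as "$f074", so this is the lookup a search
// for T1--074 would rely on.
function hasSubfieldValue(field, code, value) {
  return field.subfields.some(([c, v]) => c === code && v === value)
}

const field = parseField(
  "045H/10 $eDDC23ger$a700.90440747471$c7$f09044$f074$g7471$Acoli-ana"
)
console.log(field.tag)                                      // 045H/10
console.log(hasSubfieldValue(field, "f", "074"))            // true
console.log(hasSubfieldValue(field, "f", "0901-0905:074"))  // false
```

Because the span form never reaches the $f subfield, a record built via T1--0901-0905:074 and one built directly with T1--074 would both match the same lookup.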

We can close this issue, but we should work on bringing the tabular display and the PICA result into closer alignment (see also #42).

stefandesu commented 3 years ago

Now I'm confused though. What's the difference between T1--074 and T1--0901-0905:074? Why can the same number be built with both of those?

[Screenshot, 2021-07-23: WebDewey building 700.90440747471 with T1--0901-T1--0905:074]

ulsw commented 3 years ago

It depends on how the DDC numbers are built: in 709.04007479494, "T1--074" is applied, but in 759.030904107477595, "T1--0901-T1--0905:074" is applied.

stefandesu commented 3 years ago

it depends on how the ddc numbers are built

Maybe I didn't make my confusion clear. Let's keep to 700.90440747471 as an example. In my previous comment, there's a screenshot from WebDewey where it is built with "T1--0901-T1--0905:074". However, you can also build the exact same number with "T1--074":

[Screenshot, 2021-07-21: WebDewey building the same number with T1--074]

@nichtich also said "it's an internal notation", which only adds to the confusion. It seems like we're all talking past each other.

nichtich commented 3 years ago

However, you can also build the exact same number with "T1--074"

Either it is possible to build the same number in different ways or T1--0901-0905:074 can be mapped as semantically equivalent (although not structurally) to T1--074.

ulsw commented 3 years ago

Either it is possible to build the same number in different ways

There is only one way to synthesize DDC numbers; cf., e.g.:

"https://www.oclc.org/content/dam/oclc/dewey/versions/print/intro.pdf "Classifying with the DDC 5.1 . . . TABLE OF LAST RESORT 5.9 When several numbers have been found for the work in hand, and each seems as good as the next, the following table of last resort (in order of preference) may be used as a guideline in the absence of any other rule:"

nichtich commented 3 years ago

I'm closing this issue with the following solution:

UmaB7 commented 3 years ago

Jakob, as discussed, I agree with you :-)