books: book series and volumes

equadon commented 5 years ago

Check if these values are correct for book series and volumes:

246__n: Number of volumes
246__p: Title of the volume

490__a: Book series title
490__v: Book series volume (is it related to 246__ somehow?)

agentilb commented 5 years ago

Those 2 fields correspond to 2 different pieces of information:

490 refers to the book series to which the book belongs (title of the book series and volume) -> the book series is here a periodical; in some cases, we even have an ISSN in 490x, e.g. https://cds.cern.ch/record/2143603?ln=en
246 n and p are used when a book has several volumes; in this case, the main title is in 245 and the 246 fields indicate the volume titles, e.g. https://cds.cern.ch/record/1636730?ln=en
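
As a rough illustration of the distinction above, the sketch below splits a simplified MARC-like record into series information (490) and volume information (246 n/p). The input layout, the function name `split_series_and_volumes`, the output key names and all sample values are assumptions made for this example; this is not the actual cds-dojson rule set.

```python
def split_series_and_volumes(record):
    """Separate book-series data (490) from volume data (246 $n/$p).

    `record` is a hypothetical, simplified layout: a dict mapping a MARC
    tag to a list of subfield dicts. It is not the real cds-dojson input.
    """
    # 490: the series the book belongs to ($a title, $v volume, $x ISSN).
    book_series = [
        {"title": f.get("a"), "volume": f.get("v"), "issn": f.get("x")}
        for f in record.get("490", [])
    ]
    # 246 $n/$p: the volumes contained in the book itself.
    volumes = [
        {"number": f.get("n"), "title": f.get("p")}
        for f in record.get("246", [])
        if "n" in f or "p" in f
    ]
    return {"book_series": book_series, "volumes": volumes}


# Placeholder values, not taken from a real CDS record.
example = {
    "245": [{"a": "Example main title"}],
    "246": [
        {"n": "v.1", "p": "Example volume title A"},
        {"n": "v.2", "p": "Example volume title B"},
    ],
    "490": [{"a": "Example book series", "v": "12"}],
}
result = split_series_and_volumes(example)
```

The two lists are kept apart on purpose, so that a book can carry series membership and its own volumes at the same time.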

equadon commented 5 years ago

@agentilb Thank you for the information!

equadon commented 5 years ago

@agentilb We are developing book series and volumes now and I have a question about what to call records that belong to a series but also have a collection of volumes inside. For example, https://cds.cern.ch/record/1517909 belongs to a book series (in this case Springer tracts in modern physics) but it's also a collection of volumes. We want each volume of a series to be a document similar to: series-volumes

Does it make sense to call a collection of volumes (like Dispersion Forces in https://cds.cern.ch/record/1517909) a series or do you know of a better name to describe it?

agentilb commented 5 years ago

@equadon

We really have to distinguish:
1) book series (e.g. Springer Tracts in Modern Physics), which is a real series, with an ISSN
2) volumes inside a book (e.g. Dispersion forces, which has 2 volumes)

Actually, the distinction of main record / volume we want to make in the new system is for 2) and not really for 1)

For 1), we do not want to change how it is done currently, since in most cases we don’t have a specific record for the book series itself, and if we have one, it is a series = a periodical record = a different data model (we have only a few cases in our collection). For volumes included in book series, it is enough to leave them as they are and keep the information about which book series they belong to in 490, which will become book_series in the new data model.

The need is therefore for 2), as described in https://github.com/CERNDocumentServer/cds-migrator-kit/issues/16 (Cases 1 and 2), where we need a record for the complete document (Dispersion forces) and a description for each volume (vol. 1 and vol. 2), which may have a different title, ISBNs, authors, abstract, DOI, book series, publication year and, more rarely, publisher...
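
A minimal sketch of that second level, assuming one record for the complete document plus a description per volume. The class names `MultiVolumeDocument` and `Volume`, the attribute names and the sample values are hypothetical; only the list of fields that may differ per volume comes from the comment above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Volume:
    """Description of one volume; every field may differ between volumes."""
    number: str
    title: Optional[str] = None
    isbns: List[str] = field(default_factory=list)
    authors: List[str] = field(default_factory=list)
    abstract: Optional[str] = None
    doi: Optional[str] = None
    book_series: Optional[str] = None
    publication_year: Optional[int] = None
    publisher: Optional[str] = None


@dataclass
class MultiVolumeDocument:
    """Record for the complete document, holding its volumes."""
    title: str
    volumes: List[Volume] = field(default_factory=list)

    def volume_titles(self):
        # Fall back to the main title when a volume has no title of its own.
        return [v.title or self.title for v in self.volumes]


# Placeholder values, not taken from a real CDS record.
doc = MultiVolumeDocument(
    title="Example main title",
    volumes=[Volume(number="1", title="Example volume title"), Volume(number="2")],
)
titles = doc.volume_titles()
```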

To summarise, from your example, we need only the second level. We do have to think about the naming, but I certainly wouldn’t use "book series", given the explanation above.

Hope this clarifies the request. Let me know if you need further explanation.

Maybe, it would be easier to discuss this directly? We could quickly meet this afternoon or Thursday before 11am?

agentilb commented 5 years ago

Hi @equadon

Here is an example on how volumes are treated in Nebis (another Library catalogue):

Main volume: https://recherche.nebis.ch/primo-explore/fulldisplay?docid=ebi01_prod000176943&context=L&vid=NEBIS&search_scope=default_scope&tab=default_tab&lang=en_US

Click here to show all volumes: https://recherche.nebis.ch/primo-explore/search?query=lsr01,contains,000176943&search_scope=default_scope&sortby=rank&vid=NEBIS&lang=de_DE

equadon commented 5 years ago

Hi @agentilb

Thank you for the links! Those examples were very good for us.

After some discussion and research from our team we have a proposal for what we believe would be the best way to model volumes and series. Based on information we found, in particular from http://www.rdaregistry.info/termList/ModeIssue/, we noticed it's possible to categorize series based on the mode of issuance. For example, a series with a mode of issuance=serial would be used for what is called a series in CDS now while a multi-volume would be modelled with mode of issuance=multipart monograph.

The plan is to model both types of series with one record called Series which would have additional metadata e.g. "mode of issuance", title, ISSN (only for mode of issuance=serial), and more. This way it will be possible to manage both types of series in the backoffice and attach metadata if needed in the future. A document will have a reference to the series (serial/multipart monographs) it belongs to.
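
The proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the `series_id` reference field and the placeholder ISSN are hypothetical, and the two mode-of-issuance values are the ones named in this comment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModeOfIssuance(Enum):
    SERIAL = "serial"
    MULTIPART_MONOGRAPH = "multipart monograph"


@dataclass
class Series:
    series_id: str  # hypothetical identifier used for linking
    title: str
    mode_of_issuance: ModeOfIssuance
    issn: Optional[str] = None

    def __post_init__(self):
        # The proposal allows an ISSN only for mode of issuance=serial.
        if self.issn is not None and self.mode_of_issuance is not ModeOfIssuance.SERIAL:
            raise ValueError("ISSN is only expected when mode of issuance is serial")


@dataclass
class Document:
    title: str
    series_id: Optional[str] = None  # reference to the Series it belongs to


# Placeholder values only.
serial = Series("S1", "Example book series", ModeOfIssuance.SERIAL, issn="0000-0000")
multipart = Series("S2", "Example multi-volume work", ModeOfIssuance.MULTIPART_MONOGRAPH)
volume = Document("Example volume 1", series_id=multipart.series_id)
```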

We have some additional questions we would like to discuss:

Do you see any issues with this data model based on the data we have now in CDS?
Could this work based on how the CERN library categorizes books?
Can a multi-volume belong to another multi-volume?

agentilb commented 5 years ago

I think using this "mode of issuance" is indeed a good idea. However, we have to take these aspects into account:

1) Metadata difference:

For real book series, we will only have the title in most cases, and the ISSN in some cases (as per the current metadata).
For multi-volume monographs, we can have much more information: Title, ISBN, Imprint, Abstract, Edition, Authors... -> for those, the data model should be identical to the one used for a normal "single" monograph.

2) A volume can be part of a book series and of a multi-volume monograph at the same time, and the distinction between the two pieces of information should be made clear.

3) The search behaviour of users is a bit different for series volumes (often searched at the volume level) and multi-volume monographs (often searched at the "main record" level), but I guess this shouldn't be a problem, since it depends mostly on how the search engine is built.

Do you really need/want to have separate records for the book series? As discussed, we could also leave the book series as they are and have the "mode of issuance" on all book records, with "single unit" when it is a single volume and "multipart monograph" when it has several volumes. In case it is "multipart monograph", this record is linked with the corresponding volumes (and vice versa), which would be separate book records with "mode of issuance = single unit".
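
This alternative could look something like the sketch below, where every book record carries a mode of issuance and the links go in both directions. The function name `link_volumes`, the record layout and the key names `volume_ids` and `parent_id` are assumptions made for this example.

```python
def link_volumes(records, parent_id, volume_ids):
    """Link a multipart monograph to its volumes, and each volume back to it.

    `records` is a hypothetical in-memory store: a dict of record id -> dict.
    """
    parent = records[parent_id]
    parent["mode_of_issuance"] = "multipart monograph"
    parent["volume_ids"] = list(volume_ids)
    for volume_id in volume_ids:
        # Each volume stays a separate book record of its own.
        records[volume_id]["mode_of_issuance"] = "single unit"
        records[volume_id]["parent_id"] = parent_id
    return records


# Placeholder records only.
records = {
    "B1": {"title": "Example main title"},
    "V1": {"title": "Example volume 1"},
    "V2": {"title": "Example volume 2"},
}
link_volumes(records, "B1", ["V1", "V2"])
```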

It can probably happen that multi-volumes are part of multi-volumes, but the case is so rare that we said it is not necessary to foresee this scenario.

Question on my side:

How do you foresee in practice the linking between "multipart monograph" and volumes?

kpsherva commented 5 years ago

Hi @agentilb! Thank you for updating the requirements on this one, we will take them into account. As for separating the series into its own record: it will help keep the data consistent and the records linked. Imagine that there are 3 books in the same series and we then want to add another one, but with a typo in the name of the series. As a result, they won't appear as linked in the system, because from the system's point of view it's not the same series. Also, when you add a new book to a series, you don't have to update all the other ones to link them; you just link it with the series record and they are automatically related. We cannot mark most of the books as single unit, because that would create data redundancy: most of the books will be single units, so adding the series fields only to the ones that are different will save us database space. As for your question, we will solve the linking problem in the coming sprints and we will keep you up to date.
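
The typo scenario can be illustrated with a small sketch that groups the same books once by free-text series title and once by a reference to a series record. The helper `group_by`, the key names and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by(books, key):
    """Group book titles by the value stored under `key`."""
    groups = defaultdict(list)
    for book in books:
        groups[book[key]].append(book["title"])
    return dict(groups)


# Placeholder data; the third book has a typo in the series title.
books = [
    {"title": "Book A", "series_title": "Example series", "series_id": "S1"},
    {"title": "Book B", "series_title": "Example series", "series_id": "S1"},
    {"title": "Book C", "series_title": "Exmaple series", "series_id": "S1"},
]

by_title = group_by(books, "series_title")  # the typo splits the series in two
by_id = group_by(books, "series_id")        # one group, all three books linked
```

Grouping by title yields two groups because of the misspelling, while grouping by the series reference keeps all three books together.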

agentilb commented 5 years ago

Hi @kprzerwa, thanks for the answer. Regarding creating new records for book series: your argument is fully valid, but it could actually apply to many other fields for which we won't create separate records (publisher, author...). As I said already, except for some rare cases where users know about the book series and could be interested in browsing among the volumes, in the majority of cases users won't look for the volumes via the book series. In addition, we currently have some 9000 unique book series titles in our records, so it would mean creating 9000 new (almost empty) records in the database, which we would need to clean before the migration to avoid having duplicated entries with errors. Moreover, this new structure would have to be implemented in the harvester of ebooks and Amazon records (i.e. creating book series records in addition to the book record). I think all this should be taken into account before we take the final decision.
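
To give an idea of what such a clean-up might involve, here is a minimal sketch that groups series titles differing only in case, punctuation or whitespace. The function names and sample titles are hypothetical, and this normalisation is an assumption for illustration: it only catches trivial variants, not real typos, which would still need manual review.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", title.casefold())
    return " ".join(cleaned.split())


def find_variants(titles):
    """Return normalised titles that have more than one spelling."""
    groups = defaultdict(set)
    for title in titles:
        groups[normalise_title(title)].add(title)
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Placeholder titles only.
titles = ["Example Series", "example series ", "Example series.", "Another series"]
variants = find_variants(titles)
```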

CERNDocumentServer / cds-dojson

books: book series and volumes #210