Fields in the buda-generated MARC records needed for DRS

TBRC-TimB commented 2 years ago

We have a few asks from Harvard for our marc records to help smooth out the ingestion process.

Currently we have this for the 001 field: <marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>

They are suggesting we break it out into two fields: the 001 for just the unique ID for the work (e.g. W00EGS1016181) and the 003 for the organizational identifier, which they have as MaCbBDRC.

They also asked us to get rid of our 035 field. In most cases, a system will be able to build the contents of this field from the 001 and 003, or 010.
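A minimal Python sketch of the split described above. This is not the BUDA export code: the function name `split_control_number` and its parameters are hypothetical, and the sketch only assumes that the current 001 value has the shape `(BDRC)bdr:<ID>` and that the 003 value is `MaCbBDRC`, as quoted in this comment.

```python
import re

# Assumed shape of the current combined 001 value: "(BDRC)bdr:W00EGS1016181".
CURRENT_001 = re.compile(r"^\((?P<org>[^)]+)\)(?P<prefix>bdr:)?(?P<local_id>\S+)$")


def split_control_number(value, org_code="MaCbBDRC", keep_prefix=False):
    """Return the (001, 003) pair for a combined control number.

    `org_code` defaults to the identifier proposed by Harvard.
    `keep_prefix` is a hypothetical switch for keeping the "bdr:" namespace
    prefix in the 001 value.
    """
    match = CURRENT_001.match(value)
    if match is None:
        raise ValueError(f"unexpected 001 value: {value!r}")
    local_id = match.group("local_id")
    if keep_prefix and match.group("prefix"):
        local_id = match.group("prefix") + local_id
    return local_id, org_code
```

Called on the value quoted above, `split_control_number("(BDRC)bdr:W00EGS1016181")` gives the bare work ID for the 001 and `MaCbBDRC` for the 003.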

eroux commented 2 years ago

Thanks for the report! The MARC records as we have now (on BUDA) are the result of a veeeeery long back and forth with Columbia, and that was such a tedious endeavor that I'm a bit hesitant to dive into it again...

a few remarks:

in the new database, the ID is really bdr:W00EGS1016181, not just W00EGS1016181 so if possible I'd rather keep the bdr:
Columbia required that we had a 035 field but since we likely won't send them more records we can remove it
Columbia did rewrite the 001 field when they ingested the records in their own database, perhaps Harvard can do the same? (I don't know how this is usually done)
adding the 003 proposed by Harvard will be cool! I suppose the Cb in MaCbBDRC means Cambridge which isn't really the case anymore but that's not a big deal (it's actually a good case for non-semantic IDs!!)

wdyt?

TBRC-TimB commented 2 years ago

Oh I can imagine. The big issue with MARC records is they are just standard enough that everyone thinks their way is the way everyone should do it.

The 001 field seems to be something they were really pushing. Like you mentioned, post-processing on their end might be possible. After all, they used to ingest our MARC records the way they were. But I imagine because the DRS process is so automated, reworking the fields might not be something they can easily do when batch ingesting into HOLLIS.
I'm also a fan of the 003. I'm willing to bet that the MaCbBDRC might be a standard Harvard is already using as a unique ID for bdrc. While not technically accurate, I think the semantic part of it is really meant to be arbitrary. But I completely agree, semantic unique IDs are a bad time.

DRS communications have been pretty dried up since the Summer. Once we have some of these changes rolled out we can get back in contact and hopefully resume the whole process.

eroux commented 2 years ago

@TBRC-TimB would https://purl.bdrc.io/resource/W00EGS1016181.mrcx be satisfactory?

TBRC-TimB commented 2 years ago

I can reopen communications with DRS and get their feedback. The only potential hiccup I see is the 001 field. I'll make the case that this is the way we have our unique item IDs in our own system. Thanks for looking into this. I'll report back once I hear from Harvard.

TBRC-TimB commented 2 years ago

I got some feedback on these MARC records from Harvard. Seems they want us to try and change a few other fields too.

This looks good. Two things:

The ISBN of the print original should not go in 020 $z. It should go into 776 like so: 776/0_ $cOriginal$z7540932317
Can you remove the extraneous space in the pagination? E.g. change "1 online resource (3, 347 pages)" to "1 online resource (3,347 pages)"
Thanks!

@eroux how possible is it to make these changes easily on our end? I imagine the second one might be tricky since it would involve correcting the actual content of the fields rather than just how they are presented in the marc record.

eroux commented 2 years ago

oh the second one looks like a mistake, thanks! interesting about the first change, I can make it yes, makes sense

eroux commented 2 years ago

I implemented these two changes, but unfortunately the issue with the extent statement is in the data: the book has 347 pages, but the extent statement says "3, 347 p.", I have no idea what the first "3" here refers to... but that's something else, not an issue with the Marc export

TBRC-TimB commented 2 years ago

wow, I figured it was a typo but didn't think it would be off by a factor of 10. So it sounds like a data input error. Is there a way to bulk query for that field to get a guess at how common that sort of extent error is? I'm hoping it was a typo and not an attempt to convey a different kind of information, i.e. 3 chapters, 347 pages. If it's just a one-off error we can probably ignore it. Thanks again for looking into it.
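One way to get a rough count, sketched in Python under stated assumptions: the extent statements have already been exported into a dict keyed by record ID (the export step is not shown and the name `find_suspicious_extents` is hypothetical). The pattern only flags candidates that look like "3, 347 p."; it cannot tell a typo from a deliberate "3 preliminary pages, 347 pages" statement, so the hits still need a human look.

```python
import re

# A one- or two-digit number, a comma, whitespace, then a page count,
# e.g. "3, 347 p." or "(3, 347 pages)".
SUSPICIOUS_EXTENT = re.compile(r"\b(\d{1,2}),\s+(\d+)\s*(?:p\.|pages)")


def find_suspicious_extents(extents):
    """Return the sorted record IDs whose extent statement matches the pattern.

    `extents` maps a record ID to its extent statement (hypothetical input).
    """
    return sorted(
        record_id
        for record_id, statement in extents.items()
        if SUSPICIOUS_EXTENT.search(statement)
    )
```

The ratio of flagged IDs to total records gives a first guess at how widespread the pattern is.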

TBRC-TimB commented 2 years ago

oh just took a look at this. It looks like you replaced the ISBN from the 020 with a 760, not a 776 field. The content of the element looks good. Let me know when it is all squared and I'll send it off again for approval.

eroux commented 2 years ago

right, sorry for that! fixed

TBRC-TimB commented 2 years ago

Perfect! thanks again

eroux commented 2 years ago

After further discussions, we should:

VIAF

add the VIAF URI of persons when we have them. When we do, they should go in the $1 subfield of the name field (e.g. 100), for instance

100 1# $a Obama, Michelle, $d 1964- $e author. $1 http://viaf.org/viaf/81404344

see relevant documentation in the PCC Formulating URIs guide and the PCC Linked Data Best Practices report.

OCLC number

for the erecords only, they are provided by IA on URLs like

https://archive.org/metadata/bdrc-W3CN4988/metadata/external-identifier

and in order to get all the records one can search like this or use the advanced search or the ia search command line

The fields should probably go to 035_$a like in this example from IA
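A rough Python sketch of the mapping from an IA external identifier to a 035 $a value. Several things here are assumptions rather than facts from this thread: that the metadata URL above returns JSON of the form `{"result": ...}`, that the value is either a single string or a list of strings, and that OCLC numbers appear as `urn:oclc:record:<number>`. The function name `oclc_035_values` is hypothetical, and the network call is left out on purpose.

```python
import re

# Assumed form of an OCLC entry in IA's external-identifier field.
OCLC_URN = re.compile(r"^urn:oclc:record:(\d+)$")


def oclc_035_values(external_identifiers):
    """Turn external-identifier value(s) into 035 $a strings.

    Accepts a single string, a list of strings, or None; identifiers that
    are not OCLC record URNs are ignored.
    """
    if isinstance(external_identifiers, str):
        external_identifiers = [external_identifiers]
    values = []
    for identifier in external_identifiers or []:
        match = OCLC_URN.match(identifier.strip())
        if match:
            # "(OCoLC)" is the usual prefix for OCLC numbers in 035 $a.
            values.append(f"(OCoLC){match.group(1)}")
    return values
```

With a parsed response `data`, the call would be `oclc_035_values(data.get("result"))`; an empty list means no OCLC number was found for that item.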

856 improvements

The links to BDRC on Worldcat (example) could look better. Having a proper $y, $3 and $7 would improve it

eroux commented 2 years ago

About VIAF, people at Harvard will make a request to OCLC to have a $1 subfield in 720, so that we can place the VIAF ID there. It will take about a year.

Let's add the OCLC number to the database and to the MARC records in that field.

For 856, the Harvard team thinks we should use 856 40 $3 Buddhist Digital Resource Center: $u http://purl.bdrc.io/resource/W1KG16654

I propose we also add a $7 when it's full access.

TBRC-TimB commented 2 years ago

If we are going to make some of these changes to our MARC records generally, should I hold off on building the MARC records for the Google Books process?

eroux commented 2 years ago

oh, very good question! I don't think we need to wait for the 720$1 field to be accepted by OCLC, but I'll make the other changes this week so that you can do the export, I'll keep you updated

eroux commented 2 years ago

@TBRC-TimB I've updated the MARC export (without the VIAF URLs), I think it's ready for the export to Google Books, tell me if you encounter issues

eroux commented 2 years ago

Harvard wants the 856 descriptive part to be in $y instead of $3

eroux commented 2 years ago

two other comments from Harvard:

[x] In the records that have a 776, there should be a colon at the end of the $i information. For example, in W1KG14512 you have: =776 08$iElectronic reproduction of (manifestation)$w(DLC) 2010309067. You should have: =776 08$iElectronic reproduction of (manifestation):$w(DLC) 2010309067
[x] W1PD159430 has an error in the 490, where it has "=490 0\$v1-7". Since the 300 field states that this online resource is complete in 7 volumes, that information does not need to go into the series (490) field. This record should have no 490. This is connected to https://github.com/buda-base/library-issues/issues/424
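The 776 $i rule in the first item can be sketched as a small check on mnemonic-format lines (the `=776 ...` notation used in Harvard's example). This is an illustration in Python only, not how the export itself is implemented, and the function name `ensure_776_i_colon` is hypothetical.

```python
import re

# The $i subfield of a mnemonic-format line, up to the next "$" or end of line.
SUBFIELD_I = re.compile(r"(\$i)([^$]*)")


def ensure_776_i_colon(line):
    """Return the line with a colon at the end of $i, for 776 lines only.

    Lines for other fields, and 776 lines whose $i already ends with a
    colon, are returned unchanged. Trailing whitespace inside $i is dropped.
    """
    if not line.startswith("=776"):
        return line

    def add_colon(match):
        text = match.group(2).rstrip()
        if text and not text.endswith(":"):
            text += ":"
        return match.group(1) + text

    return SUBFIELD_I.sub(add_colon, line, count=1)
```

Running it twice gives the same result as running it once, so it can be applied safely to records that were already fixed.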

jimk-bdrc commented 2 years ago

Currently we have this for the 001 field: <marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>

They are suggesting we break it out into two fields. the 001 for just the unique ID for the work (e.g.: W00EGS1016181)

and @eroux replied

in the new database, the ID is really bdr:W00EGS1016181, not just W00EGS1016181 so if possible I'd rather keep the bdr:

Dropping the 'bdr:' namespace designator is not really just a suggestion: it will make their current DRS holdings consistent with the future ones that we deposit. DRS has rigid file naming conventions. I don't know whether having this prefix in their database would require all the works and files we package for them to carry the prefix, or whether it would invalidate DRS searches of our works.

eroux commented 2 years ago

There have been many discussions with Harvard about the MARC records, and I expect they'll continue in September. There are some more things we need to change.

buda-base / lds-pdi