TBRC-TimB opened this issue 2 years ago
Thanks for the report! The MARC records as we have them now (on BUDA) are the result of a veeeeery long back-and-forth with Columbia, and that was such a tedious endeavor that I'm a bit hesitant to dive into it again...
A few remarks:
- in the new database, the ID is really bdr:W00EGS1016181, not just W00EGS1016181, so if possible I'd rather keep the bdr: prefix
- the 035 field: since we likely won't send them more records, we can remove it
- the 001 field […] when they ingested the records in their own database; perhaps Harvard can do the same? (I don't know how this is usually done)
- the 003 field proposed by Harvard will be cool! I suppose the Cb in MaCbBDRC means Cambridge, which isn't really the case anymore, but that's not a big deal (it's actually a good case for non-semantic IDs!!)

wdyt?
Oh, I can imagine. The big issue with MARC records is that they are just standard enough that everyone thinks their way is the way everyone should do it.
MaCbBDRC might be a standard Harvard is already using as a unique ID for BDRC. While not technically accurate, I think the semantic part of it is really meant to be arbitrary. But I completely agree, semantic unique IDs are a bad time. DRS communications have pretty much dried up since the summer. Once we have some of these changes rolled out, we can get back in contact and hopefully resume the whole process.
@TBRC-TimB would https://purl.bdrc.io/resource/W00EGS1016181.mrcx be satisfactory?
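To check what that URL returns, here is a minimal sketch that reads the control fields out of a MARCXML record using only Python's standard library. It assumes the .mrcx export is MARCXML in the standard http://www.loc.gov/MARC21/slim namespace (the marc: prefix in the export suggests so, but that is an assumption); `control_fields` and the sample record are illustrative, not part of BUDA.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Assumed namespace of the marc: prefix (MARC 21 "slim" MARCXML schema)
MARC_NS = "http://www.loc.gov/MARC21/slim"

def control_fields(marcxml: Union[str, bytes]) -> Dict[str, str]:
    """Return {tag: value} for every controlfield in a MARCXML document."""
    root = ET.fromstring(marcxml)
    return {
        cf.get("tag"): (cf.text or "")
        for cf in root.iter(f"{{{MARC_NS}}}controlfield")
    }

# Hypothetical sample mirroring the current 001 output
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>
</marc:record>"""

print(control_fields(SAMPLE))  # {'001': '(BDRC)bdr:W00EGS1016181'}

# To inspect the live export (needs network access):
# import urllib.request
# with urllib.request.urlopen("https://purl.bdrc.io/resource/W00EGS1016181.mrcx") as resp:
#     print(control_fields(resp.read()))
```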
I can reopen communications with DRS and get their feedback. The only potential hiccup I see is the 001 field; I'll make the case that this is how our unique item IDs appear in our own system. Thanks for looking into this. I'll report back once I hear from Harvard.
I got some feedback on these MARC records from Harvard. It seems they want us to try to change a few other fields too.
This looks good. Two things:
1. The ISBN of the print original should not go in 020 $z. It should go into 776 like so: 776/0_ $cOriginal$z7540932317
2. Can you remove the extraneous space in the pagination? E.g. change "1 online resource (3, 347 pages)" to "1 online resource (3,347 pages)"
Thanks!
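If the records are built in code, the move from 020 $z to 776 that Harvard asks for could look like the sketch below. `DataField` and `move_print_isbns` are hypothetical stand-ins, not BUDA's actual export code; blank indicators print as "_" to match the 776/0_ notation above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DataField:
    """Hypothetical minimal model of a MARC data field."""
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: List[Tuple[str, str]] = field(default_factory=list)

    def __str__(self) -> str:
        # Blank indicators are shown as "_", as in "776/0_"
        inds = (self.ind1 + self.ind2).replace(" ", "_")
        subs = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"{self.tag} {inds} {subs}"

def move_print_isbns(fields: List[DataField]) -> List[DataField]:
    """Move print-original ISBNs from 020 $z into 776 0_ $cOriginal$z<ISBN>."""
    out: List[DataField] = []
    for f in fields:
        if f.tag != "020":
            out.append(f)
            continue
        kept = [(c, v) for c, v in f.subfields if c != "z"]
        if kept:  # keep the 020 only if something besides $z is left
            out.append(DataField("020", f.ind1, f.ind2, kept))
        for c, v in f.subfields:
            if c == "z":
                out.append(DataField("776", "0", " ", [("c", "Original"), ("z", v)]))
    # Stable sort: tag order, without reshuffling fields that share a tag
    return sorted(out, key=lambda f: f.tag)

for f in move_print_isbns([DataField("020", subfields=[("z", "7540932317")])]):
    print(f)  # 776 0_ $cOriginal$z7540932317
```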
@eroux, how easily could we make these changes on our end? I imagine the second one might be tricky, since it would involve correcting the actual content of the fields rather than just how they are presented in the MARC record.
Oh, the second one looks like a mistake, thanks! Interesting about the first change; yes, I can make it, it makes sense.
I implemented these two changes, but unfortunately the issue with the extent statement is in the data: the book has 347 pages, but the extent statement says "3, 347 p.". I have no idea what the first "3" refers to... but that's a separate issue, not a problem with the MARC export.
Wow, I figured it was a typo but didn't think it would be off by an order of magnitude. So it sounds like a data input error. Is there a way to bulk query that field to get a sense of how common that sort of extent error is? I'm hoping it was a typo and not an attempt to convey a different kind of information, i.e. 3 chapters and 347 pages. If it's just a one-off error, we can probably ignore it. Thanks again for looking into it.
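For the bulk check, here is a minimal sketch that flags extent statements shaped like "3, 347 p.". It assumes the extents can be exported as a mapping from work ID to extent string (how to pull them out of BUDA isn't covered here), and the sample IDs are made up. One caveat: in AACR2-style cataloging a comma can also separate distinct page sequences (e.g. a short preliminary sequence followed by the main one), so a match is a candidate for review rather than a confirmed error.

```python
import re
from typing import Dict, Tuple

# "N, M p." : two page counts separated by a comma and whitespace
TWO_SEQUENCE_EXTENT = re.compile(r"^\s*(\d+),\s+(\d+)\s*p\.?\s*$")

def flag_extents(extents: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Return {work_id: (first, second)} for extents shaped like 'N, M p.'."""
    flagged = {}
    for work_id, extent in extents.items():
        m = TWO_SEQUENCE_EXTENT.match(extent)
        if m:
            flagged[work_id] = (int(m.group(1)), int(m.group(2)))
    return flagged

# Hypothetical sample; in practice this would come from a bulk export
sample = {"W_A": "347 p.", "W_B": "3, 347 p.", "W_C": "212 p."}
hits = flag_extents(sample)
print(f"{len(hits)} of {len(sample)} extents look like 'N, M p.': {hits}")
```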
Oh, just took a look at this. It looks like you moved the ISBN from the 020 into a 760, not a 776 field. The content of the element looks good. Let me know when it's all squared away and I'll send it off again for approval.
right, sorry for that! fixed
Perfect! thanks again
After further discussions, we should:
add the VIAF URI of persons when we have them. When we do, the URI should go in the $1 subfield of the 100 field, for instance:
100 1# $a Obama, Michelle, $d 1964- $e author. $1 http://viaf.org/viaf/81404344
See the relevant documentation in the PCC Formulating URIs guide and the PCC Linked Data Best Practices report.
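Here is a small sketch of how the 100 field could be assembled, with the VIAF URI in $1 only when one is known. The function name and parameters are hypothetical; it reproduces the Obama example above and assumes the name is already given in inverted form.

```python
from typing import Optional

def name_100(name: str, relator: str, dates: Optional[str] = None,
             viaf_id: Optional[str] = None) -> str:
    """Build a 100 1# string; $1 carries the VIAF URI when we have one."""
    subs = [f"$a {name},"]
    if dates:
        subs.append(f"$d {dates}")
    subs.append(f"$e {relator}.")
    if viaf_id:
        subs.append(f"$1 http://viaf.org/viaf/{viaf_id}")
    return "100 1# " + " ".join(subs)

print(name_100("Obama, Michelle", "author", dates="1964-", viaf_id="81404344"))
# 100 1# $a Obama, Michelle, $d 1964- $e author. $1 http://viaf.org/viaf/81404344
```

Keeping $1 optional means records without a VIAF match still export cleanly.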
for the e-records only, they are provided by IA at URLs like
https://archive.org/metadata/bdrc-W3CN4988/metadata/external-identifier
and to get all the records one can search like this, or use the advanced search or the ia search command-line tool. The fields should probably go to 035_$a, like in this example from IA.
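Here is a sketch of pulling those identifiers and turning them into 035 $a values. Assumptions: the metadata API answers with JSON whose "result" key holds the value (a string or a list), OCLC numbers appear as urn:oclc:record:<number>, and the usual (OCoLC) prefix is wanted in 035 $a. The function names and the number in the sample are made up.

```python
import json
import urllib.request
from typing import List

IA_URL = "https://archive.org/metadata/{item}/metadata/external-identifier"
OCLC_PREFIX = "urn:oclc:record:"  # assumed form of OCLC IDs in IA metadata

def fetch_external_ids(item: str) -> List[str]:
    """Fetch the external-identifier values for one IA item (network call)."""
    with urllib.request.urlopen(IA_URL.format(item=item)) as resp:
        return parse_external_ids(json.load(resp))

def parse_external_ids(payload: dict) -> List[str]:
    """The value sits under "result"; it may be a single string or a list."""
    result = payload.get("result", [])
    return [result] if isinstance(result, str) else list(result)

def oclc_035(external_ids: List[str]) -> List[str]:
    """Turn OCLC external identifiers into 035 $a fields with the (OCoLC) prefix."""
    return [
        "035 __ $a(OCoLC)" + ext[len(OCLC_PREFIX):]
        for ext in external_ids
        if ext.startswith(OCLC_PREFIX)
    ]

# Offline example with a made-up payload; fetch_external_ids("bdrc-W3CN4988")
# would do the same against the live API.
print(oclc_035(parse_external_ids({"result": ["urn:oclc:record:123456789"]})))
```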
The links to BDRC on WorldCat (example) could look better. Having proper $y, $3 and $7 subfields would improve them.
About VIAF, people at Harvard will make a request to OCLC to have a $1 subfield in 720, so that we can place the VIAF ID there. It will take about a year.
Let's add the OCLC number to the database and to the MARC records in that field.
For 856, the Harvard team thinks we should use 856 40 $3 Buddhist Digital Resource Center: $u http://purl.bdrc.io/resource/W1KG16654
I propose we also add a $7 when it's full access.
If we are going to make some of these changes to our MARC records generally, should I hold off on building the MARC records for the Google Books process?
Oh, very good question! I don't think we need to wait for the 720$1 field to be accepted by OCLC, but I'll make the other changes this week so that you can do the export. I'll keep you updated.
@TBRC-TimB I've updated the MARC export (without the VIAF URLs). I think it's ready for the export to Google Books; tell me if you encounter issues.
Harvard wants the 856 descriptive part to be in $y instead of $3.
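Putting the 856 pieces together (856 40, $u with the PURL, descriptive text in $y as Harvard now prefers, and $7 only for full access), a hypothetical helper might look like this. It assumes that "full access" maps to the MARC 21 856 $7 access-status code 0 (open access); the function name and link text default are illustrative.

```python
def bdrc_856(work_id: str, full_access: bool,
             link_text: str = "Buddhist Digital Resource Center") -> str:
    """Build an 856 40 string pointing at the BDRC PURL for a work."""
    subs = [f"$u http://purl.bdrc.io/resource/{work_id}", f"$y {link_text}"]
    if full_access:
        # Assumed mapping: 856 $7 access status "0" = open access
        subs.append("$7 0")
    return "856 40 " + " ".join(subs)

print(bdrc_856("W1KG16654", full_access=True))
# 856 40 $u http://purl.bdrc.io/resource/W1KG16654 $y Buddhist Digital Resource Center $7 0
```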
Two other comments from Harvard:
=776 08$iElectronic reproduction of (manifestation)$w(DLC) 2010309067
You should have: =776 08$iElectronic reproduction of (manifestation):$w(DLC) 2010309067
Currently we have this for the 001 field:
<marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>
They are suggesting we break it out into two fields: the 001 for just the unique ID for the work (e.g. W00EGS1016181)
And @eroux replied:
in the new database, the ID is really bdr:W00EGS1016181, not just W00EGS1016181, so if possible I'd rather keep the bdr:
It's not really a suggestion to drop the 'bdr:' namespace designator; it will make their current DRS holdings consistent with future ones that we deposit. DRS has rigid file naming conventions, and I don't know if having this prefix in their database will require that all the works and files we package for them carry this prefix, or if it will invalidate DRS searches of our works.
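For the first of Harvard's two comments (the trailing colon in 776 $i), here is a one-function sketch that makes sure the relationship label ends with a colon. The helper name is hypothetical; the LCCN is the one from Harvard's example.

```python
def reproduction_776(lccn: str,
                     relationship: str = "Electronic reproduction of (manifestation)") -> str:
    """Build the 776 08 line in Harvard's mnemonic form, with $i ending in a colon."""
    if not relationship.endswith(":"):
        relationship += ":"
    return f"=776 08$i{relationship}$w(DLC) {lccn}"

print(reproduction_776("2010309067"))
# =776 08$iElectronic reproduction of (manifestation):$w(DLC) 2010309067
```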
There have been many discussions with Harvard about the MARC records, and I expect they'll continue in September. There are some more things we need to change.
We have a few asks from Harvard for our marc records to help smooth out the ingestion process.
Currently we have this for the 001 field:
<marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>
They are suggesting we break it out into two fields: the 001 for just the unique ID for the work (e.g. W00EGS1016181) and the 003 field for the organizational identifier, which they have as MaCbBDRC.
They also asked us to get rid of our 035 field. In most cases, a system will be able to build the contents of this field from the 001 and 003, or 010.
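Here is a minimal sketch of the proposed split, plus the 035 value a receiving system could rebuild from 001 and 003 using the usual "(organization code)control number" convention. Whether to keep the bdr: prefix was still under discussion above; this sketch follows Harvard's suggestion and strips it, so keeping it would just mean leaving the prefix in place. `split_001` and `derived_035` are hypothetical names.

```python
import re
from typing import Dict

ORG_CODE = "MaCbBDRC"  # organization identifier Harvard has for BDRC, used in 003

# Current 001 shape: "(BDRC)bdr:W00EGS1016181"; the bdr: prefix is optional here
LEGACY_001 = re.compile(r"^\([^)]*\)(?:bdr:)?(?P<work_id>\S+)$")

def split_001(legacy: str) -> Dict[str, str]:
    """Split the combined 001 into a bare work ID (001) and the org code (003)."""
    m = LEGACY_001.match(legacy)
    if m is None:
        raise ValueError(f"unexpected 001 value: {legacy!r}")
    return {"001": m.group("work_id"), "003": ORG_CODE}

def derived_035(fields: Dict[str, str]) -> str:
    """What a receiving system could rebuild for 035 $a: (003)001."""
    return f"({fields['003']}){fields['001']}"

fields = split_001("(BDRC)bdr:W00EGS1016181")
print(fields)               # {'001': 'W00EGS1016181', '003': 'MaCbBDRC'}
print(derived_035(fields))  # (MaCbBDRC)W00EGS1016181
```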