PerseusDL / catalog_data

MODS and MADS data for the Perseus Catalog

unexpected punctuation in source notes #101

Closed lcerrato closed 6 years ago

lcerrato commented 7 years ago

In the notes here: http://catalog.perseus.org/catalog/urn:cite:perseus:author.745

Some of the source notes end in stray commas and/or colons:

TLG Canon of Greek Authors and Works, Third Edition,
Smith's Dictionary of Greek and Roman Biography and Mythology, Vol , 1867, p. 1:
Brill's New Pauly:

These appear in the data itself, but only in some of the source notes, not all of them.

<mads:note type="source">TLG Canon of Greek Authors and Works, Third Edition,</mads:note>
<mads:note type="source">Smith's Dictionary of Greek and Roman Biography and Mythology, Vol , 1867, p. 1: </mads:note>
<mads:note type="source">Brill's New Pauly:</mads:note>

I'm wondering whether this punctuation should be in the data at all: the interface already inserts a line break between sources, and the punctuation doesn't appear consistently. It reads as though information is missing.

cwulfman commented 6 years ago

I believe the trailing colons and commas have been removed from the elements. The formatting in those elements in general seems rather wonky, but fixing that is for another day. If the files look good to you, please go ahead and close this ticket.
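For illustration only, here is a minimal Python sketch of that kind of punctuation cleanup. It is not the XQuery/XSLT actually run against the repository. The function name `strip_trailing_punctuation` and the `SAMPLE` record are hypothetical, and the sketch assumes the records use the MADS v2 namespace `http://www.loc.gov/mads/v2`.

```python
import xml.etree.ElementTree as ET

MADS_NS = "http://www.loc.gov/mads/v2"  # assumed: MADS v2 namespace


def strip_trailing_punctuation(root):
    """Trim trailing whitespace, commas, colons and semicolons from source notes.

    Returns the number of notes whose text changed.
    """
    changed = 0
    for note in root.iter(f"{{{MADS_NS}}}note"):
        # Only touch <mads:note type="source"> elements that have text.
        if note.get("type") != "source" or note.text is None:
            continue
        cleaned = note.text.rstrip(" \t\r\n,:;")
        if cleaned != note.text:
            note.text = cleaned
            changed += 1
    return changed


# Hypothetical minimal record built from notes quoted in this issue.
SAMPLE = """<mads:mads xmlns:mads="http://www.loc.gov/mads/v2">
  <mads:note type="source">TLG Canon of Greek Authors and Works, Third Edition,</mads:note>
  <mads:note type="source">Brill's New Pauly:</mads:note>
</mads:mads>"""

root = ET.fromstring(SAMPLE)
strip_trailing_punctuation(root)
for note in root.iter(f"{{{MADS_NS}}}note"):
    print(note.text)
```

Note that this only tidies the text. It leaves the placeholder notes themselves in place.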

AlisonBabeu commented 6 years ago

Actually, the larger problem here isn't just the unexpected punctuation (though it's great to have that addressed). Wherever TLG, Smith's or Brill's is listed as in Lisa's example, it is a placeholder that was part of a MADS record template. So Lisa is right that information is missing. I had been hoping to delete every instance of this empty template text, not just the punctuation (does that make any sense?)

At one point I had hoped to go into the MADS records and fill out this information further, for instance where the authors had actual entries in TLG or Brill's, but that is highly unlikely at this point.

So would it be possible to remove not just the punctuation, but also the text itself?

cwulfman commented 6 years ago

With XQuery and XSLT, most things are possible.... ;)

Can you be more specific about that MADS record template? Do you want to remove all the elements? Or do you want to get rid of elements that contain particular bits of text (e.g. the string "Brill" or "TLG")?

AlisonBabeu commented 6 years ago

Many of the mads:note elements are fine (please don't remove them! :) )

The template text is exactly as Lisa has included it:

<mads:note type="source">TLG Canon of Greek Authors and Works, Third Edition,</mads:note>
<mads:note type="source">Smith's Dictionary of Greek and Roman Biography and Mythology, Vol , 1867, p. 1: </mads:note>
<mads:note type="source">Brill's New Pauly:</mads:note>

Literally, I would like to remove the mads:note elements that contain this text and nothing else. These blank templates were included in all MADS records, and I should have deleted them long ago.
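The rule described above (drop a source note only when its entire content is one of the three template strings) could be sketched roughly as follows. This is an illustrative Python sketch, not the XQuery/XSLT used on the repository. The names `TEMPLATE_NOTES`, `normalize` and `remove_template_notes` are hypothetical, the namespace is assumed to be MADS v2, and the comparison ignores trailing punctuation on the assumption that the earlier cleanup may already have stripped it.

```python
import xml.etree.ElementTree as ET

MADS_NS = "http://www.loc.gov/mads/v2"  # assumed: MADS v2 namespace

# Template strings from this issue, stored without trailing punctuation.
TEMPLATE_NOTES = {
    "TLG Canon of Greek Authors and Works, Third Edition",
    "Smith's Dictionary of Greek and Roman Biography and Mythology, Vol , 1867, p. 1",
    "Brill's New Pauly",
}


def normalize(text):
    """Collapse whitespace and drop trailing commas, colons and semicolons."""
    return " ".join((text or "").split()).rstrip(" ,:;")


def remove_template_notes(root):
    """Delete source notes whose whole content is a template string.

    Notes carrying any additional text are left untouched.
    Returns the number of notes removed.
    """
    removed = 0
    # Snapshot the elements first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for note in parent.findall(f"{{{MADS_NS}}}note"):
            is_template = (
                note.get("type") == "source"
                and len(note) == 0  # no child elements
                and normalize(note.text) in TEMPLATE_NOTES
            )
            if is_template:
                parent.remove(note)
                removed += 1
    return removed


# Hypothetical record: three template notes plus one made-up real note.
SAMPLE = """<mads:mads xmlns:mads="http://www.loc.gov/mads/v2">
  <mads:note type="source">TLG Canon of Greek Authors and Works, Third Edition,</mads:note>
  <mads:note type="source">Smith's Dictionary of Greek and Roman Biography and Mythology, Vol , 1867, p. 1: </mads:note>
  <mads:note type="source">Brill's New Pauly:</mads:note>
  <mads:note type="source">Example source with real content, p. 12</mads:note>
</mads:mads>"""

root = ET.fromstring(SAMPLE)
print(remove_template_notes(root), "template notes removed")
```

Matching on the exact template strings, rather than on substrings such as "Brill" or "TLG", is what keeps notes with real content safe.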

cwulfman commented 6 years ago

Consider it done!