fordmadox / Excel-to-DSC


Geographic subject headings not coding correctly #7

Open baleywells opened 1 month ago

baleywells commented 1 month ago

Hello! I'm a Graduate Research Assistant for University of Oklahoma Libraries. I use Excel and Oxygen XML to create EAD files that I upload into ArchivesSpace. I'm having issues with Oxygen XML validating subject headings. We use blue for subjects, orange for geographic subjects, and purple for people. Blue and purple work like a charm, but I'm having issues with orange.

Subjects (blue) come in as and then I replace them with . Geographic subjects (orange) come in as and then I replace them with . Purple always comes in as so I don't have to change them. They should all look like this when I am done:

Smith, John Blueberries Oklahoma

However, orange is not registering no matter what I do. Oxygen XML will NOT recognize them. It gives me a ton of validation errors and when I read the error message, it says, "Element 'controlaccess' cannot have character [children], because the type's content type is element-only."

Does anyone have any idea how to fix this? My understanding of coding is very limited and I've tried everything that I can think of so far. Any help is greatly appreciated!

[Attached image: Screenshot (2)]
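For anyone hitting the same validation message: it generally means that some text sits directly inside `controlaccess` instead of inside a child element such as `subject` or `geogname`. Below is a minimal Python sketch for locating that stray text. The helper name `stray_text_in_controlaccess` and the sample XML are hypothetical, made up for illustration; this is not part of the Excel-to-DSC process itself.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{urn:x}controlaccess' -> 'controlaccess'."""
    return tag.rsplit("}", 1)[-1]


def stray_text_in_controlaccess(xml_string):
    """Return non-whitespace text found directly inside controlaccess elements."""
    root = ET.fromstring(xml_string)
    found = []
    for element in root.iter():
        if local_name(element.tag) != "controlaccess":
            continue
        # Text that appears before the first child element.
        if element.text and element.text.strip():
            found.append(element.text.strip())
        # Text that follows a child element (its "tail") still belongs
        # to controlaccess, so it also triggers the element-only error.
        for child in element:
            if child.tail and child.tail.strip():
                found.append(child.tail.strip())
    return found


# Hypothetical sample: "Oklahoma" is not wrapped in any element.
sample = """<c>
  <controlaccess>
    <persname>Smith, John</persname>
    <subject>Blueberries</subject>
    Oklahoma
  </controlaccess>
</c>"""

print(stray_text_in_controlaccess(sample))
```

Any string the function returns is text that would need to be wrapped in a heading element before the file can validate.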

fordmadox commented 1 month ago

Hi, @baleywells.

You have stumbled across a feature that was never fully implemented, primarily because it's pretty tricky to handle round-tripping any subject and agent links in ArchivesSpace. I have an idea for how ArchivesSpace could handle this if / when it adopts the next iteration of EAD, EAD4, but that's likely still years away at this point 😃

I think I've got a fix for your issue, but could you send me a copy of a file that you've had trouble with so that I can confirm? Also, have you had any issues in general with managing the headings after they are imported into ASpace? This Excel-to-EAD process was created before ArchivesSpace had its own Spreadsheet Import job. I expect that the latter might provide better results when dealing with subjects and agents (more options, columns, etc.), so I would give it a look if you haven't yet.

Anyhow, with this ASpace-agnostic spreadsheet process, relying on font colors to indicate the controlaccess heading type is quite finicky (I'd like to redo this part completely!), but here's an overview of the hexadecimal values that the process currently requires:

I will probably want to change how this works, but right now you can create mixed content with the following font colors (which results in a pretty hideous rainbow):

    (when there's a second color, that's to deal with the issue of 
    converting a file from XML to XLSX, at which point MS Excel changes the color)

    #FF0000 = title
    #0070C0, #0066CC = corpname
    #7030A0, #666699 = persname 
    #ED7D31, #FF6600 = famname
    #44546A, #339966 = geogname
    #00B050, #008080 = genreform     
    #00B0F0, #00CCFF = subject
    #FFC000, #FFCC00 = occupation
    #FF00FF = function
    #000000 = name, but only in the controlaccess column.
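To make the table above easier to check against a spreadsheet, here is a rough Python sketch of the same color-to-element lookup. The names `COLOR_TO_ELEMENT` and `element_for_color` are hypothetical, and this is only an illustration of the mapping listed above, not how the Excel-to-DSC process is actually implemented.

```python
# Font color (hex, without '#') -> EAD element name, copied from the list above.
# Where two colors are listed, the second is the value Excel substitutes
# after a file has been converted from XML to XLSX.
COLOR_TO_ELEMENT = {
    "FF0000": "title",
    "0070C0": "corpname", "0066CC": "corpname",
    "7030A0": "persname", "666699": "persname",
    "ED7D31": "famname", "FF6600": "famname",
    "44546A": "geogname", "339966": "geogname",
    "00B050": "genreform", "008080": "genreform",
    "00B0F0": "subject", "00CCFF": "subject",
    "FFC000": "occupation", "FFCC00": "occupation",
    "FF00FF": "function",
}


def element_for_color(hex_color, in_controlaccess=False):
    """Return the element name for a font color, or None if it is unmapped."""
    key = hex_color.strip().lstrip("#").upper()
    # Black only means <name>, and only in the controlaccess column.
    if key == "000000":
        return "name" if in_controlaccess else None
    return COLOR_TO_ELEMENT.get(key)


print(element_for_color("#ED7D31"))  # the orange in the list above
print(element_for_color("#44546A"))  # the first geogname color
```

Going by the list, the orange `#ED7D31` resolves to `famname`, while `geogname` expects `#44546A` or `#339966`.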

Entering a specific color code is not super easy in Excel, since it takes three or more steps, but it can be done. See the attached screenshot, which shows where I copied and pasted one of the codes above into a test Excel file.

[Attached image: Screenshot 2024-09-22 at 3 56 50 PM]