Smithsonian / Match-Getty-AAT

R/Shiny app that matches subject terms to the Getty AAT using keywords or a full-text search
Apache License 2.0

Connect with other approaches? #1

Open ajs6f opened 5 years ago

ajs6f commented 5 years ago

Hey, @ljvillanueva, just dropping you a line to say that you are perhaps the fourth or fifth person in the last few weeks I've seen start a new effort to reconcile data against external vocabularies. Corey DiPietro was the most recent. He found welcome and surprising the news that we have production instances of tools like OpenRefine and Karma already available and supported by OCIO.

I'm trying to get some coordination going here. It seems a bit silly for dozens of people to redo the same work, especially if they are all working at the same place! :) Would you be interested in joining a discussion I start at the Taxonomy WG Vicki Portway has convened? It seems pretty germane, and if Vicki doesn't think so, I'd be happy to create a new space to discuss this stuff in.

jmboehm commented 4 years ago

Hi @ajs6f, @ljvillanueva, sorry for the necroposting. My colleagues and I are facing the task of reconciling data from many different museums, using many different vocabularies, to the AAT. I'd be curious whether you found better approaches than just querying the AAT APIs (perhaps via recognizing DBpedia, or by training a model to recognize AAT entities directly). Many thanks for any pointers. Best, Johannes

ajs6f commented 4 years ago

Hello @jmboehm,

I've moved on to other concerns, but please don't take that to mean that I don't consider these questions important. I hope @ljvillanueva has time to take up this conversation.

In any event, have you been in touch with Linked.Art? I think you would find a receptive audience there for any discussion about reconciliation against AAT! (It doesn't hurt that @azaroth42, one of the leaders of that community, is with the Getty and in close touch with the maintainers of AAT.) I'm happy to help introduce the topic, if desired.

Otherwise, I would be a little surprised if anyone at the Smithsonian is using ML techniques for this kind of problem. I myself heartily believe that that's the way forward for this (in fact I work in a group that supports research uses for ML), but new technology percolates very slowly through our curatorial institution. :smile:

azaroth42 commented 4 years ago

We have had a great deal of success using OpenRefine. Instructions: https://www.getty.edu/research/tools/vocabularies/obtain/openrefine.html

That could provide a nice training set for ML to then treat as a gold standard.

jmboehm commented 4 years ago

Thanks to both of you for the pointers, that's very helpful.