acdh-oeaw / dig_ed_cat

A web application to browse, analyze and curate a catalogue of digital editions
https://dig-ed-cat.acdh.oeaw.ac.at/
MIT License

get the data in rdf #124

Closed zxenia closed 6 years ago

zxenia commented 7 years ago

Hi guys, I tried to map the data to RDF Schema and Dublin Core. I added a separate view for RDF/XML. It's not public yet, since the mapping is not finished and still needs to be thought through. You can see it when you are logged in, under get the data -> rdf/xml. Any comments and modelling suggestions are very welcome! I also want to create the same view for the BibTeX format, since it can easily be imported into Zotero.

gfranzini commented 7 years ago

This is awesome Ksenia, thank you! Here are some initial observations/questions from my side:

I also love the idea of a bibtex format for Zotero.

Very exciting developments, thank you!! :-D

zxenia commented 7 years ago

Hi Greta!

  1. Thanks for noticing this! I'll look into it.
  2. For now I just used the Dublin Core element set, but we can extend it with Dublin Core terms. You can see how I mapped it to the dig-ed-cat data model in this XML template. Let's say the mapping to Dublin Core is a basic mapping. This data can be reused by libraries if they would like to add it to their catalogues, because Dublin Core is a very widespread standard. We can (and should) extend it and include other LOD vocabularies, like GeoNames for locations.
  3. Correct. This is not a static file; it's generated every time you call it, and the script goes through all objects in the database.
  4. Yes, we can try. Thanks for the links! I'll check them. Thank you!

zxenia commented 7 years ago

Also, the mapping is very basic for now (just for testing the output); I was more curious to implement a technical solution first :)

csae8092 commented 7 years ago

Awesome feature! Though I'd suggest using a dedicated library such as rdflib to produce the RDF graph. This will allow us to

  1. present the RDF data in various formats (e.g. RDF/XML, ttl, ...),
  2. but maybe even more important, such a library makes sure that the data we produce is actually valid (which might not always be the case for 'hand made' RDF/XML).

For the mapping: since we are currently defining some kind of ACDH-Ontology anyway for our soon-to-be-ready ACDH-Repo, maybe we should try to map the catalog data against the ACDH-Ontology.

gfranzini commented 7 years ago

I'm all for valid data, so rdflib gets my vote. Mapping the cat against the ACDH ontology also makes sense! In terms of LOD vocabularies, to start we could link to:

Thoughts?

zxenia commented 7 years ago

Hi! I will try to use RDFLib for the RDF/XML output. I mapped the data to BibTeX and tried to import the BibTeX output into Zotero. I was able to import all editions into a Zotero library. The only thing is that, in order to preserve more information for Zotero, items were imported as the 'book' type, but I added a tag 'digital edition'. If we mapped them to the 'web page' type, we could not save information like the publisher (institutions), since the web page type has no such field.
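
As a rough sketch of this BibTeX mapping: each edition becomes an `@book` entry, the institutions go into `publisher`, and the 'digital edition' tag is carried in `keywords` (assumed here to be the field that Zotero's BibTeX import turns into tags). The function name and the input field names are hypothetical, not the project's actual code.

```python
def to_bibtex(edition):
    """Render one edition as a BibTeX @book entry tagged as a digital edition."""
    fields = {
        "title": edition["name"],
        # Institutions are kept in the publisher field of the 'book' type.
        "publisher": "; ".join(edition["institutions"]),
        "url": edition["url"],
        "keywords": "digital edition",
    }
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@book{{{edition['key']},\n{body}\n}}"


# Hypothetical record, for illustration only.
example = {
    "key": "ed1",
    "name": "An Example Digital Edition",
    "institutions": ["Example University", "Example Academy"],
    "url": "https://example.org/edition",
}
entry = to_bibtex(example)
print(entry)
```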

About the vocabularies: good suggestions! I had a thought today: maybe we could collect URLs for universities? Then our rdf:about would point to the web presence of the university. A question about the time ontology: Greta, do you think we need it for historical periods?

gfranzini commented 7 years ago

Fantastic, thank you so much!! Good solution for Zotero, I don't see a better option. I was actually reading the Zotero documentation to see whether it is possible to add a new item type, but apparently not: http://forums.zotero.org/discussion/15636/1/changes-to-fields-and-item-types-for-zotero-31-/ (scroll down to 'Custom fields'/item types'). I can document this limitation once we make this feature publicly available. We should also add the Zotero logo to the website.

Regarding the time ontology and historical periods. The best option I was able to find until now is to use the time:GeneralDurationDescription class with the time:years property. See here. DBPedia has something more specific to historical periods but I'm not sure I fully understand it.

As for institution URLs. Of course! Where would I add these? To the institutions .csv file or somewhere else?

csae8092 commented 7 years ago

As for institution URLs. Of course! Where would I add these? To the institutions .csv file or somewhere else?

Yes, simply add a new column to institutions.csv ideally after type of location, maybe call it something like 'institution website'.
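
A small sketch of what adding that column could look like with Python's csv module. The 'institution name' column and the sample rows are assumptions for illustration; only 'type of location' and 'institution website' come from this thread.

```python
import csv
import io


def add_website_column(csv_text, websites, after="type of location"):
    """Insert an 'institution website' column right after the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    # Fall back to appending at the end if the anchor column is missing.
    position = fieldnames.index(after) + 1 if after in fieldnames else len(fieldnames)
    fieldnames.insert(position, "institution website")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Institutions without a known URL get an empty cell.
        row["institution website"] = websites.get(row["institution name"], "")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data.
sample = (
    "institution name,type of location,notes\n"
    "Example University,city,x\n"
    "Other Institute,city,y\n"
)
result = add_website_column(sample, {"Example University": "https://example.org"})
print(result)
```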

gfranzini commented 7 years ago

OK!

gfranzini commented 7 years ago

Hi, I added institution websites and synched the data. FYI, two institutions don't have a URL.

gfranzini commented 7 years ago

Also found this: https://www.w3.org/TR/vocab-dcat/

Let me know if there's something I can do here (for example, map the Catalogue data against a particular property or category in any of the vocabularies we're using).

zxenia commented 7 years ago

Hi, I looked into the DCAT vocabulary. Good idea, I think, since our edition was missing an rdf:type. So each digital edition would then be a dcat:CatalogRecord or a dcat:Dataset? I am a bit confused, since the Dataset class actually has all the properties from DC Terms that we are using.

Regarding the mapping: I mapped institutions to foaf:Organization and places to gn:Feature and geo:SpatialThing. I just pulled the latest version to the server, so you can see what the RDF looks like now.

The problem I have is with Periods. It seems rather complicated to me. Maybe you could look into it?

Another thing is the Creative Commons license: we only have yes/no, but not the type of license. With the type, it would make sense to use it in the RDF and link directly to the CC license online. So for now there is only dc:rights, which stores the data from the open_source field.

gfranzini commented 7 years ago

Hi Ksenia, this is awesome! I can look into Periods, no problem. I haven't been able to spend much time on the project this month as I've been busy writing up the paper on the Cat survey that I sent around in the spring. I'll resume work here as soon as that's out of the way! Could you perhaps assign this ticket to me so that I can more easily find it later?

Thanks!!

zxenia commented 7 years ago

great! thanks!

Could you perhaps assign this ticket to me so that I can more easily find it later?

I discovered that I actually can't do it; I can only assign people who are members of the acdh-oeaw organization (strange).

gfranzini commented 7 years ago

OK, no problem!

zxenia commented 7 years ago

I added downloads in n3 and RDF/XML formats for selected editions. When we decide what to do with Periods, I'll update the RDF schema. I also opened the BibTeX format download to non-authenticated users.

gfranzini commented 7 years ago

Wonderful, I'll advertise it tomorrow! I hope to find some time this week to study the Periods issue.

csae8092 commented 7 years ago

@zxenia: Awesome!!!! though one minor thing: can we change the icons a bit, maybe only one download button which expands 'on click' into data-format specific icons?

zxenia commented 7 years ago

Hi, I'll rework this download button then.

@gfranzini, for periods you can also look into this gazetteer: http://perio.do/. Maybe we can somehow link our periods to PeriodO.

gfranzini commented 7 years ago

Interesting, thanks! Will look into it.

gfranzini commented 7 years ago

Hi @zxenia and @csae8092 ! So I studied the Perio.do Gazetteer and this is the closest mapping I could create between the Cat and Perio.do.

| Catalogue | Perio.do | Collection |
| --- | --- | --- |
| Antiquity [700BC-500AD] | http://n2t.net/ark:/99152/p0qhb66qj4c [-600-499] | ARIADNE |
| Middle Ages [500-1500] | http://n2t.net/ark:/99152/p0pqptcfq9t [450-1500] | ARIADNE |
| Early Modern [1500-1789] | http://n2t.net/ark:/99152/p0qhb66kxnr [1500-1789] | ARIADNE |
| Long Nineteenth Century [1789-1914] | http://n2t.net/ark:/99152/p086kj9tqzk [1800-1900] | Anderson Digital Index |
| Modern [1914-1965] | http://n2t.net/ark:/99152/p0dg76fcnx9 [1878-1949] | Pyla-Koutsopetria archaeological survey |
| Contemporary [1965-today] | http://n2t.net/ark:/99152/p0qhb66d849 [1945-2000] | ARIADNE |

The ARIADNE mapping works almost perfectly; the mappings for Long Nineteenth Century and Modern don't.
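
To make the size of each mismatch concrete, the table above can be written down as data and the boundary differences computed. This is only an illustrative sketch: writing 700BC as -700 and "today" as an open end (`None`) are simplifications, and the function name is made up.

```python
# Rows copied from the table above:
# label -> (catalogue range, Perio.do URL, Perio.do range).
PERIODS = {
    "Antiquity": ((-700, 500), "http://n2t.net/ark:/99152/p0qhb66qj4c", (-600, 499)),
    "Middle Ages": ((500, 1500), "http://n2t.net/ark:/99152/p0pqptcfq9t", (450, 1500)),
    "Early Modern": ((1500, 1789), "http://n2t.net/ark:/99152/p0qhb66kxnr", (1500, 1789)),
    "Long Nineteenth Century": ((1789, 1914), "http://n2t.net/ark:/99152/p086kj9tqzk", (1800, 1900)),
    "Modern": ((1914, 1965), "http://n2t.net/ark:/99152/p0dg76fcnx9", (1878, 1949)),
    "Contemporary": ((1965, None), "http://n2t.net/ark:/99152/p0qhb66d849", (1945, 2000)),
}


def boundary_offsets(label):
    """Return (start, end) offsets in years: Perio.do boundary minus catalogue boundary."""
    (cat_start, cat_end), _url, (p_start, p_end) = PERIODS[label]
    # An open-ended catalogue period has no end boundary to compare.
    end_offset = None if cat_end is None else p_end - cat_end
    return p_start - cat_start, end_offset


for label in PERIODS:
    print(label, boundary_offsets(label))
```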

What do we do? What happens when we have multiple periods for one project? We would just add two Perio.do URLs?

zxenia commented 7 years ago

Hi, this looks good. Actually, there is already a field 'PeriodO id' in the database for storing this identifier. I have a question: in the database, some periods are a combination of periods (e.g. 'Early Modern; Long Nineteenth Century; Modern; Contemporary' is one period in the database). So how do we deal with this?
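
One possible way to handle such a combined value, sketched here purely to illustrate the "several Perio.do URLs" option: split the stored string on semicolons and look up each component. The function name is hypothetical, and the URLs are the ones from the mapping table in this thread.

```python
# Catalogue period label -> Perio.do URL, as listed in the mapping table.
PERIODO_IDS = {
    "Antiquity": "http://n2t.net/ark:/99152/p0qhb66qj4c",
    "Middle Ages": "http://n2t.net/ark:/99152/p0pqptcfq9t",
    "Early Modern": "http://n2t.net/ark:/99152/p0qhb66kxnr",
    "Long Nineteenth Century": "http://n2t.net/ark:/99152/p086kj9tqzk",
    "Modern": "http://n2t.net/ark:/99152/p0dg76fcnx9",
    "Contemporary": "http://n2t.net/ark:/99152/p0qhb66d849",
}


def periodo_urls(period_field):
    """Split a (possibly combined) period value and return one URL per component."""
    labels = [part.strip() for part in period_field.split(";") if part.strip()]
    return [PERIODO_IDS[label] for label in labels]


combined = "Early Modern; Long Nineteenth Century; Modern; Contemporary"
print(periodo_urls(combined))
```

This yields several identifiers for one database period, so it gives up the one-to-one relation between a period and its identifier.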

csae8092 commented 7 years ago

We would just add two Perio.do URLs?

possible, but imho not a very nice solution, because I'd say only a 1:1 relation between a concept like a period and its identifier makes much sense.

To deal with combination of periods we can either

I'd go for the second solution because I think something like

Early Modern; Long Nineteenth Century; Modern; Contemporary

constitutes its own period in our data. Plus, the second solution is also easier to implement.

gfranzini commented 7 years ago

@csae8092 by "register our periods in perio.do" you mean adding them to their database, right? I had a look at their website again and I can't seem to find information about adding a new period.

csae8092 commented 7 years ago

yes, I thought this would be possible....

csae8092 commented 6 years ago

@zxenia and @gfranzini should we close this issue? Because "get the data in rdf" is basically implemented....

gfranzini commented 6 years ago

we can close it!