Open GoogleCodeExporter opened 9 years ago
Original comment by kur...@googlemail.com
on 14 Aug 2012 at 10:50
The dataset looks promising to me, but I'm no linguist. The problem I see is
linking the different languages to ISO codes or other resources. In the format
available on the website, we only have the language names to use. Wiktionary
links would also be nice, but the problem remains the same.
Is there any more data in the XML file?
What's a "concepticon"?
Original comment by der.brue...@googlemail.com
on 14 Aug 2012 at 12:04
There's a project called "LEGO"
http://lego.linguistlist.org/
that converted the IDS wordlists into a "LIFT" XML representation
http://code.google.com/p/lift-standard/
and added the language code and other metadata.
The words in the wordlists are linked to a central "concepticon", see:
http://www.aclweb.org/anthology/W/W10/W10-2101.pdf
that is in RDF.
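A minimal sketch of how language codes and glosses could be pulled out of a LIFT file, assuming the usual LIFT layout (`entry` > `lexical-unit` > `form[@lang]` > `text`, and `sense` > `gloss`). The sample fragment, the entry id and the function name `read_lift_entries` are invented for illustration. How LEGO encodes the concepticon link inside an entry is not shown here, since that detail isn't given above.

```python
import xml.etree.ElementTree as ET

# Hypothetical LIFT fragment; the entry id, language code and words are
# invented for illustration and are not taken from the LEGO/IDS data.
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="e1">
    <lexical-unit>
      <form lang="deu"><text>Hund</text></form>
    </lexical-unit>
    <sense id="s1">
      <gloss lang="en"><text>dog</text></gloss>
    </sense>
  </entry>
</lift>"""


def read_lift_entries(xml_text):
    """Return a list of (language code, word form, English gloss) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        form = entry.find("lexical-unit/form")
        if form is None:
            continue  # skip entries without a lexical unit
        gloss = entry.find("sense/gloss[@lang='en']/text")
        rows.append((
            form.get("lang"),
            form.findtext("text", default=""),
            gloss.text if gloss is not None else None,
        ))
    return rows


print(read_lift_entries(SAMPLE_LIFT))
```

The language code comes straight from the `lang` attribute, so no matching on language names would be needed.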
Original comment by bamboofo...@gmail.com
on 14 Aug 2012 at 12:13
Sounds good. Could you upload the XML data or send it to me?
Original comment by der.brue...@googlemail.com
on 14 Aug 2012 at 12:36
Upload where? Just send me an email and we can work out how to get you the data.
Original comment by bamboofo...@gmail.com
on 14 Aug 2012 at 12:45
I have worked on the lingtyp ontology we talked about during the workshop in
March. As discussed with Steve and Martin B., the idea is to see typological
features as properties. I thus added WALS, IDS, Numerals and ASJP to an
ontology. This is nowhere near finished, but you might find it
interesting:
http://galoes.org/ontologies/lingtyp-full.owl
The bare thing without IDS etc can be found at
http://galoes.org/ontologies/lingtyp.owl
The idea would obviously be to import lingtyp.owl into ids.owl etc.
I suppose there is some duplication with the existing concepticon.
Original comment by sebastia...@googlemail.com
on 14 Aug 2012 at 3:04
LEGO's licence is CC BY-NC-ND, so an RDF conversion (being a derivative) is out of
the question without specific permission.
Original comment by joregan
on 14 Aug 2012 at 5:09
But we're not working with LEGO wordlists since they aren't published. We're
working with the IDS wordlists from MPI-EVA.
Original comment by bamboofo...@gmail.com
on 14 Aug 2012 at 5:14
CC BY-NC-ND does not preclude conversion into other formats, if I remember
correctly. From a post on [open-linguistics]:
https://creativecommons.org/licenses/by-nd/3.0/legalcode does include
"The above rights may be exercised in all media and formats whether
now known or hereafter devised. The above rights include the right to
make such modifications as are technically necessary to exercise the
rights in other media and formats, but otherwise you have no rights to
make Adaptations."
Original comment by sebastia...@googlemail.com
on 15 Aug 2012 at 8:19
I will have a meeting with Bernard Comrie, director of MPI-EVA and responsible
for IDS, later this month regarding license issues. Since IDS is currently
available as HTML on the servers of MPI-EVA, there should be no problem with
serving RDF as well. As far as reuse of the data is concerned, I am currently
not in a position to foresee the outcome of this meeting.
In order to prepare the meeting, could you give the following information:
- should the dump be hosted by MPI-EVA or elsewhere?
- what kind of applications using IDS data do you foresee?
- what kind of license would you recommend, and why?
- how would updates be managed?
I have certain ideas about some of those questions, but if the answers come
from an outside body, this would be better for purposes of negotiation.
Original comment by sebastia...@googlemail.com
on 15 Aug 2012 at 8:24
The IDS data is freely downloadable, but you're right, there's no
explicit license on the site. However, LEGO used it, enriched it with
metadata, and put it in XML. Arguably it's easier to extract it from
that XML LIFT format than it is to download it all and parse it from
the site. The enrichment links the words in IDS to a centralized
"concepticon", as I mentioned above, that we do have permission from
Jeff Good to use in LLOD.
Additionally, if/when LEGO releases the other 2700 wordlists, since
they are also in XML and linked to the concepticon, then any work we
do extracting the IDS from this LIFT standard could then "easily" be
used to convert the LEGO wordlists to RDF.
One thing that might be an issue is that I heard there's possibly even
more up-to-date IDS data than what is on the website. I pinged
Hans-Joerg but haven't received a response.
If we are allowed to convert the IDS data to RDF, I think we should
offer to give it back to their project so they can also let users
download the RDF.
Original comment by bamboofo...@gmail.com
on 15 Aug 2012 at 8:41
> Subj:IDS license
>
> Sebastian:
>
> On the basis of the responses I got on this (which were not all mutually
> consistent), I have decided that we should go with CC-BY-SA, which was
> one of the options envisaged by you.
>
> Bernard
Original comment by sebastia...@googlemail.com
on 28 Aug 2012 at 9:08
HJ Bibiko has given me a dump of the IDS db, which I forwarded to Martin
Brümmer.
Original comment by sebastia...@googlemail.com
on 29 Aug 2012 at 11:38
First conversion is done, CKAN entry can be found here:
http://thedatahub.org/dataset/ids_dictionary
Diagram of the model can be found here:
https://dl.dropbox.com/u/65483422/ids-model-diagram.png
Opening new issue for validation and further interlinking.
Original comment by der.brue...@googlemail.com
on 3 Sep 2012 at 12:43
Correction: correct CKAN entry is here: http://thedatahub.org/dataset/ids.
Issue for interlinking and refinement:
http://code.google.com/p/mlode/issues/detail?id=94&colspec=ID%20Type%20Status%20Priority%20Owner%20Dataset%20Summary%20Modified%20Reporter
Original comment by der.brue...@googlemail.com
on 3 Sep 2012 at 1:03
Can ids:XYtranslation be complemented by dcterms:relation xy.wiktionary or
xy.wordnet? The vocabulary is basic, so most links should work out of the box.
Instead of dcterms:relation one could probably also use some lemon predicate
(deferring to JMcC).
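A rough sketch of what such a link could look like as an N-Triples line, using only the standard library. The base URI `http://example.org/ids/entry/`, the entry id `1-210` and the function name `wiktionary_link` are placeholders, not the dataset's real identifiers. It also assumes the Wiktionary subdomain matches the language code passed in.

```python
from urllib.parse import quote

DCTERMS_RELATION = "http://purl.org/dc/terms/relation"
# Hypothetical base URI for IDS entries; the real dataset's namespace may differ.
IDS_BASE = "http://example.org/ids/entry/"


def wiktionary_link(entry_id, lang, word):
    """Build one N-Triples line linking an IDS entry to a Wiktionary page."""
    subject = IDS_BASE + quote(entry_id)
    # Wiktionary page titles use underscores in place of spaces;
    # non-ASCII characters are percent-encoded as UTF-8.
    page = quote(word.replace(" ", "_"))
    target = "https://{}.wiktionary.org/wiki/{}".format(lang, page)
    return "<{}> <{}> <{}> .".format(subject, DCTERMS_RELATION, target)


print(wiktionary_link("1-210", "de", "Hund"))
```

This only constructs the link; whether the target page actually exists would still have to be checked before adding the triple.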
Original comment by sebastia...@googlemail.com
on 3 Sep 2012 at 1:31
Some of the translations contain two words, words in brackets, etc. The basic
conversion was done with D2RQ, so further links will be added by a script that
validates them before they go into the dataset. Please continue the
refinement and interlinking discussion here:
http://code.google.com/p/mlode/issues/detail?id=94&colspec=ID%20Type%20Status%20Priority%20Owner%20Dataset%20Summary%20Modified%20Reporter
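A sketch of the kind of clean-up such a script might do before looking for link targets. It is not the actual script: the separator characters (`,`, `;`, `/`), the bracket styles and the function name `candidate_lemmas` are assumptions about how the raw translation strings look.

```python
import re


def candidate_lemmas(translation):
    """Split a raw translation string into single-word link candidates."""
    # Drop bracketed material such as "(male)" or "[pl.]"
    cleaned = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", translation)
    candidates = []
    for part in re.split(r"[,;/]", cleaned):
        part = " ".join(part.split())  # collapse runs of whitespace
        # Keep only single words; multi-word expressions are left for review
        if part and " " not in part:
            candidates.append(part)
    return candidates


print(candidate_lemmas("Hund (male), Köter; großer Hund"))
# prints ['Hund', 'Köter']
```

Multi-word translations are skipped rather than guessed at, so only plain single words would be passed on to link validation.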
Original comment by der.brue...@googlemail.com
on 3 Sep 2012 at 1:35
Original issue reported on code.google.com by
bamboofo...@gmail.com
on 6 Aug 2012 at 1:17