globalwordnet / cili

The Global WordNet Association Collaborative Inter-Lingual Index

Add script to generate the CILI file #3

Closed: goodmami closed this issue 3 years ago

goodmami commented 5 years ago

This issue originally mentioned generating the mapping files (see the original text below). While that may also be desirable, I actually intended it to be about generating the main CILI file (ili.ttl, as it's currently named). Otherwise the request is the same: write a script (somewhere), document here how to use it, then use it to produce new versions of the CILI.

original text

The "live" CILI mappings are in the OMW database. There should be a script as part of this project (or instructions in this project on how to use an external script) that generates the mapping files.

goodmami commented 5 years ago

If the script lives elsewhere, such as in OMW, then this issue may be more relevant to that project, but there should probably still be a small amount of documentation here about how and when to use the script.

goodmami commented 3 years ago

For my purposes, just the CILI inventory and the definitions are enough. The following script generates a simple tab-separated file with just this data (assuming that each ILI has a definition). The file is less than half the size of ili.ttl, although the two are nearly the same size when compressed. More interesting (from a downstream application's perspective) is that the tab-separated file is about 100x faster to parse than the Turtle file.

#!/usr/bin/env python3

from rdflib import Graph
from rdflib.namespace import SKOS

g = Graph()
g.parse('ili.ttl', format='ttl')

# pair each ILI (ignoring the URL part) with its definition
data = [(subj.rpartition('/')[2], obj)
        for subj, obj
        in g.subject_objects(predicate=SKOS.definition)]

# sort by ILI number
data.sort(key=lambda pair: int(pair[0].lstrip('i')))

print('ILI\tDefinition')
for ili, definition in data:
    print(f'{ili}\t{definition}')

Would it be possible to produce and upload such a file on releases (similar to what we've done with https://github.com/bond-lab/omw-data/)?
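For illustration, a downstream reader for such a file could be as small as the sketch below. It assumes the two-column layout with the `ILI\tDefinition` header row printed by the script above, and that definitions contain no embedded newlines. The function name `load_ili_definitions` and the sample rows are hypothetical, made up for this example rather than taken from the CILI.

```python
import io


def load_ili_definitions(lines):
    """Map each ILI identifier to its definition from tab-separated lines."""
    lines = iter(lines)
    next(lines, None)  # skip the 'ILI\tDefinition' header row
    definitions = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        # split on the first tab only, so a stray tab stays in the definition
        ili, _, definition = line.partition('\t')
        definitions[ili] = definition
    return definitions


# made-up sample rows standing in for the generated file
sample = io.StringIO(
    'ILI\tDefinition\n'
    'i1\tfirst example definition\n'
    'i2\tsecond example definition\n'
)
defs = load_ili_definitions(sample)
```

With a real file, the same function can be given an open file object, e.g. `load_ili_definitions(open(path, encoding='utf-8'))`, where `path` is wherever the generated file was saved.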

jmccrae commented 3 years ago

Sure. Do you want to implement this as a GitHub action?

goodmami commented 3 years ago

Done. See #6.