PerseusDL / treebank_data

Perseus Treebank Data

Incorrect lemmas #5

Open bmulligan3 opened 9 years ago

bmulligan3 commented 9 years ago

I noticed a few possible issues in the Cicero Cat. 1 data:

regie lemmatized as adv (regius1)
ut as conj (ut1)

3 numerals were lemmatized as NUMERAL1 (maybe this is standard?):

v (quinque1)
xii (duodecim1)
vi (sex1)

I also noticed that the adverbial instances of vero were all lemmatized as verus1; should be vero1?

In the Petronius data, the following lemmas end with commas:

delibero, amicio, expello, subicio, aspergo, tergeo, poto, propino, desisto, tribunal, pervenio, uterque, repleo, conor, trepido, eripio, numero, lavo,
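A check like this can be automated. The following is a minimal sketch, assuming the treebank files use `<sentence>` and `<word>` elements with a `lemma` attribute; the inline sample and the function name `lemmas_with_trailing_comma` are made up for illustration and are not taken from the repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of a treebank file; not real data.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="deliberabat" lemma="delibero," postag="v3siia---"/>
    <word id="2" form="mare" lemma="mare1" postag="n-s---nn-"/>
  </sentence>
</treebank>"""


def lemmas_with_trailing_comma(xml_text):
    """Return (sentence id, word id, lemma) for each lemma ending in a comma."""
    root = ET.fromstring(xml_text)
    hits = []
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            # Words without a lemma attribute are treated as an empty lemma.
            lemma = word.get("lemma", "")
            if lemma.endswith(","):
                hits.append((sentence.get("id"), word.get("id"), lemma))
    return hits


print(lemmas_with_trailing_comma(SAMPLE))
```

To run it on a real file, read the file contents and pass them in place of `SAMPLE`; the element and attribute names may need adjusting if the actual schema differs.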

lcerrato commented 9 years ago

Thanks! I'm going to add in some links for future reference. Since there can be multiple treebanks for the same work, attaching version info and identifiers for these will help with tracking.

Cicero Cat. 1.1.-2.11 (v 1.5) https://github.com/PerseusDL/treebank_data/blob/master/v1/latin/data/1999.02.0010.xml

Petronius, Satyricon 6-78 (v.1.5) https://github.com/PerseusDL/treebank_data/blob/master/v1/latin/data/2007.01.0001.xml

gcelano commented 9 years ago

Hi,

Thank you for this. We are trying to clean the data for a new release, so your message is very helpful. Please, if you find anything else, report it.

Alatius commented 9 years ago

It's great to see the work progressing on the treebanks! In my own working copy of the repository I have made some further changes to lemmas, and also to some POS tags; perhaps these changes can be reviewed and incorporated into the official treebanks, where applicable: https://github.com/Alatius/treebank_data/commits/master

bmulligan3 commented 8 years ago

Hello! I'm working with the Morpheus lemmas. Do you know an easy way to distinguish between homonymous lemmas? I.e. for aera and aera#2, how can I know what each refers to? Is there a master dictionary or morphology table for the lemmas?

balmas commented 8 years ago

Hi, if you are working with Latin, more information on the Morpheus lemmas can be found in the Latin lexical inventory data set. See http://sites.tufts.edu/perseusupdates/2014/03/21/announcing-the-perseus-lexical-inventory-an-open-linked-data-set/ for a little background.

The source files for this dataset are in GitHub at https://github.com/PerseusDL/cite_collections/tree/master/latlexent

You can also use the query form at http://perseids.org/tools/lexical/query.html if you want to query a form (you can also enter the lemmas themselves directly here, but it will query it as if it was a form)

Or you can query the SPARQL endpoint directly. An example query which looks up the entry for the lemma 'mare':

SELECT ?urn ?object
FROM <http://data.perseus.org/ds/lexical/latlexent>
WHERE
{ 
?urn <http://data.perseus.org/rdfvocab/lexical/hasMorpheusLemma> "mare"@lat .
?urn <http://purl.org/dc/terms/isReferencedBy> ?object 
}

This gives you links to places where you can look up linked dictionary entries for more on the lemma. e.g.

-----------------------------------------------------------------------------------------------------------------------------------------
| urn                                     | object                                                                                      |
=========================================================================================================================================
| <urn:cite:perseus:latlexent.lex34070.1> | <http://repos1.alpheios.net/exist/rest/db/xq/lexi-get.xq?lx=ls&lg=lat&n=n28019>             |
| <urn:cite:perseus:latlexent.lex34070.1> | <http://logeion.uchicago.edu/index.html#mare>                                               |
| <urn:cite:perseus:latlexent.lex34070.1> | <http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0059%3Aentry%3Dmare> |
| <urn:cite:perseus:latlexent.lex34070.1> | <http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0060%3Aentry%3Dmare> |
-----------------------------------------------------------------------------------------------------------------------------------------

Unfortunately, it looks like your specific example, 'aera2' is not actually present in this data set for some reason, just 'aera' is.
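For looking up many lemmas, the query above can be generated programmatically. This is only a sketch: the function names are hypothetical, and the endpoint URL is left as a caller-supplied argument because the thread does not state it. The `query` parameter name follows the SPARQL 1.1 Protocol; no network request is made here.

```python
from urllib.parse import urlencode

# Graph and predicate IRIs copied from the example query in this comment.
GRAPH = "http://data.perseus.org/ds/lexical/latlexent"
HAS_LEMMA = "http://data.perseus.org/rdfvocab/lexical/hasMorpheusLemma"
REFERENCED_BY = "http://purl.org/dc/terms/isReferencedBy"


def build_lemma_query(lemma):
    """Build the SPARQL query that finds dictionary links for one lemma."""
    # Escape backslashes and double quotes so the lemma is a valid literal.
    literal = lemma.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?urn ?object\n"
        f"FROM <{GRAPH}>\n"
        "WHERE {\n"
        f'  ?urn <{HAS_LEMMA}> "{literal}"@lat .\n'
        f"  ?urn <{REFERENCED_BY}> ?object\n"
        "}"
    )


def build_request_url(endpoint, lemma):
    """Return a GET URL for the query; `endpoint` is an assumption of the caller."""
    return endpoint + "?" + urlencode({"query": build_lemma_query(lemma)})


print(build_lemma_query("mare"))
```

The resulting URL could then be fetched with any HTTP client, once the real endpoint address is known.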

bmulligan3 commented 8 years ago

Thank you very much! This is exactly what I was looking for, and your timely and thorough response was very much appreciated. Could I trouble you for directions to the Greek raw file?

One of our projects this summer might be of interest: we're digitizing a Latin-Mandarin dictionary for Dickinson College On-Line and The Bridge projects. Thanks to your help we'll be able to link the Mandarin shortdefs to the Morpheus lemmas.

With thanks,

Bret


Bret Mulligan | Associate Professor Department of Classics | Haverford College Haverford, PA 19041 P: 610-896-1052 F: 610-896-1495 E: bmulliga@haverford.edu Web: http://www.haverford.edu/classics/faculty/ Twitter: @bretmulligan Nepos' Life of Hannibal [DCC http://dcc.dickinson.edu/nepos-hannibal/chapter-1] [print http://www.openbookpublishers.com/product/341]

balmas commented 8 years ago

I'm not sure exactly which Greek file you are referring to. The Greek unfortunately is lagging behind the Latin, and we don't have it available in the triplestore yet.

For most morpheus lemmas though, you can probably just query the LSJ on Perseus, using the dictionary lookup tool, e.g. http://www.perseus.tufts.edu/hopper/resolveform?type=exact&lookup=ai%29%3Da&lang=greek

Of course you need to convert to betacode for that. Alternatively, you might find the Alpheios mapping of Morpheus lemmas to LSJ short definitions useful. This is at https://sourceforge.net/p/alpheios/code/HEAD/tree/dictionaries/grc/lsj/trunk/src/grc-lsj-defs.dat
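The beta code conversion mentioned above can be sketched as follows. This is a simplified illustration, not a complete beta code implementation: it lowercases the input, handles only the basic alphabet and common diacritics, and the names `LETTERS`, `MARKS`, and `to_betacode` are made up for this example. It relies on Unicode NFD decomposition, which yields the base letter followed by breathing, accent, and iota subscript in that order.

```python
import unicodedata
from urllib.parse import quote

# Greek base letters to their beta code equivalents (lowercase only).
LETTERS = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "q", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "c", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "f", "χ": "x", "ψ": "y",
    "ω": "w",
}

# Combining marks produced by NFD decomposition.
MARKS = {
    "\u0313": ")",   # smooth breathing
    "\u0314": "(",   # rough breathing
    "\u0301": "/",   # acute
    "\u0300": "\\",  # grave
    "\u0342": "=",   # circumflex (perispomeni)
    "\u0308": "+",   # diaeresis
    "\u0345": "|",   # iota subscript
}


def to_betacode(text):
    """Convert lowercase Unicode Greek to beta code; other characters pass through."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(LETTERS.get(ch) or MARKS.get(ch) or ch for ch in decomposed)


# U+03B1 U+1F36 U+03B1: alpha, iota with smooth breathing and circumflex, alpha.
word = "\u03b1\u1f36\u03b1"
print(to_betacode(word))
print(quote(to_betacode(word), safe=""))
```

The percent-encoded result can be placed in the `lookup=` parameter of the dictionary lookup URL shown above. Capital letters, which beta code marks with an asterisk, are deliberately not handled in this sketch.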

lcerrato commented 8 years ago

Hi, perhaps the (deprecated) download of the hib_lemmas would help. It is available here: http://www.perseus.tufts.edu/hopper/opensource/download

Also beta code, but many find the data useful despite the limitations.

--Lisa

bmulligan3 commented 8 years ago

Thank you again! Our work is also lagging behind on the Greek side. We have more than enough on the Latin side to keep us busy for a bit!

Cheers,

Bret
