PerseusDL / treebank_data

Perseus Treebank Data

incorrect lemmatization of φυλάξομεν in Iliad 8.529 as φυλάζω, should be φυλάσσω #33

Open bcrowell opened 2 years ago

bcrowell commented 2 years ago

I think this is just an error. The two lemmas are unrelated in meaning, and φυλάζω, to divide, doesn't seem to exist in Homer. Cunliffe uses this as his first example of φυλάσσω.

lcerrato commented 2 years ago

@bcrowell I apologize — no one is monitoring this repository as far as I know. Tagging @gregorycrane for more info.

bcrowell commented 2 years ago

Right, I realize it's not actively maintained at this point. I'm just posting reports as I come across what seem to be errors, in the hope that the information will at some point be useful to others.

lcerrato commented 2 years ago

@bcrowell This is very much appreciated!

gregorycrane commented 2 years ago

I’m on it!


francescomambrini commented 2 years ago

I just want to thank @bcrowell for the excellent input! I am also taking note of all your remarks. I am working to revise Homer and include it in my side project Daphne in the (hopefully near) future.

gregorycrane commented 1 year ago

We now have two versions being developed in parallel. Francesco, please share your UD version! Otherwise, it will be an even bigger pain to integrate. I have mainly worked on fixing lemmas, but I have also made some corrections to the tagging and added some missing lines.



francescomambrini commented 1 year ago

Hi Greg, yes! They are now the files in the distribution of my project Daphne. The files are in data/annotation/latest.

bcrowell commented 1 year ago

Since posting this a year ago, I posted a few more issues here and then switched to maintaining my own separate database of patches in a project I call Lemming: https://bitbucket.org/ben-crowell/lemming

As suggested by the name, I'm focusing mainly on correcting lemmatizations and fixing inconsistencies in them. I'm also fixing mistakes in the part-of-speech tags. I currently have about 500 patches, which when applied to the treebank data result in about 10,000 changes. Almost all of the patches originate from issues I found in Homer.

I would be happy to cooperate with anyone associated with Perseus who is interested in putting in some time on backporting these patches into Perseus. My patches are implemented not as patches to the XML files but as patches to a relational database of lemmatizations and part-of-speech tags. Cooperating on this might be a good short-term hourly project for an undergrad with some coding skills and an interest in digital humanities.
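To make the idea concrete, here is a minimal sketch of how a patch stored as a database row can fan out into several changes, which is why a few hundred patches can produce thousands of edits. This is not Lemming's actual schema or code: the table names `tokens` and `patches`, their columns, and the two extra token rows are all hypothetical, chosen only for illustration.

```python
import sqlite3

# The form and lemmas from this issue, defined once so every row uses the same strings.
FORM = "φυλάξομεν"
WRONG_LEMMA = "φυλάζω"
RIGHT_LEMMA = "φυλάσσω"


def apply_patches(conn):
    """Apply every row of the (hypothetical) patches table; return the number of token rows changed."""
    changed = 0
    patches = conn.execute(
        "SELECT form, old_lemma, new_lemma FROM patches"
    ).fetchall()
    for form, old_lemma, new_lemma in patches:
        cur = conn.execute(
            "UPDATE tokens SET lemma = ? WHERE form = ? AND lemma = ?",
            (new_lemma, form, old_lemma),
        )
        changed += cur.rowcount  # one patch may touch many tokens
    return changed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tokens (ref TEXT, form TEXT, lemma TEXT);
        CREATE TABLE patches (form TEXT, old_lemma TEXT, new_lemma TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?)",
        [
            ("Il. 8.529", FORM, WRONG_LEMMA),           # the token reported in this issue
            ("hypothetical ref A", FORM, WRONG_LEMMA),  # invented row, illustration only
            ("hypothetical ref B", "φυλάσσει", RIGHT_LEMMA),  # invented row, already correct
        ],
    )
    conn.execute(
        "INSERT INTO patches VALUES (?, ?, ?)", (FORM, WRONG_LEMMA, RIGHT_LEMMA)
    )
    changed = apply_patches(conn)
    lemmas = [row[0] for row in conn.execute("SELECT lemma FROM tokens ORDER BY ref")]
    return changed, lemmas
```

Keying a patch on the (form, old lemma) pair rather than on a specific XML location is a design choice of this sketch; it is what lets a single patch correct every matching token, at the cost of needing care with genuinely ambiguous forms.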

An issue that arises in this sort of thing is that one gets variants of the same lemma, e.g., ἀγαπάζω/ἀγαπάω or αἴθε/εἴθε. Depending on what one is trying to do, it could be better either to lump these or to split them. I have a small auxiliary database of these: https://bitbucket.org/ben-crowell/ransom

My policy with the patches is to split these variants, but note them in the auxiliary database.
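The split-but-record approach can be sketched as follows: the lemmas stay split in the data, and a separate variant table lets a consumer lump them on demand. The `VARIANTS` mapping, the function name `count_lemmas`, and the choice of which spelling counts as canonical are all assumptions made for illustration, not the format of the auxiliary database.

```python
from collections import Counter

# Hypothetical variant table: variant lemma -> lemma it is lumped under.
# Which member of each pair is treated as canonical is an arbitrary choice here.
VARIANTS = {
    "ἀγαπάζω": "ἀγαπάω",
    "αἴθε": "εἴθε",
}


def count_lemmas(lemmas, lump=False):
    """Count lemma frequencies, optionally merging known variants first."""
    if lump:
        lemmas = [VARIANTS.get(lemma, lemma) for lemma in lemmas]
    return Counter(lemmas)
```

With `lump=False` the two spellings are counted separately, as in the patched data; with `lump=True` they are merged, which may be preferable for tasks such as vocabulary frequency lists.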