alpheios-project / morphlib

Morphological Lookup Library
GNU General Public License v3.0

short definitions #15

Open · balmas opened 7 years ago

balmas commented 7 years ago

@elijahjcooke the Persian notes at https://github.com/alpheios-project/persian#current-status-as-of-2-april-2016 describe the general process that Alpheios used to look up short definitions, given a lemma.

The Greek short definitions previously used by Alpheios, indexed to Morpheus, can be found in the SourceForge repo. There is a separate file per source dictionary; for example, the LSJ ones are here:

https://sourceforge.net/p/alpheios/code/HEAD/tree/dictionaries/grc/lsj/trunk/src/grc-lsj-defs.dat

And for Lewis and Short (Latin):

https://sourceforge.net/p/alpheios/code/HEAD/tree/dictionaries/lat/ls/trunk/src/lat-ls-ids.dat

Example JavaScript that loaded the short defs:

https://github.com/alpheios-project/alpheios5/blob/master/scripts/lang-tool-greek.js#L62-L92

The main lookup was then done using the datafile module:

https://github.com/alpheios-project/alpheios5/blob/master/scripts/datafile.js
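For illustration only, a lookup of that kind might look like the sketch below. It assumes, hypothetically, that each line of a `.dat` file has the form `lemma|short definition`; the actual layout of grc-lsj-defs.dat and the real datafile.js API may differ. `parseDefs` and `findDef` are illustrative names, not functions from alpheios5.

```javascript
// Hypothetical sketch: the real .dat format and datafile.js API may differ.
// Assumes one entry per line, "lemma|short definition".

/** Split raw file text into [lemma, definition] pairs sorted by lemma. */
function parseDefs(text, separator = "|") {
  const entries = [];
  for (const line of text.split(/\r?\n/)) {
    const idx = line.indexOf(separator);
    if (idx <= 0) continue; // skip blank lines, lines without a separator, empty lemmas
    entries.push([line.slice(0, idx), line.slice(idx + separator.length)]);
  }
  // Sort defensively so the binary search below works even on unsorted input.
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return entries;
}

/** Binary-search the sorted entries for an exact lemma; return its definition or null. */
function findDef(entries, lemma) {
  let lo = 0;
  let hi = entries.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const key = entries[mid][0];
    if (key === lemma) return entries[mid][1];
    if (key < lemma) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}
```

Sorting once after loading keeps each lookup logarithmic in the number of lemmas, which is the usual reason to keep such definition files sorted.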

balmas commented 7 years ago

We need to develop a new plan for the Latin short definitions. This is going to require some investment of time.

The current Alpheios tools use the Whitaker's Words version of the morphology service, which is deployed at http://alpheios.net/perl/latin?word=

This is an early version of the service that doesn't have the JSON wrapper support.

I tried deploying this binary for use with the Morphology Service, but the binary that was built for Alpheios doesn't work on newer versions of Linux. An initial attempt to build it on a newer version failed, and rebuilding it would take time that I don't have right now. The source code is here: https://sourceforge.net/p/alpheios/code/HEAD/tree/wordsxml/trunk/

Although I could, with some effort, get the Morphology Service running on services.perseids.org to proxy to the service on alpheios.net, this doesn't seem to me to be a very sustainable solution.

Other options are:

1) Take the time to rebuild the short defs file using Morpheus lemmas and the LS TEI XML. It may be possible to do this fairly easily by following the process defined at https://sourceforge.net/p/alpheios/code/HEAD/tree/dictionaries/trunk/lexica%20readme.txt

2) Look for alternative services to provide Latin short definitions. Logeion and the online Whitaker's Words are two possible options; there may be others.

3) We could also fall back to using the Perseus 4 word lookup service, but in my mind this isn't any better, in terms of sustainability, than pointing at the Alpheios Whitaker's Words service or building a short defs lookup file. If we are going to keep using Morpheus, I think it's better to use our own short defs file rather than relying on the Perseus service.

@abrasax @PonteIneptique @elijahjcooke @hcayless I invite your thoughts!
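Option 1 could be sketched roughly as follows. This assumes the LS entries have already been pulled out of the TEI XML into `{lemma, def}` objects and that a list of the lemmas Morpheus can output is available. The lemma normalization (lowercasing and stripping combining marks such as macrons) is an assumption, `buildDefsFile` and `normalizeLemma` are hypothetical names, and the output format of one `lemma|definition` line per lemma is likewise hypothetical.

```javascript
// Hypothetical sketch of option 1: build a sorted "lemma|definition" file.
// Assumes LS entries were already extracted from the TEI XML into
// {lemma, def} objects; TEI parsing itself is out of scope here.

/** Assumed normalization: strip combining marks (e.g. macrons), then lowercase. */
function normalizeLemma(lemma) {
  return lemma.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

/**
 * Keep only entries whose lemma Morpheus can output, so lookups keyed on
 * Morpheus lemmas will hit. Returns the file text plus the unmatched LS lemmas.
 */
function buildDefsFile(lsEntries, morpheusLemmas) {
  const known = new Set(Array.from(morpheusLemmas, normalizeLemma));
  const defs = new Map();
  const unmatched = [];
  for (const { lemma, def } of lsEntries) {
    const key = normalizeLemma(lemma);
    if (!known.has(key)) {
      unmatched.push(lemma);
      continue;
    }
    if (!defs.has(key)) defs.set(key, def.trim()); // first entry for a lemma wins
  }
  const keys = [...defs.keys()].sort();
  const text = keys.map((k) => `${k}|${defs.get(k)}`).join("\n") + "\n";
  return { text, unmatched };
}
```

Returning the unmatched LS lemmas gives a worklist for checking where LS headwords and Morpheus lemmas disagree.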

abrasax commented 7 years ago

Do we need Whitaker for morphological analysis and lemmatization, or just for the short definitions? If the latter, couldn't I just construct a short definitions file from either LS or Whitaker, the way I did for Persian? Is this essentially your option 1?


balmas commented 7 years ago

Yes, that's my option 1

abrasax commented 7 years ago

I would prefer to substitute in Whitaker's short definitions where they are available, because he did quite a bit of manual work using Glare and lexicographical resources more recent than the 1879 LS. Can you see any problem with doing that?


balmas commented 7 years ago

Not as long as the lemmas match up with what Morpheus outputs.
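A minimal sketch of that rule: Whitaker's definition wins when present, LS is the fallback, and only lemmas that Morpheus outputs are kept. It assumes all three inputs are keyed by the same normalized lemma form; `mergeDefs` and the data shapes are hypothetical.

```javascript
// Hypothetical sketch: prefer Whitaker's short definitions where available,
// but only for lemmas that Morpheus actually outputs, falling back to LS.
// All inputs are assumed to be keyed by the same normalized lemma form.

/**
 * @param {Map<string,string>} lsDefs        lemma -> Lewis & Short short def
 * @param {Map<string,string>} whitakerDefs  lemma -> Whitaker's Words def
 * @param {Iterable<string>} morpheusLemmas  lemmas Morpheus can output
 * @returns {Map<string,{def:string, source:string}>}
 */
function mergeDefs(lsDefs, whitakerDefs, morpheusLemmas) {
  const merged = new Map();
  for (const lemma of morpheusLemmas) {
    if (whitakerDefs.has(lemma)) {
      merged.set(lemma, { def: whitakerDefs.get(lemma), source: "whitaker" });
    } else if (lsDefs.has(lemma)) {
      merged.set(lemma, { def: lsDefs.get(lemma), source: "ls" });
    }
    // Lemmas with no definition in either source are omitted; Whitaker
    // lemmas that Morpheus never outputs are never visited, so they are dropped.
  }
  return merged;
}
```

Keeping a `source` field makes it easy to audit afterwards which definitions came from Whitaker and which from LS.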

balmas commented 7 years ago

@elijahjcooke FYI, I had to update the Morphology Service bundle for other reasons, so for now I've made it possible to query the alpheios.net-hosted instance of the Whitaker's Words engine through it, using the whitakerLat engine.

E.g. http://services.perseids.org/bsp/morphologyservice/analysis/word?lang=lat&engine=whitakerLat&word=mare

This is not a good long term solution for all the reasons mentioned above, but it might help to get things moving with the Latin short defs for now.

You will find the short definitions in the `mean` element.
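A sketch of how a client might call that endpoint and collect the `mean` values. The query parameters are taken from the example URL above; that the response is XML containing unprefixed `<mean>` elements is an assumption, and `buildWhitakerUrl`, `extractMeans` and `lookupShortDefs` are hypothetical helpers. A production client should use a real XML parser rather than a regular expression.

```javascript
// Hypothetical sketch: query the Morphology Service's whitakerLat engine and
// pull the text of each <mean> element out of the response. Assumes the
// response is XML with unprefixed <mean> elements; a real client should use
// an XML parser (e.g. DOMParser in a browser) instead of a regular expression.

const SERVICE_URL =
  "http://services.perseids.org/bsp/morphologyservice/analysis/word";

/** Build the query URL using the parameters from the example URL. */
function buildWhitakerUrl(word) {
  const params = new URLSearchParams({ lang: "lat", engine: "whitakerLat", word });
  return `${SERVICE_URL}?${params.toString()}`;
}

/** Return the trimmed text of every <mean> element, in document order. */
function extractMeans(xmlText) {
  const re = /<mean(?:\s[^>]*)?>([\s\S]*?)<\/mean>/g;
  return Array.from(xmlText.matchAll(re), (m) => m[1].trim());
}

/** Fetch and extract; needs a runtime with fetch (modern browsers, Node 18+). */
async function lookupShortDefs(word) {
  const response = await fetch(buildWhitakerUrl(word));
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return extractMeans(await response.text());
}
```

The extraction does not decode XML entities such as `&amp;`, which is another reason to switch to a proper parser outside of quick experiments.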

balmas commented 7 years ago

@PonteIneptique suggests for the future we look at Collatinus data for Latin

https://github.com/biblissima/collatinus/tree/master/bin/data

abrasax commented 7 years ago

Splendid to have convenient access to du Cange and other medieval resources.
