goodmami / wn

A modern, interlingual wordnet interface for Python
https://wn.readthedocs.io/
MIT License

Allow mirrors to be defined in the index #142

Closed: goodmami closed this issue 2 years ago

goodmami commented 2 years ago

Several wordnet projects have changes that will require some new features in the index.

These changes might look like this:

[pwn]
  warn = "This project has been renamed. See https://... for more information."
  [pwn.versions."3.0"]
    redirect = "wn30:1.4+omw"
  [pwn.versions."3.1"]
    redirect = "wn31:1.4+omw"

[wn30]
  label = "OMW English Wordnet 3.0"
  [wn30.versions."1.4+omw"]
    url = "https://..."

[wn31]
  label = "OMW English Wordnet 3.1"
  [wn31.versions."1.4+omw"]
    url = "https://..."

[ewn]
  label = "Estonian Wordnet"
  # ...
  [ewn.versions.2020]
    warn = "The Open English WordNet is now under the identifier 'oewn'. See https://... for more information."
    redirect = "oewn:2020"

[oewn]
  label = "Open English WordNet"
  language = "en"
  license = "https://creativecommons.org/licenses/by/4.0/"
  [oewn.versions.2020]
    urls = [
      "https://en-word.net/static/english-wordnet-2020.xml.gz",
      "https://...",
    ]
  [oewn.versions.2019]
    urls = [
      "https://en-word.net/static/english-wordnet-2019.xml.gz",
      "https://...",
    ]

goodmami commented 2 years ago

Trying to pin down the behavior, I think an automatic redirect might actually be problematic: if it happens silently, the user won't know that they need a different identifier to use the lexicon. E.g.:

>>> wn.download('pwn:3.0')  # silently redirected; really gets wn30:1.4+omw
>>> pwn30 = wn.Wordnet('pwn:3.0')  # error: nothing was installed under 'pwn:3.0'

Some solutions:

goodmami commented 2 years ago

Another alternative: specify the error message in the index and don't bother with warnings or redirects. E.g.:

[pwn]
  error = "Instead of 'pwn:3.0' and 'pwn:3.1', please use 'wn30' and 'wn31'."

Then...

>>> wn.download('pwn:3.0')
Traceback (most recent call last):
  ...
wn.Error: Instead of 'pwn:3.0' and 'pwn:3.1', please use 'wn30' and 'wn31'.
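
A rough sketch of how the index lookup could honor such an `error` key follows. The names here are assumptions for illustration: `Error` is a local stand-in for `wn.Error` so the snippet runs without wn installed, `lookup` is a hypothetical helper, and the index is written as the plain dictionary a TOML parser would produce.

```python
class Error(Exception):
    """Local stand-in for wn.Error so this sketch is self-contained."""


# Parsed index as a plain dict; the elided URL is kept as in the thread.
INDEX = {
    "pwn": {
        "error": "Instead of 'pwn:3.0' and 'pwn:3.1', please use 'wn30' and 'wn31'.",
    },
    "wn30": {
        "label": "OMW English Wordnet 3.0",
        "versions": {"1.4+omw": {"url": "https://..."}},
    },
}


def lookup(index, spec):
    """Hypothetical helper: return the version entry for 'project:version'.

    If the project carries an 'error' key, raise it instead of resolving.
    """
    project_id, _, version = spec.partition(":")
    project = index[project_id]
    if "error" in project:
        # The message comes straight from the index, so maintainers control it.
        raise Error(project["error"])
    return project["versions"][version]


try:
    lookup(INDEX, "pwn:3.0")
except Error as exc:
    message = str(exc)
```

Raising before any download starts means the user never ends up with a lexicon stored under an identifier they did not ask for.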

fcbond commented 2 years ago

This looks reasonable.

-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University

goodmami commented 2 years ago

In that case #146 is for adding error and this issue is about adding mirrors.

I think I might just extend the current url key to accept a whitespace-separated list of URLs:

[oewn]
  label = "Open English WordNet"
  language = "en"
  license = "https://creativecommons.org/licenses/by/4.0/"
  [oewn.versions.2020]
    url = """
      https://en-word.net/static/english-wordnet-2020.xml.gz
      https://...
    """

This way there's no change for those with only one URL.
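
One way the mirror handling could work is sketched below. This is not wn's actual implementation: `download_first_available` and the injected `fetch` callable are hypothetical, and trying the URLs in order until one succeeds is an assumed policy. Splitting on any whitespace means a single URL and a multi-line list go through the same code path.

```python
from typing import Callable, Tuple


def download_first_available(
    url_value: str, fetch: Callable[[str], bytes]
) -> Tuple[str, bytes]:
    """Hypothetical helper: try each URL in the 'url' value until one works.

    'fetch' is injected so the sketch runs offline; in practice it might wrap
    urllib.request.urlopen, whose URLError is a subclass of OSError.
    """
    urls = url_value.split()  # splits on spaces and newlines alike
    if not urls:
        raise ValueError("no URLs given")
    failures = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember why this mirror failed and move on to the next one.
            failures.append(f"{url}: {exc}")
    raise OSError("all mirrors failed:\n" + "\n".join(failures))


def fake_fetch(url: str) -> bytes:
    """Test double: pretend the primary host is unreachable."""
    if "en-word.net" in url:
        raise OSError("unreachable")
    return b"data"


url_value = """
  https://en-word.net/static/english-wordnet-2020.xml.gz
  https://...
"""
used_url, payload = download_first_available(url_value, fake_fetch)
```

Here the first URL fails and the second one is used, while an index entry with a single URL behaves exactly as before.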

fcbond commented 2 years ago

That seems like a very good idea.
