globalwordnet / schemas

WordNet-LMF formats
https://globalwordnet.github.io/schemas/

Why is ILI a required attribute for synsets in the WN-LMF schemas? #75

Closed simongray closed 1 month ago

simongray commented 1 month ago

I just implemented WN-LMF as a new export format for DanNet so that it can be queried using goodmami/wn (https://github.com/goodmami/wn).

Limiting myself to the relations defined directly by GWA is fine (it is a limited format, after all). However, I also ran into an issue: Synset elements without a corresponding ILI key caused the import of the DanNet dataset into the goodmami/wn library to fail.

I tracked the issue back to this repository, where the various WN-LMF.dtd files (e.g. https://github.com/globalwordnet/schemas/blob/master/WN-LMF-1.3.dtd) state:

<!ATTLIST Synset
    id ID #REQUIRED
    ili CDATA #REQUIRED
    ... >

I must say that I find this requirement quite limiting.

Why should a WordNet be fully linked to the ILI to be valid as WN-LMF...? AFAIK only the English WordNet fits this requirement and only because (and correct me if I'm wrong here) the CILI is essentially just the repurposed, core structure of the Princeton WordNet.

I suggest that this requirement be scrapped entirely.

jmccrae commented 1 month ago

The documentation says:

If you wish to define a new concept call the concept “in” (ILI New). If there is no mapping to the ILI leave this field empty (it is required).

The logic of making the attribute mandatory is to encourage people to attempt to link to the ILI.
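
To make the cases in that documentation passage concrete, here is a minimal sketch (not part of the schema and not how goodmami/wn handles it) that reads a simplified fragment with Python's standard library and classifies each synset's ili value. The fragment is not a complete, valid WN-LMF document, and the synset ids and the ILI identifier i12345 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: a real WN-LMF file has more required
# structure; the ids and the ILI identifier are invented for illustration.
FRAGMENT = """
<Lexicon id="example">
  <Synset id="example-1-n" ili="i12345"/>
  <Synset id="example-2-n" ili="in"/>
  <Synset id="example-3-n" ili=""/>
</Lexicon>
"""


def classify_ili(value):
    """Classify the raw value of a Synset's ili attribute."""
    if value is None:
        return "missing"   # attribute absent: not valid, since ili is #REQUIRED
    if value == "":
        return "unmapped"  # attribute present but empty: no ILI mapping
    if value == "in":
        return "proposed"  # "ILI New": a new concept is being proposed
    return "mapped"        # anything else is treated as an ILI identifier


def summarize(xml_text):
    """Map each synset id to the classification of its ili attribute."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): classify_ili(s.get("ili")) for s in root.iter("Synset")}


if __name__ == "__main__":
    for synset_id, status in summarize(FRAGMENT).items():
        print(synset_id, status)
```

Only the "missing" case violates the DTD line quoted above; an empty value satisfies #REQUIRED because the attribute is still present.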

fcbond commented 1 month ago

Hi,

You can just have ili='', that is, the synset is not linked to anything.

So the requirement is not so onerous :-).

We could have instead made the ILI completely optional, but I think we wanted to encourage people to link as much as possible.

Yours,

-- Francis Bond https://fcbond.github.io/
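
For an exporter, the suggestion above amounts to always writing the ili attribute and falling back to an empty string when no mapping exists. A minimal sketch, assuming a hypothetical dict that maps synset ids to ILI ids (the function name, the ids and the dict are illustrative and not part of any existing exporter):

```python
import xml.etree.ElementTree as ET


def render_synset(synset_id, ili_by_synset):
    """Serialize a bare Synset element, always including the ili attribute.

    ili_by_synset is a hypothetical mapping from synset id to ILI id;
    synsets without an entry are written with ili="" (not linked).
    """
    ili = ili_by_synset.get(synset_id, "")
    element = ET.Element("Synset", {"id": synset_id, "ili": ili})
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    mapping = {"dn-1": "i12345"}  # made-up ids for illustration
    print(render_synset("dn-1", mapping))  # linked synset
    print(render_synset("dn-2", mapping))  # unlinked synset, ili is empty
```

Defaulting to the empty string rather than omitting the attribute keeps the output valid against the #REQUIRED declaration while still recording that the synset is unlinked.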

simongray commented 1 month ago

I see. Thanks for clearing that up.