en-wl / wordlist

SCOWL (and friends).
http://wordlist.aspell.net

SCOWL project maintenance #394

Open · Jamim opened this issue 8 months ago

Jamim commented 8 months ago

👋🏻 Hello @kevina,

First of all, thank you for all your work on this database of English words! 🙇🏼

🏚️

Sadly, this repository now looks completely abandoned. There are over a hundred issues awaiting resolution and 3 PRs that have been awaiting feedback for years.

🏡

I understand you probably have no spare time to spend on SCOWL, but would you mind considering delegating maintenance to some community members that you can trust to keep the project going?

Best regards!

kevina commented 8 months ago

Updating the wordlist has been a low priority, but the repo is not completely abandoned. I tend to add words in large batches, and it has been a long time since I did an update.

I am open to delegating; the main thing is finding people who share the same views on what types of words should be included and who have the necessary technical skills to add them.

As for the apostrophe handling, see #122; I am not sure what the correct path forward is.

Meekohi commented 8 months ago

Perhaps splitting the "technical maintenance" from the "which words are allowed" decisions might be a way to have the best of both worlds, e.g. @kevina / @biljir "approve" words to be added, and a small group of contributors makes sure they are added properly from a technical perspective.

tbh I don't think it's a big issue to continue as-is (how often does the wordlist really need to change?), but it would be nice to see faster resolution times when people propose words to be added, just so there isn't the impression that things are being ignored.

Clearing out some of the years-old issues would also improve the perception that things are being maintained.

marcoagpinto commented 8 months ago

@kevina

The apostrophe issue is easy to fix.

Simply replace:

WORDCHARS 0123456789

with:

WORDCHARS 0123456789’

kevina commented 8 months ago

@marcoagpinto, this is not the place to discuss this, please see #122.

kevina commented 6 months ago

Currently SCOWL is not in a state that I am comfortable passing on to anyone. SCOWL was originally about combining high-quality word lists, and the mechanism for making corrections is very hackish. A while ago I posted a database version of SCOWL (#306). While this is an improvement, it doesn't address the core issue of maintainability and could well make things worse. The SQL used to convert from the source lists to the database form is beyond complex and not something I want to pass on to anyone, let alone release to the public.

Instead, my current plan is to create a text file containing most of the information in the database. This file will combine all aspects of SCOWL (the lists themselves), VarCon (the variant conversions) and AGID (the POS and inflection information) into one place. For example, the entries for color might be:

35: A Cv DV: color <n>: colors  color's 
35: B C D: colour <n>: colours  colour's 

35: A Cv DV: color <v>: colored  coloring  colors 
35: B C D: colour <v>: coloured  colouring  colours

This new file will then become the source for SCOWL, in that all word lists will be created from this master file. To add new words or make corrections, all one has to do is edit this file. Naturally, this will lose the ability to create SCOWL from the source lists, but I no longer think that is worth it.
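A minimal sketch of how word lists could be derived from entries like the color/colour example, assuming this layout: the first field is the SCOWL size level, the second is a set of whitespace-separated spelling tags (their exact meanings, such as the v/V suffixes, are not spelled out here, so they are treated as opaque labels), then the lemma with a part-of-speech marker in angle brackets, then the inflected forms. It also assumes, as with SCOWL's existing size levels, that a list of size N includes every entry of size N or smaller. The names parse_line and build_wordlist are hypothetical and not part of the v2 scripts.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed layout of one entry line (inferred from the color/colour example):
#   <size>: <spelling tags>: <lemma> <part of speech>: <inflected forms>
ENTRY_RE = re.compile(r"^(\d+):\s*([^:]+?):\s*(.+?)\s*<([^>]+)>:\s*(.*)$")


@dataclass
class Entry:
    size: int         # assumed to be the SCOWL size level
    tags: list[str]   # e.g. "A", "Cv"; treated as opaque labels
    lemma: str
    pos: str          # part-of-speech marker, e.g. "n" or "v"
    forms: list[str]  # inflected forms, whitespace separated


def parse_line(line: str) -> Entry | None:
    """Parse one entry; return None for blank lines, raise on bad input."""
    line = line.strip()
    if not line:
        return None
    m = ENTRY_RE.match(line)
    if m is None:
        raise ValueError(f"malformed entry: {line!r}")
    size, tags, lemma, pos, forms = m.groups()
    return Entry(int(size), tags.split(), lemma, pos, forms.split())


def build_wordlist(entries: list[Entry], tag: str, max_size: int) -> list[str]:
    """Collect lemmas and forms carrying `tag` from entries with size <= max_size."""
    words: set[str] = set()
    for e in entries:
        # Exact tag match: "Cv" is not treated as "C" in this sketch.
        if tag in e.tags and e.size <= max_size:
            words.add(e.lemma)
            words.update(e.forms)
    return sorted(words)


SAMPLE = """\
35: A Cv DV: color <n>: colors  color's
35: B C D: colour <n>: colours  colour's

35: A Cv DV: color <v>: colored  coloring  colors
35: B C D: colour <v>: coloured  colouring  colours
"""

entries = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]
print(build_wordlist(entries, "A", 60))
# ['color', "color's", 'colored', 'coloring', 'colors']
```

Keeping the noun and verb entries separate means a form shared by both (colors) simply collapses into the set when a flat spell-checking list is produced, while the per-entry POS information stays available for other outputs.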

I will also write some Python code to aid in maintaining this file and to catch errors.

Once this is done, I will likely start adding new words again, as that will serve as a good test of the new format and the scripts to maintain it.
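The maintenance scripts themselves are not shown in this thread; the following is only a hypothetical sketch of the kind of error checking such a script might do, assuming the line layout of the color/colour example above (size, spelling tags, lemma with part of speech, inflected forms). The function name check_master and both checks are illustrative assumptions: it reports lines that do not fit the layout and repeated entries with the same lemma, part of speech and tag set, each with its line number. The sample input contains a deliberately duplicated line and a deliberately malformed one.

```python
import re

# Assumed entry layout, inferred from the color/colour example:
#   <size>: <spelling tags>: <lemma> <part of speech>: <inflected forms>
ENTRY_RE = re.compile(r"^(\d+):\s*([^:]+?):\s*(.+?)\s*<([^>]+)>:\s*(.*)$")


def check_master(text):
    """Return (line_number, message) pairs for problems found in `text`.

    Hypothetical checks only: every non-blank line must match ENTRY_RE,
    and a (lemma, part of speech, tag set) combination may appear once.
    """
    problems = []
    first_seen = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines separate groups of entries
        m = ENTRY_RE.match(line)
        if m is None:
            problems.append((lineno, "does not match the entry layout"))
            continue
        _size, tags, lemma, pos, _forms = m.groups()
        key = (lemma, pos, frozenset(tags.split()))
        if key in first_seen:
            problems.append((lineno, f"duplicate of line {first_seen[key]}"))
        else:
            first_seen[key] = lineno
    return problems


sample = """\
35: A Cv DV: color <n>: colors  color's
35: A Cv DV: color <n>: colors  color's
35 B C D: colour <n>: colours  colour's
"""
for lineno, message in check_master(sample):
    print(f"line {lineno}: {message}")
```

Reporting line numbers rather than stopping at the first error lets a maintainer fix a whole batch of newly added words in one pass.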

Once I am happy with the new format, I will be willing to hand off the maintenance of SCOWL to other people I trust.

marcoagpinto commented 5 months ago

Heya, Kevin,

It is good to know that you intend to hand off the maintenance of the dictionaries.

I hope new words can be added more frequently.

I myself have added 140 000+ words to British English in 11 years.

It is a lifetime task.

kevina commented 4 months ago

A preview version of the new format is available in the v2 branch of this repo.

The documentation is still incomplete, but the format should be mostly stable by now.

Early feedback is welcome, but please use #398 to leave feedback or create a new issue.