en-wl / wordlist

SCOWL (and friends).
http://wordlist.aspell.net

What does pigweabbits mean? #350

Closed ambiamber closed 1 year ago

ambiamber commented 1 year ago

https://github.com/en-wl/wordlist/blob/master/other/mwords.tar.gz has the word pigweabbits in it. I have tried a number of searches but I have not found it anywhere but in the Moby Project (mwords) collection and in https://github.com/kkrypt0nn/Wordlists/blob/master/passwords/openwall.txt

In mwords.tar.gz it is in the 354984si.ngl file which is described as: Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.

If pigweabbits is a word, I can't find it anywhere: not in old books, newspapers, etc.

By contrast, the word gazabe appears in large numbers of old newspapers and some old books, and in comments in the source code of CTSS (Compatible Time Sharing System, the first general-purpose computer time-sharing OS). The oldest written uses of gazabe imply it meant an important person; over time it ended up meaning a fool. "Gazabe" is not in any of the word lists here. I was able to find gazabe through searching, but not pigweabbits. Pigweabbits is either more obscure than gazabe, or it was never used anywhere other than someone's password, etc.

Now getting back to my question, what does pigweabbits mean?

Thanks

Meekohi commented 1 year ago

[image]

(Ok but agreed, I cannot find any real reference to the word either)

Tex2002ans commented 1 year ago

If pigweabbits is a word, I can't find it anywhere: not in old books, newspapers, etc.

Agreed. I did a search on Google, and there are only ~4000 hits, but they all look to be spam, "list of all words" pages, and Scrabble solvers.

Probably just snuck into SCOWL from some really low-quality wordlist that smashed random words together.


This word only exists in size 95, and the SCOWL README says:

The 95 contains just about every English word in existence and then some. Many of the words at the 95 level will probably not be considered valid English words by most people.


In SCOWL, words that exist in:

Side Note: Back in May 2021, I did privately bulk submit a giant list of words to kevina:

https://github.com/en-wl/wordlist/issues/318#issuecomment-850689005

This would:

That research will eventually be merged into SCOWL. :)

(Which reminds me, I should restart my dictionary scraper!)

Tex2002ans commented 1 year ago

Now that I think about it, the word could also be a "copyright trap" (most likely from some other list that got merged into SCOWL).

See Wikipedia.org: "Fictitious Entry":

Fictitious or fake entries are deliberately incorrect entries in reference works such as dictionaries, encyclopedias (including Wikipedia), maps, and directories. [...]

Fictitious entries are added by the editors as a copyright trap to reveal subsequent plagiarism or copyright infringement.

[...]

Copyright traps

By including a trivial piece of false information in a larger work, it is easier to demonstrate subsequent plagiarism if the fictitious entry is copied along with other material. [...] Similarly, trap streets may be included in a map, or invented phone numbers in a telephone directory.

kevina commented 1 year ago

Now that I think about it, the word could also be a "copyright trap" (most likely from some other list that got merged into SCOWL).

@Tex2002ans the word is only found in the Moby Project (mwords) collection, which was released into the public domain. I am very careful about the sources that are used in SCOWL.

Tex2002ans commented 1 year ago

the word is only found in the Moby Project (mwords) collection, which was released into the public domain.

Thanks for that info.

(How did you find out which word came from what list? Is there any public way we could look up the same info?)

I am very careful about the sources that are used in SCOWL.

Yes, I suspected as much. :)


PS. Since I sent you my wordlist in 2021, my project has mostly been lying dormant.

I did find a few more online dictionaries in that time though, so I'll have to kick that project back into gear + continue scraping all those size 95s for more detailed stats. :)

kevina commented 1 year ago

(How did you find out which word came from what list? Is there any public way we could look up the same info?)

You need to get the source code from git and compile it once to make sure everything is linked in properly. After that, all source lists are linked in under the scowl/r/ directory, so just search there (but make sure your search follows symbolic links). For example:

$ cd scowl/r
$ fgrep -R 'pigweabbits' .
./mwords/354984si.ngl:pigweabbits
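
The same lookup can be wrapped in a small helper that lists only the file names and matches whole words rather than substrings. This is a rough sketch, not part of SCOWL: the function name find_word_sources is made up for illustration, and it assumes GNU grep, where -R follows symbolic links during recursion.

```shell
# Hypothetical helper (not part of SCOWL): print every file under DIR
# that contains WORD as a whole word, following symbolic links.
# Assumes GNU grep: -R recurses and follows symlinks, -F treats WORD as a
# fixed string, -w matches whole words only, -l prints file names only.
find_word_sources() {
    word=$1
    dir=${2:-.}
    grep -RFwl -- "$word" "$dir" | sort
}

# Example, run from the top of a compiled checkout:
#   find_word_sources pigweabbits scowl/r
```

Matching whole words avoids hits where the search term is only part of a longer entry, at the cost of missing lists that store words in some other format.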

PS. Since I sent you my wordlist in 2021, my project has mostly been lying dormant.

I did find a few more online dictionaries in that time though, so I'll have to kick that project back into gear + continue scraping all those size 95s for more detailed stats. :)

Thanks, and sorry for not really getting back to you. I have not really been very active with SCOWL as of late.

Tex2002ans commented 1 year ago

You need to get the source code from git and compile it [...]

Fantastic. Thanks.

Thanks, and sorry for not really getting back to you.

It's okay. But definitely let me know if there's anything I can do to sift through some of the data further for you. :)

I have not really been very active with SCOWL as of late.

I have quite a few other projects sitting dormant too.

... Like correct hyphenation of every single word in the English language.

(Years ago, I was able to catch wrongly hyphenated "Petrograd" + get a few more examples added to the hyphenation exception dictionaries.)

All this dictionary scraping helps indirectly chip away at that too. :)

And now that auto-hyphenation and alternate ways of interacting with texts are getting more prominent, this work is becoming more important too.


Like what's going to happen when everyone starts talking about their pet pigweabbits? :)

How do you pronounce it?

This could completely change the hyphenation!

The entire fate of the world hangs in the balance of this amazing word!!! :)

kevina commented 1 year ago

As the word is only found at level 95, I am leaving it in.