jorgk3 opened 8 years ago
How was this list generated? Some of the possessive forms in it are valid, such as "zookeeper's".
Sure, most likely most of the words in the list are valid. But there are also problematic ones, like "him's", etc.
How was the list generated:
List 1 = expanded current Mozilla dictionary (which is essentially SCOWL 2015.08.24 + 6000 proper names + 37 Mozilla terms + 337 extra words (issue #137) + some extra possessive forms (issue #136)) + 353 erroneous words I'm trying to clean up.
List 2 = list of 353 erroneous words (343 erroneous possessive forms ("remind's" etc.) + 10 other words)
List 3 = List 1 - List 2 = corrected Mozilla dictionary.
From this I subtracted (comm -23) the expanded wordlist from the en-US dictionary add-on of 2006/2007 that was published until today at AMO (before the add-on owner replaced it with the pure 2015.08.24 dictionary). I enclose the Readme of this old dictionary.
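The two subtractions described above (List 1 - List 2, then `comm -23` against the 2006/2007 list) amount to set differences. Here is a minimal Python sketch of that idea. The helper name `subtract_wordlists` and the tiny sample lists are made up for illustration; the real inputs would be the expanded dictionaries, one word per line.

```python
def subtract_wordlists(words, to_remove):
    """Return the words that are in `words` but not in `to_remove`, sorted.

    This mirrors what `comm -23` prints for two sorted, duplicate-free files:
    the lines unique to the first file.
    """
    return sorted(set(words) - set(to_remove))


# Made-up sample data standing in for the real expanded word lists.
list1 = ["mind's", "remind's", "zookeeper's", "him's", "zoo"]  # current dictionary
list2 = ["remind's"]                                           # known erroneous words
old_2007 = ["mind's", "zoo"]                                   # 2006/2007 dictionary

list3 = subtract_wordlists(list1, list2)          # corrected dictionary
candidates = subtract_wordlists(list3, old_2007)  # words added since 2006/2007

print(candidates)
```

With the sample data, what remains is one valid form and one questionable form, which is the same mix the real subtraction produced.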
Why did I do this:
I spot-checked this 2006/2007 dictionary and found "mind's". So this dictionary does not have the "remind's" problem that the current Mozilla dictionary shows (and I am currently correcting).
So I thought: If I subtract the 2006/2007 dictionary from my corrected Mozilla dictionary, I will find all the invalid possessive forms I failed to fix, assuming that they crept in after 2006/2007.
Lo and behold, the result was terrible. After subtracting the 2006/2007 dictionary, I was left with about 12000 words. All the lowercase words in the subtraction result come from SCOWL (unless they were added by Mozilla; remember, I subtracted from List 3, which has extra Mozilla words). One of them is "zookeeper's", which is fine, as are most likely most of the 12000 words.
Since I was interested in possessive forms, I extracted the possessive forms, 6661 of them. Then I went looking for illegal words of the form "verb's" or "adjective's". And I found a fair few, as listed. I've just looked again and spotted:
insured's - maybe: "The insured's policy was not sufficient"
intended's - no good
http://app.aspell.net/lookup?dict=en_US&words=insured%27s%0D%0Aintended%27s
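The extraction step can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and it assumes a possessive form is simply any entry ending in a straight apostrophe plus "s". It pairs each form with its root so the roots can be reviewed by part of speech.

```python
def possessive_forms(words):
    """Keep only entries that end in 's and have a non-empty root."""
    return [w for w in words if w.endswith("'s") and len(w) > 2]


def root_of(possessive):
    """Strip the trailing 's to get the root word."""
    return possessive[:-2]


# Made-up sample standing in for the ~12000-word subtraction result.
words = ["zookeeper's", "zookeeper", "insured's", "intended's", "him's", "abacus"]

for form in possessive_forms(words):
    # Each pair still needs a manual check: is the root plausibly a noun?
    print(form, "<-", root_of(form))
```

The filter only narrows the list; deciding whether a root such as "intended" can reasonably take a possessive remains a manual judgment.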
In reality it really doesn't matter how the list was generated. All I can say is that I have a list of words, sadly a very long one, and on this list are a few/some/many (I don't know) invalid words which are in the SCOWL data set.
I have no idea how you can clean this up. Surely looking at them is not much fun. Perhaps you can run some statistical analysis and check the ones with low ranking manually. "him's" should be in this set.
You can redo the experiment. I can give you the 2006/2007 dictionary and you subtract it from any SCOWL dataset you like and see what you're left with.
The basic problem is that I simply don't have a good source for when a word can reasonably have a possessive form. For example, "above" can be a noun, and thus, without any additional information, it is assumed that it could have a possessive form. Frequency information could help, but most corpora will split any word with a "'" into two, so I must use at least 2-grams and recombine them.
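A rough sketch of the recombination idea, in Python: sum the 2-gram counts whose second token is the possessive marker, then sort candidate forms so the rarest come first for manual review. Everything here is an assumption for illustration: the helper names, the `'s` token (corpora differ in how they tokenize apostrophes), and the counts, which are invented.

```python
from collections import Counter


def possessive_counts(bigram_counts, marker="'s"):
    """Rejoin (word, marker) 2-grams into possessive forms and sum their counts."""
    counts = Counter()
    for (first, second), n in bigram_counts.items():
        if second == marker:
            counts[first + marker] += n
    return counts


def rank_rarest(candidates, counts):
    """Order candidates by ascending frequency (ties broken alphabetically)."""
    return sorted(candidates, key=lambda w: (counts.get(w, 0), w))


# Invented 2-gram counts; a real run would read them from a corpus.
bigrams = {
    ("zookeeper", "'s"): 40,
    ("him", "'s"): 1,
    ("the", "zookeeper"): 90,
}

counts = possessive_counts(bigrams)
print(rank_rarest(["zookeeper's", "him's", "give's"], counts))
```

Forms that never occur in the corpus sort to the front, which is where entries such as "him's" would be expected to land.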
I will keep this list open, but am unlikely to act on it right now.
Above can be a noun? http://www.merriam-webster.com/dictionary/above says it's an adverb. Well, perhaps in: All the above is true.
What about her's him's give's get's ?
Yes, "above" can be a noun, and so can "give" or "get". There is an entry for it at www.merriam-webster.com; you just need to scroll down. There is even a usage note for "above". Now, the noun form may not be very common, but this is just not information I have available to me right now.
For "him" and "her", well, my source, 2of12id in alt12dicts, has each as a noun in addition to a pronoun. Is this accurate? I'm not sure. Alan (@biljir), did you find any evidence of "him" and "her" being used as nouns, or is this an oversight? You have the following entries for him and her:
him N: hims
him P:
her A:
her N: hers
her P:
This also relates to #71.
More to the point. Cleaning up this list will require a lot of manual work. There is no automatic way to handle this apart from frequency analysis, which in and of itself is a lot of work.
Yes, him and her (and also he and she) can (uncommonly) be used as a noun, generally with the meaning "male animal" or "female animal". A canonical sentence is "That dog isn't a him; it's a her!"
@biljir, can you ever see those words used in a possessive form? What about "above", "give" or "get"? Also, can you see "hims" or "hers" having a usage?
hims/hers are just possible. "Those puppies are all hims". Him's and her's seem really unlikely, though not absolutely impossible. her's in particular is more likely to be an incorrect spelling/misunderstanding of hers than a real possessive.
above's seems within the realm of possibility. "I'm one of None of the Above's supporters." give, probably not. And get as a noun is somewhat archaic - I don't have enough familiarity with the way that would be legitimately used to say for sure. Certainly, get as a noun is not something you'd expect to see often in a modern context, though it might show up in a historical or fantasy novel.
IMHO, you're arguing hard to maintain some very uncommon possessive forms, speaking of "not absolutely impossible" and "realm of possibility". This contradicts Kevin's stated policy that "size 60" shouldn't contain less common words that could mask spelling errors, favourite example: "calendar"/"calender". "Her's" obviously masks a spelling error where "hers" wasn't written correctly.
Following the argument of being possible, the possessive forms I requested in issue #136 should indeed join their root words at the same level. IMHO all these forms except "indoor's" have more validity than "hers's" or "him's".
In Mozilla bug 1235506 (https://bugzilla.mozilla.org/show_bug.cgi?id=1235506) I've been working on removing 342 incorrect possessive forms, such as "remind's", from the Mozilla-maintained dictionary. We never understood where these forms originated.
Now I have compared the 2007 version of the Mozilla dictionary with my proposed corrected version and still found some incorrect possessive forms. These clearly come from the underlying SCOWL data.
I'm attaching a list of 6661 possessive forms which needs filtering for errors. I just want to point out some highlights: above's her's him's give's get's (The en-GB dictionary I'm using marks these five as errors.)
http://app.aspell.net/lookup?dict=en_US&words=above%27s%0D%0Aher%27s%0D%0Ahim%27s%0D%0Agive%27s%0D%0Aget%27s
Attachment: questionable-apostrophe-s.txt