en-wl / wordlist

SCOWL (and friends).
http://wordlist.aspell.net

English Dictionary Additions (Canadian) #248

Closed (ghost closed this 4 years ago)

ghost commented 5 years ago

There are many science-oriented words, and regular Canadian spellings, missing from the English dictionary. Please consider adding whatever words below you feel are suitable:

Amniote
amniote
Amniotes
amniotes
analyse
analysing
anoxygenic
Archæan
Archicebus
Asbian
autotrophs
Bentonite
bentonite
biofilm
biofilms
biomarkers
biostratigraphic
biostratigraphy
bipedalism
birthdate
brobdingnagian
Brobdingnagian
Casineria
Chicxulub
chronostratigraphic
clade
compressive
cosmochronology
crocodilian
Cyanobacteria
cyanobacteria
cyanobacterial
cyclostratigraphy
demosponges
ejecta
emphasising
endoplasmic
enwraps
Eoraptor
Ercaicun
ercaicunensis
eukaryogenesis
eukaryote
eukaryotic
evolutionarily
fluxuations
Fossilised
fossilised
Francevillian
Friedmann
gigatonnes
Greenstone
Hadeon
Hadrocodium
Haikou
Haikouichthys
heterodont
hydrothermal
hypothesised
Interlayered
isopropylcholesterol
jawless
Jetsetting
Katsuhiro
lagoonal
leaped
limbed
lizardesque
Lokeslottet
Lokiarchaeum
magnetostratigraphy
Megatsunamis
metallicity
microbially
miniscule
Multicellular
multicellular
Multicellularity
multicellularity
neocortex
Neoproterozoic
Nuvvuagittuq
Nyasasaurus
obligately
oxygenic
paleo
paleoenvironment
Paleontological
Paleotemperature
photoevaporation
photosynthesising
Phytoplankton
phytoplankton
Pilbara
planetesimals
procyanobacteria
prokaryote
Prokaryotes
prokaryotes
proto
protoplanetary
protostar
protosun
pulverised
radiometrically
reticulum
scuttering
Shales
shales
spacetime
specialise
steranes
stromatolite
Stromatolites
stromatolites
Sturtian
subdivded
tarsiers
Tellus
tetrapod
Tetrapods
tetrapods
Theia
Theia's
theorise
Tiktaalik
timespan
trackways
tuatara
Urmetazoan
vapourised
volcanism

kevina commented 5 years ago

I am not Canadian; however, according to my sources the preferred Canadian spelling uses -ize, not -ise, in these words:

emphasising fossilised hypothesised pulverised specialise theorise photosynthesising fossilised

Are there some Canadians that prefer the ise spelling?

I have the following marked as level 1 (common) variants:

leaped analyse analysing

They are excluded by default but are included in the large dictionary, and can be included in a custom one created with http://app.aspell.net/create.

I also have miniscule marked as a level 2 variant. The preferred spelling seems to be minuscule in both American and British English, and some people consider miniscule an incorrect spelling of minuscule. I don't have any Canadian-specific info on this word.

I will get back to you on the other words in your list.

ghost commented 5 years ago

> Are there some Canadians that prefer the ise spelling?

Apparently just me! Toss them out. From Wikipedia:

> Hence, some have used the spelling -ise in English, as in French, for all these words, and some prefer -ise in words formed in French or English from Latin elements, retaining -ize for those formed from Greek elements.

> miniscule

This one is still widely regarded as an error, although it is appearing in publications. Toss it.

kevina commented 5 years ago

@biljir do you care to comment on any of these words?

biljir commented 5 years ago

I have always been opposed to "birthdate". For all the user interfaces that insist on writing it as a single word, Google Books still shows the two-word form as dominating, and increasing its dominance as time goes on. Similarly, "timespan" is dominated by "time span", though in this case the dominance seems to be decreasing. Likewise, "spacetime" is dominated by "space-time".

It occurs to me it would be useful to have a file of words such as the above, to make it easier to reject new requests for their inclusion. The list could include words rejected for other reasons as well, though you'd want to include only ones common enough that repeated submissions are a problem.
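A minimal sketch of how such a file could be used, assuming a hypothetical plain-text format (one word per line, `#` starting a comment). The function names `parse_rejected` and `triage` are illustrative only; they are not part of SCOWL or the app.aspell.net tools.

```python
def parse_rejected(text):
    """Parse a hypothetical rejected-words list: one word per line, '#' starts a comment."""
    words = set()
    for line in text.splitlines():
        word = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if word:
            words.add(word)
    return words


def triage(requested, rejected):
    """Split a request into words rejected before and words that still need review.

    Matching is exact (case-sensitive), so "Miniscule" and "miniscule" are distinct.
    """
    previously_rejected = [w for w in requested if w in rejected]
    to_review = [w for w in requested if w not in rejected]
    return previously_rejected, to_review


if __name__ == "__main__":
    # Hypothetical contents of a rejected-words file.
    sample = "# rejected before\nminiscule  # error for minuscule\nsubdivded  # typo for subdivided\n"
    rejected = parse_rejected(sample)
    old, new = triage(["miniscule", "clade", "subdivded", "eukaryote"], rejected)
    print("previously rejected:", old)
    print("to review:", new)
```

Exact matching keeps the sketch simple; a real list might want to fold case, since requests often include both capitalized and lowercase forms of the same word.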

If "leaped" is really missing, I consider that a mistake. (The GitHub (or perhaps the Firefox) spell checker did not flag it.)

A few scientific words that seem to me worth considering are "eukaryote(s)", "multicellular" (dunno why it's present in both upper- and lower-case in the lists though), "prokaryote" and "shales". I have run into the term "neocortex", but have no idea how common it is.

If "planetesimal" is already accepted, then "planetesimals" should be as well. (The singular was flagged by GitHub.) I think the term is relevant to the history of astronomy, but is not part of current usage.

"subdivded" appears to be a spelling error for "subdivided" (which GitHub is fine with).

I agree on rejecting "miniscule".

ghost commented 5 years ago

> It occurs to me it would be useful to have a file of words such as the above, to make it easier to reject new requests for their inclusion. The list could include words rejected for other reasons as well, though you'd want to include only ones common enough that repeated submissions are a problem.

Or an online form with a subject line, text area (for comma or line-separated words), and submit button that, upon submission, automatically opens a new GitHub issue.

kevina commented 5 years ago

> If "leaped" is really missing, I consider that a mistake. (The GitHub (or perhaps the Firefox) spell checker did not flag it.)

@biljir keep in mind that we are talking about the Canadian, not the American, version. Its exclusion is based on this entry in VarCon:

# leaped (level 35)
A Bv: leaped / Av B: leapt

However https://www.lexico.com/en/definition/leap has leaped as the more common variant. I will investigate this more closely.

> It occurs to me it would be useful to have a file of words such as the above, to make it easier to reject new requests for their inclusion. The list could include words rejected for other reasons as well, though you'd want to include only ones common enough that repeated submissions are a problem.

I don't have a tool to automatically reject words, but I do have two tools I use to help decide whether new words should be included. See http://app.aspell.net/lookup and http://app.aspell.net/lookup-freq

biljir commented 5 years ago

Interestingly enough, I'm an American and use "leapt". Since I lived in a number of places in my early years, I have no idea where I picked it up, but presumably it was the standard form in one of the places I lived, or possibly I got it from one of my parents (who came from two very different parts of the country).

ghost commented 5 years ago

> I'm an American and use "leapt".

Almost seven years ago to the day ... https://english.stackexchange.com/a/76168/22099

kevina commented 5 years ago

Here is a report on the frequency of some of the words. @biljir any comment on these?

Word                 |  Adj. Freq   Newness Rank | Normal dict | Large dict
  similar words      |  (per million)            | should incl | should incl
---------------------|---------------------------|-------------|-------------
proto                |       3.7330  0.9   16368 |    ****     |    **** 
  Porto              |      0.6x     1.4   22563 |    incl.    |    incl.

compressive          |       3.5919  0.8   16761 |    ****     |    incl.

reticulum            |       2.8076  0.5   19479 |    ****     |    incl.

phytoplankton        |       2.8055  0.7   19490 |    ****     |    incl.

eukaryotic           |       2.2110  0.9   22431 |    ****     |    incl.
  eukaryotes         |      0.5x     1.0   33331 |    ****     |    incl.

endoplasmic          |       1.7976  0.5   25331 |    ****     |    incl.

hydrothermal         |       1.6928  0.7   26289 |    ****     |    incl.

shales               |       1.6314  0.4   26864 |    ****     |    **** 
  sales              |     59.3x     0.8    1229 |    incl.    |    incl.

biomarkers           |       1.4007  3.1   29289 |    ****     |    **** 

biofilm              |       1.0691  1.6   34217 |    ****     |    **** 
  biofilms           |      0.5x     1.9   48375 |    ***      |    **** 

spacetime            |       1.0653  1.2   34290 |    ****     |    **** 

volcanism            |       0.9394  0.5   36765 |    ****     |    incl.

cyanobacteria        |       0.9020  1.0   37585 |    ****     |    **** 

multicellular        |       0.8298  0.9   39286 |    ****     |    incl.

prokaryotes          |       0.7935  1.0   40240 |    ****     |    incl.
  prokaryotic        |      0.9x     1.0   42101 |    ****     |    incl.

Friedmann            |       0.7403  0.7   41786 |    ****     |    **** 
  Friedman           |     11.1x     0.8    9735 |    incl.    |    incl.
  freedman           |      3.9x     0.8   19235 |    incl.    |    incl.

neocortex            |       0.7276  0.8   42196 |    ****     |    **** 

bentonite            |       0.6716  0.6   44057 |    ****     |    incl.

clade                |       0.6623  1.9   44370 |    ****     |    **** 
  Claude             |     13.5x     0.9    9194 |    incl.    |    incl.
  clad               |     10.9x     1.4   10603 |    incl.    |    incl.
  Clyde              |      6.2x     0.9   15318 |    incl.    |    incl.
  Cade               |      1.8x     1.1   32080 |    ****     |    incl.
  clawed             |      1.6x     1.2   34277 |    incl.    |    incl.

greenstone           |       0.6124  0.6   46287 |    ****     |    incl.

paleo                |       0.5988  0.7   46830 |    ****     |    **** 
  Paulo              |      8.8x     0.7   13070 |    ****     |    *****
  Palo               |      5.2x     0.6   18374 |    ****     |    **** 
  Paolo              |      4.8x     1.1   19257 |    ****     |    **** 
  Pali               |      2.6x     0.8   27839 |    ****     |    incl.
  Paley              |      2.0x     0.9   31814 |    incl.    |    incl.

biljir commented 5 years ago

I will take a closer look at this tomorrow, but I have immediate comments on two words.

Google Books Ngram Viewer shows that "space-time" is used three times as often as "spacetime". This is one of those words that is suggested over and over. (In fact, I see I made this comment once already for this issue.)

I think "paleo" may be used more frequently than the table implies, because there's something called the "paleo diet" that has received a lot of attention in the past couple of years. (As I understand it, it's basically eating like a caveperson.)

kevina commented 5 years ago

@biljir thanks. I won't include spacetime. I made that list by simply copying and pasting the list into http://app.aspell.net/lookup-freq and removing the ones already marked as incl.

I will include paleo.

I will hold off on making a release until you get a chance to comment.

kevina commented 5 years ago

@biljir Actually, I am unsure about spacetime. It seems pretty common now. Again, my apologies if you already commented on any others I listed.

biljir commented 5 years ago

I also checked the big 3 American collegiate dictionaries. None of them listed spacetime as an alternative.

Spacetime is also not listed in the latest British Scrabble dictionary (which includes the American Scrabble dictionary as a subset). Actually, it's the 2015 edition; there could have been a new one since then.

It is a technical word, and my impression is that technical usage (rather than science-fictional hand-waving) is mostly hyphenated.

Of course, things might be different in Canada.

kevina commented 5 years ago

Wikipedia uses spacetime https://en.wikipedia.org/wiki/Spacetime

Interestingly, PBS Space Time uses "Spacetime" when referring to the word, but not in the title: https://www.youtube.com/watch?v=AwhKZ3fd9JA

biljir commented 5 years ago

Wikipedia is an interesting data point. I checked Wiktionary as well, and it considers space-time a variant of spacetime, and not the other way around. Not surprising that the two would agree, of course.

biljir commented 4 years ago

I have grave doubts about "proto" as a stand-alone word.

I've never (that I can recall) seen "compressive" in print. That's not by itself a reason to reject it, but given how common "compress" is, I'm guessing this derived word must be pretty uncommon.

I checked a few dictionaries (not an exhaustive search) and they say that "vulcanism" is an alternate form of "volcanism". Which is interesting, because "vulcanism" is the form I was familiar with. "volcanism" clearly wins the Google n-grams battle. As I recall, you have a separate list for common variants. If so, I recommend taking "volcanism" and putting "vulcanism" on the alternate list.

Friedmann is an interesting problem. In general, I would be in favor of listing all reasonable variants of people's names. Even though it appears that Friedmann is a much less common name than Friedman (based on a fairly small amount of research), that doesn't mean that the individual you're writing about isn't a Friedmann. If someone enters Freidman as a spelling and gets both forms as suggestions, I'd say that should be a hint to look up which one is correct rather than guessing one or the other. For whatever it's worth, on the Wikipedia page on Friedman(n), I was familiar with 2 -man's (one extremely familiar) and no -manns. Also, just as a by-the-way, there are 9 common variants of the spellings of my first+last names, and at least three uncommon ones. I have people who have been on a first-name basis with me for 20 years who still can't spell me right! (And pity the Smythes!)

I have seen "clade" in print often enough I tend to feel it should be included. I think that only "clad" of the list of similar words might be confused with it.

biljir commented 4 years ago

One more note on "spacetime" vs. "space-time". I had lunch today with a friend who is a professional technical writer. I asked him whether "space-time" or "spacetime" was preferred, and got a surprising answer. He said that, according to the Manual of Style that is mandated by his job, "spacetime" is preferred when used as a noun, and "space-time" when used as an adjective. Since the title of a Wikipedia article is generally a noun, that explains the Wikipedia usage. It also supports the idea that "spacetime" should be present in a spelling dictionary (as well as "space-time", if hyphenated words are supported).

kevina commented 4 years ago

> Friedmann is an interesting problem. In general, I would be in favor of listing all reasonable variants of people's names.

I try to limit names to the more common ones and not include less common spellings. Friedmann is an exception to this rule, as Alexander Friedmann is a famous person and his name is part of "Friedmann equations".

kevina commented 4 years ago

> One more note on "spacetime" vs. "space-time".

Okay. I think I will include spacetime then. Thanks for the followup.

biljir commented 4 years ago

Alexander Friedmann may indeed be famous, though I have not heard of him, at least not enough to penetrate into my memory. However, Kinky Friedman is a god, even though his time of greatness has clearly passed.

ghost commented 4 years ago
kevina commented 4 years ago

@biljir (from your earlier comment) I am going to skip shales as it is too close to sales.

I am also reluctantly including clade.

Neither planetesimal nor planetesimals is already included, so I will pass on those.

I am including most of the other **** words from the list above.