drjwbaker closed this issue 4 years ago
The collocates of a word are those words that tend to occur in proximity to that word more than they occur in proximity to all other words in the corpus general.
remove 'general'.
Antconc then presents a returns a slightly confusing screen. It contains the following information:
remove 'a returns'.
5509 of the 24292 unique words
Numbers wrong
5509 of the 24292 unique words (or about 1 in 5) occur at least once in proximity to the word “behind’.
3277 first two or more times.
Some words have jumped up the list (“them” from 17 to 5, “just” from 65 to 17, “immediately” from 81 to 19);
Fix numbers on these.
There is one word in the top 15 words (“a”) that has a negative stat value.
Some have 0, and 'with' (rank 65) has a negative value.
(George tends to prefere “posterior”).
typo: should be 'prefer'.
“Behind are constables with staves”, “Behind are flames”, “close behind are eight other judges”, etc,
We are also reminded of the value of retaining capitalization (look at how common it is to see punctuation before “Behind”.
missing a bracket
Note that the values around 0.5 and below (and even in negative!) are words like “a”, “and”, “left”: words that we know are
More like 1.5, but check.
All the top 50 ranked works are now those that occur only once 1L/1R of “behind”,
Change to 'Most of the'
Proper names (“Canning”, “Castlereagh”, “Napoleon”, “Wellington”) are frequent, tend to occur in proximity to spatial words like “behind”, and tend to appear before the word “behind”
After the word. Also, 'north', 'fox', 'wellington', and 'napoleon'
start by searching to the Collocates tab
Change 'to the' to 'in the'.
T2:
hisorically specific cataloguing choices
typo: should be 'historically'.
in the corpus general
Just 'corpus': remove 'general'.
Note: I realised the issue here was that during the run-through, we didn't change the 'collocates' settings to 'Treat all data as lowercase'. It works properly now :)
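For anyone checking the numbers again, here is a rough sketch of why the lowercase setting changes collocate counts. It is not AntConc's implementation: the `collocates` function, its parameters, and the sample sentence are made up for illustration, and it only counts raw co-occurrences within a 1L/1R window rather than computing the stat value.

```python
import re
from collections import Counter

def collocates(text, node, window=1, lowercase=True):
    """Count words within `window` tokens either side of `node` (hypothetical helper)."""
    if lowercase:
        # Fold case so that "Behind" and "behind" count as the same node word.
        text = text.lower()
        node = node.lower()
    tokens = re.findall(r"[A-Za-z']+", text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Invented sample text, loosely modelled on the phrases quoted above.
sample = "Behind are flames. A man stands behind the door. Close behind are eight judges."

folded = collocates(sample, "behind", lowercase=True)
cased = collocates(sample, "behind", lowercase=False)

print(folded["are"], cased["are"])
```

With case folding on, the sentence-initial "Behind" is matched as well, so "are" is counted twice instead of once. That is the kind of difference that would shift the ranks and totals reported in the lesson.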
Timings wrong. 10 + 10.