CatalogueLegacies / antconc.github.io

Computational Analysis of Catalogue Data
https://cataloguelegacies.github.io/antconc.github.io/

ep 7 edits #25

Closed drjwbaker closed 4 years ago

drjwbaker commented 4 years ago

Timings wrong. 10 + 10.

drjwbaker commented 4 years ago

The collocates of a word are those words that tend to occur in proximity to that word more than they occur in proximity to all other words in the corpus general.

Remove 'general'.
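For anyone checking the wording against what the tool is doing, here is a minimal Python sketch of the first half of that definition: counting which words fall within a window around a node word. It is an illustration only, not AntConc's implementation; the function name `collocate_counts`, the window parameters, and the sample sentence are all made up for this note. Ranking by association (the "more than they occur in proximity to all other words" part) is a separate step on top of these counts.

```python
from collections import Counter

def collocate_counts(tokens, node, left=1, right=1):
    """Count words within `left`/`right` tokens of each occurrence of `node`.

    Hypothetical helper for illustration; not part of AntConc.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        end = min(len(tokens), i + right + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Invented sample text, split on whitespace.
tokens = "a man stands behind the chair and behind him are flames".split()
counts = collocate_counts(tokens, "behind")  # 1L/1R window
```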

drjwbaker commented 4 years ago

Antconc then presents a returns a slightly confusing screen. It contains the following information:

Remove 'a returns'.

drjwbaker commented 4 years ago

5509 of the 24292 unique words

Numbers wrong. (Screenshot 2020-06-19 at 14 10 51)

drjwbaker commented 4 years ago

5509 of the 24292 unique words (or about 1 in 5) occur at least once in proximity to the word “behind’.

3277 first two or more times.

drjwbaker commented 4 years ago

Some words have jumped up the list (“them” from 17 to 5, “just” from 65 to 17, “immediately” from 81 to 19);

Fix numbers on these.

drjwbaker commented 4 years ago

There is one word in the top 15 words (“a”) that has a negative stat value.

Some have 0, and 'with' (at 65) has a negative value.

drjwbaker commented 4 years ago

(George tends to prefere “posterior”).

Typo: 'prefer'.

drjwbaker commented 4 years ago

“Behind are constables with staves”, “Behind are flames”, “close behind are eight other judges”, etc,

drjwbaker commented 4 years ago

We are also reminded of the value of retaining capitalization (look at how common it is to see punctuation before “Behind”.

Missing a closing bracket.

drjwbaker commented 4 years ago

Note that the values around 0.5 and below (and even in negative!) are words like “a”, “and”, “left”: words that we know are

More like 1.5, but check.
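As a reminder of why a stat value can drop to zero or go negative, here is a rough sketch of a mutual-information style score. This is an assumption for illustration, not necessarily the exact formula AntConc applies; the function name, the `span` parameter, and all the frequencies in the example are invented. The point is only that the score is the log of observed over expected co-occurrence, so it is negative whenever a word turns up near the node less often than chance would predict, which is typical of very frequent words.

```python
import math

def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size, span=2):
    """log2(observed / expected) co-occurrence within a window of `span` tokens.

    Illustrative only; AntConc's own calculation may differ.
    """
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(pair_freq / expected)

# Invented frequencies: a very common word seen near the node less often than expected.
low = mutual_information(pair_freq=5, node_freq=100, collocate_freq=5000, corpus_size=100000)
# Same word seen near the node more often than expected.
high = mutual_information(pair_freq=40, node_freq=100, collocate_freq=5000, corpus_size=100000)
```

With these made-up numbers the expected co-occurrence is 10, so `low` is negative and `high` is positive.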

drjwbaker commented 4 years ago

All the top 50 ranked works are now those that occur only once 1L/1R of “behind”,

Change to 'Most of the'

drjwbaker commented 4 years ago

Proper names (“Canning”, “Castlereagh”, “Napoleon”, “Wellington”) are frequent, tend to occur in proximity to spatial words like “behind”, and tend to appear before the word “behind”

After the word, not before. Also, the names should be 'north', 'fox', 'wellington', and 'napoleon'.

drjwbaker commented 4 years ago

start by searching to the Collocates tab

Should be 'in the'.

drjwbaker commented 4 years ago

T2:

drjwbaker commented 4 years ago

hisorically specific cataloguing choices

Typo: 'historically'.

drjwbaker commented 4 years ago

in the corpus general

Just 'corpus'.

drjwbaker commented 4 years ago

Note: I realised the issue here was that during the run-through, we didn't change the 'collocates' settings to Treat all data as lowercase. It works properly now :)
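To show why that setting changes the numbers, a small Python sketch (not AntConc code; `count_node` and the sample text are hypothetical): with case-sensitive matching, "Behind" and "behind" are counted as different words, so the figures for the node word shift once everything is lowercased.

```python
def count_node(tokens, node, lowercase=False):
    """Count occurrences of `node`, optionally folding everything to lowercase."""
    if lowercase:
        tokens = [t.lower() for t in tokens]
        node = node.lower()
    return sum(1 for t in tokens if t == node)

# Invented sample text containing both "Behind" and "behind".
tokens = "Behind are flames and close behind are judges".split()
case_sensitive = count_node(tokens, "behind")               # matches only "behind"
case_folded = count_node(tokens, "behind", lowercase=True)  # matches "Behind" too
```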