cu-mkp / sandbox

The “Sandbox” space makes available a number of resources that utilize and explore the data underlying "Secrets of Craft and Nature in Renaissance France. A Digital Critical Edition and English Translation of BnF Ms. Fr. 640" created by the Making and Knowing Project at Columbia University.
https://cu-mkp.github.io/sandbox/

make tutorial: data from voyant - word clouds #28

Open njr2128 opened 3 years ago

njr2128 commented 3 years ago

From Terry:

Two word clouds generated from Voyant correlation: [image]

Exported that data, cleaned it up and then used an external wordcloud generator to create these:

Women

women_context_terms

Horse

horse_context_terms

njr2128 commented 3 years ago

To export data and clean it up:

step 0: create corpus around word of interest

E.g., wom* (to capture woman, women, etc.)
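As a rough illustration of what a trailing-wildcard query like wom* captures, here is a minimal Python sketch. It is only an analogy: `matching_terms` and the sample word list are hypothetical, and Voyant's own wildcard handling may differ in its details (for instance in how it treats case).

```python
from fnmatch import fnmatchcase

def matching_terms(terms, pattern="wom*"):
    """Return the terms that match a shell-style wildcard pattern.

    Terms are lowercased before matching; this is an assumption made
    for the illustration, not a statement about Voyant's behavior.
    """
    return [term for term in terms if fnmatchcase(term.lower(), pattern)]

# Hypothetical sample vocabulary, not taken from the manuscript data.
sample = ["woman", "women", "Womanly", "horse", "wool"]
print(matching_terms(sample))
```

Note that "wool" is not matched, because the pattern requires the literal prefix "wom" before the wildcard.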

step 1: export from voyant

Hover and click the "export" button, which appears only on hover (so it cannot be captured in a screenshot).

step 2: export current data as tab-delimited

step 3: copy and paste into Atom (or other text editor)

step 4: in Atom, remove first line (voyant header) and numbers (either manually or with regex)

Do a find and replace (Ctrl+F) with the regex option turned on. Use the expression ^\s[0-9]+ to match only the numbers that Voyant adds at the beginning of each line, so that any numbers in the corpus text itself are not removed. ^ anchors the match to the beginning of the line, \s matches a single whitespace character, and [0-9]+ matches one or more digits.
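For anyone who would rather script step 4 than do it in a text editor, here is a minimal Python sketch that applies the same expression. The function name `clean_voyant_export` and the sample input are hypothetical; the sketch assumes the export has one header line and that each following line begins with a whitespace character followed by a number, as the regex above implies. The final `strip()` and the dropping of empty lines are extra tidying not described in the steps.

```python
import re

# Same expression as in the find-and-replace step: one whitespace
# character followed by digits, only at the start of a line.
LEADING_NUMBER = re.compile(r"^\s[0-9]+")

def clean_voyant_export(text: str) -> str:
    """Drop the first (header) line and the leading line numbers."""
    lines = text.splitlines()
    body = lines[1:]  # remove the first line (Voyant header)
    cleaned = [LEADING_NUMBER.sub("", line).strip() for line in body]
    # Skip lines that end up empty after cleaning (an added convenience).
    return "\n".join(line for line in cleaned if line)

# Hypothetical sample input; a real export will look different.
sample = "header line\n 1 women and 2 horses\n 22 woman\n"
print(clean_voyant_export(sample))
```

Because the pattern is anchored with ^, the "2" inside the first sample line is left alone; only the numbers at the beginning of each line are removed.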

step 5: save the result of "women vocabulary" as .txt (using "save as"):

women-vocabulary.txt

step 6: use free online word cloud generator to create wordcloud

E.g., https://www.wordclouds.com/ women-vocabulary

njr2128 commented 3 years ago

create .md from last comment and mount as mini tutorial in sandbox

njr2128 commented 3 years ago

And also include embeddable Voyant tools

njr2128 commented 3 years ago

Created a dataset without "amp" (cleaned up the holdovers from & in the markup): women-vocabulary_no-amp.txt

New wordcloud: wordcloud-women
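A minimal Python sketch of one way to do this kind of cleanup, in case it is useful for the tutorial. It assumes the leftover appears as the standalone lowercase word "amp"; `remove_amp` is a hypothetical helper, not the method actually used to produce the file above.

```python
import re

def remove_amp(text: str) -> str:
    """Remove the standalone token "amp" and tidy the spacing."""
    # \b word boundaries keep words such as "lamp" or "ample" intact.
    without_amp = re.sub(r"\bamp\b", "", text)
    # Collapse the extra spaces left behind, line by line.
    lines = [" ".join(line.split()) for line in without_amp.splitlines()]
    return "\n".join(lines)

# Hypothetical sample text, not taken from the actual dataset.
print(remove_amp("women amp men\nlamp amp"))
```

The match is case-sensitive, so a variant such as "AMP" would need the `re.IGNORECASE` flag.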