DataS-DHSC / consultations

Re-usable functions for describing the responses to a public consultation or call for evidence.
GNU General Public License v3.0

For free text: bigram correlations #5

Open MHWauben opened 3 years ago

MHWauben commented 3 years ago

To give an initial overview of the free-text responses, and to detect commonly co-occurring words, visualise a graph showing which words correlate with each other.
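As a rough illustration of the computation behind such a graph, here is a minimal sketch in plain Python rather than the package's own language. It assumes that "correlate" means the phi coefficient between two words' presence or absence across responses, which is one common choice and not something this issue specifies. The function names `word_correlations` and `correlation_edges`, the tokenisation rule, and the `min_count` and `threshold` parameters are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from math import sqrt


def word_correlations(responses, min_count=2):
    """Phi coefficient for each pair of words, based on which responses contain them.

    Assumption: a word counts once per response, and only words appearing in
    at least `min_count` responses are considered.
    """
    # Each response becomes the set of distinct lower-cased words it contains.
    docs = [set(re.findall(r"[a-z']+", r.lower())) for r in responses]
    n = len(docs)
    doc_freq = Counter(w for d in docs for w in d)
    vocab = sorted(w for w, c in doc_freq.items() if c >= min_count)

    # Count the responses in which both words of a pair appear.
    both = Counter()
    for d in docs:
        kept = sorted(w for w in d if doc_freq[w] >= min_count)
        both.update(combinations(kept, 2))

    result = {}
    for a, b in combinations(vocab, 2):
        na, nb = doc_freq[a], doc_freq[b]
        denom = sqrt(na * (n - na) * nb * (n - nb))
        if denom > 0:  # skip words that appear in every response
            result[(a, b)] = (n * both[(a, b)] - na * nb) / denom
    return result


def correlation_edges(correlations, threshold=0.5):
    """Keep only strongly correlated pairs; these would be the edges of the graph."""
    edges = [(a, b, phi) for (a, b), phi in correlations.items() if phi >= threshold]
    return sorted(edges, key=lambda e: -e[2])
```

The edge list could then be passed to whichever graph-plotting library is preferred. In practice stop words would probably need removing first, otherwise very common words dominate the graph.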

This may be used to detect common multi-word terms that have a particular meaning in the domain (e.g. the name of a department, or an acronym spelled out). These multi-word terms can then be handled in the early text data cleaning stages.
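One possible way to "handle" such terms during cleaning is to join them into a single token, so that later counts treat them as one unit. The sketch below is in plain Python and only illustrates that idea. Joining with an underscore is an assumption rather than a decision made in this issue, and `tokenize`, `frequent_bigrams`, `merge_terms` and the `min_count` cut-off are hypothetical names. A real list of terms would likely need manual review before being applied.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequent_bigrams(responses, min_count=2):
    """Return the adjacent word pairs that occur at least `min_count` times overall."""
    counts = Counter()
    for response in responses:
        tokens = tokenize(response)
        counts.update(zip(tokens, tokens[1:]))
    return {bigram for bigram, c in counts.items() if c >= min_count}


def merge_terms(text, terms):
    """Tokenise `text`, joining any adjacent pair found in `terms` into one token."""
    tokens = tokenize(text)
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in terms:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both words of the pair are consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

For example, if `("social", "care")` is among the detected terms, `merge_terms("Social care needs funding", terms)` returns `["social_care", "needs", "funding"]`.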

MHWauben commented 3 years ago

In R, we could start with this approach: https://www.tidytextmining.com/ngrams.html