cu-mkp / m-k-manuscript-data

Text of BnF Ms Fr 640 in multiple formats, metadata about the manuscript, and derived data

Search campaign for alternate spellings, variants, lemmatizations, etc., and their frequencies #709

Status: Closed (thuchacz closed 3 years ago)

thuchacz commented 5 years ago

Desired task for the summer:

Matthew and CAG undertake a limited (3-day max.?) campaign to programmatically look for alternate spellings of glossary terms. NB: They should also assess the relative frequency of the various spellings so that they can be listed in the glossary from most frequent (as headword) to least frequent.
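The ordering rule in the NB amounts to a small sort: given a frequency count for each spelling and a group of spellings already judged to be variants of one term, the most frequent spelling becomes the headword and the rest follow in descending order. Here is a minimal Python sketch of that rule. The function names, the alphabetical tie-break, and the counts in the example are hypothetical and not drawn from the manuscript data.

```python
from collections import Counter


def order_variants(word_counts: Counter, variants: list[str]) -> list[tuple[str, int]]:
    """Return (spelling, count) pairs from most to least frequent.

    Ties are broken alphabetically so the glossary order is deterministic
    (an assumption; any stable tie-break would do).
    """
    return sorted(((v, word_counts[v]) for v in variants), key=lambda p: (-p[1], p[0]))


def glossary_entry(word_counts: Counter, variants: list[str]) -> tuple[str, list[str]]:
    """Split a variant group into (headword, other spellings in frequency order)."""
    ordered = order_variants(word_counts, variants)
    headword = ordered[0][0]
    return headword, [spelling for spelling, _ in ordered[1:]]


# Hypothetical counts, for illustration only.
counts = Counter({"huile": 40, "huille": 12, "huyle": 3})
print(glossary_entry(counts, ["huyle", "huile", "huille"]))
# ('huile', ['huille', 'huyle'])
```

A spelling missing from the counts gets a frequency of 0 (that is how `Counter` behaves), so it sorts to the end instead of raising an error.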

njr2128 commented 5 years ago

For reference: E-Leo database http://www.leonardodigitale.com/

njr2128 commented 5 years ago

First step: @tcatapano, pull one alphabetized list of all words used from both tc and tcn
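Suppose the tc and tcn versions (presumably the diplomatic and normalized French transcriptions) are available as plain text. The combined alphabetized list could then be built roughly as follows. Keeping separate tc and tcn counts avoids double-counting tokens that are spelled the same in both versions. The tokenizer (regex `\w+`), the accent-insensitive sort, and all function names are assumptions for illustration. The script actually used may differ.

```python
import re
import unicodedata
from collections import Counter

# \w+ keeps letters and digits; apostrophes split elided forms ("l'eau" -> "l", "eau").
WORD_RE = re.compile(r"\w+")


def count_words(text: str) -> Counter:
    """Count lowercased word tokens in one transcription."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))


def sort_key(word: str) -> tuple[str, str]:
    """Alphabetize ignoring accents (so "été" files with "e"), then by exact form."""
    base = "".join(c for c in unicodedata.normalize("NFD", word)
                   if not unicodedata.combining(c))
    return base, word


def merged_word_list(tc_text: str, tcn_text: str) -> list[tuple[str, int, int]]:
    """One alphabetized list of every word in tc or tcn, with per-version counts."""
    tc, tcn = count_words(tc_text), count_words(tcn_text)
    return [(w, tc[w], tcn[w]) for w in sorted(set(tc) | set(tcn), key=sort_key)]


for row in merged_word_list("Le sable et l'eau", "le sablon et l'eaue"):
    print(*row, sep="\t")
```

Without the accent-stripping key, plain `sorted()` would compare code points and place every accented word after "z".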

njr2128 commented 5 years ago

> First step: @tcatapano, pull one alphabetized list of all words used from both tc and tcn

https://github.com/cu-mkp/m-k-manuscript-data/blob/master/qc/tcn_words_srt.txt (used allFolios)
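With the sorted word list in hand, one cheap way to surface candidate alternate spellings is to reduce each word to a normalized key and group words that share a key. The equivalences below are a hypothetical starting set of common Middle French orthographic alternations, not rules taken from this project: y→i, v→u, and z→s, with accents stripped and doubled letters collapsed. Each resulting group is only a candidate for a human to confirm. The loader assumes the word is the first whitespace-separated token on each line of `tcn_words_srt.txt`. That is a guess about the file's format; if lines start with a count, it must be adjusted.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical spelling equivalences; extend or prune after checking real groups.
EQUIVALENCES = str.maketrans({"y": "i", "v": "u", "z": "s"})


def spelling_key(word: str) -> str:
    """Reduce a word to a rough key shared by likely alternate spellings."""
    w = unicodedata.normalize("NFD", word.lower())
    w = "".join(c for c in w if not unicodedata.combining(c))  # drop accents
    w = w.translate(EQUIVALENCES)
    return re.sub(r"(.)\1+", r"\1", w)  # collapse doubled letters: "huille" -> "huile"


def candidate_variant_groups(words: list[str]) -> dict[str, list[str]]:
    """Group words by key, keeping only keys with two or more distinct spellings."""
    groups: dict[str, set[str]] = defaultdict(set)
    for word in words:
        groups[spelling_key(word)].add(word)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


def load_word_list(path: str) -> list[str]:
    """Read a word list, assuming the word is the first token on each line."""
    with open(path, encoding="utf-8") as f:
        return [line.split()[0] for line in f if line.strip()]


print(candidate_variant_groups(["huyle", "huile", "huille", "sable", "avoir", "auoir"]))
# {'huile': ['huile', 'huille', 'huyle'], 'auoir': ['auoir', 'avoir']}
```

The groups could then be passed, together with word counts, to a frequency-ordering step to choose each headword.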

tcatapano commented 5 years ago

@njr2128 Is this issue still active? Can I demote it or remove it from my board?

njr2128 commented 3 years ago

Closing, as this is now out of date. If similar work is desired, it should be opened as a new issue.