Language-Research-Technology / corpus-tools-cooee

Corpus prep tools for the COOEE corpus (using the spreadsheet that comes with the corpus)
GNU General Public License v3.0

Capture time periods from filenames #11

Open mraadgev opened 1 month ago

mraadgev commented 1 month ago

The collection is split into time periods, which are encoded in the file-naming convention. Capture this data and include it in the metadata.

mraadgev commented 1 month ago

Time period: The corpus is divided into four time periods:

Period 1: 1788-1825
Period 2: 1826-1850
Period 3: 1851-1875
Period 4: 1876-1900

The initial numeral of each file name indicates the period from which the document comes.
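
One possible way to capture this, as a rough sketch only: read the leading digit of each file name and look up the matching year range. The function name `period_from_filename`, the metadata keys (`period`, `startYear`, `endYear`), and the example file names are all assumptions for illustration, not taken from the corpus or from this repository's code.

```python
import re
from pathlib import Path

# Year ranges for the four periods, as listed in this issue.
PERIODS = {
    1: (1788, 1825),
    2: (1826, 1850),
    3: (1851, 1875),
    4: (1876, 1900),
}


def period_from_filename(filename):
    """Return period metadata for a file, or None if the name
    does not start with a digit from 1 to 4.

    The returned keys are hypothetical; rename them to match
    whatever metadata schema the tool actually uses.
    """
    # Ignore any directory components and look only at the file name.
    name = Path(filename).name
    match = re.match(r"[1-4]", name)
    if match is None:
        return None
    number = int(match.group())
    start, end = PERIODS[number]
    return {"period": number, "startYear": start, "endYear": end}


# Hypothetical file names, for illustration only.
print(period_from_filename("data/3-045.txt"))
print(period_from_filename("notes.txt"))
```

Returning `None` for names without a leading 1-4 keeps stray files (for example the spreadsheet itself) from being assigned a period by accident; whether to skip or flag such files is a choice for the tool.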