cwrc / ontology

CWRC ontology - primary repository

Scrape of RS tag: REG values (with counts) #421

Open SusanBrown opened 6 years ago

SusanBrown commented 6 years ago

Found in bio docs, writing docs, and events

alliyya commented 6 years ago

Waiting for a new data dump from Susan.

alliyya commented 6 years ago

There weren't many useful REG values: in a few instances REG seems to have been misused as TYPE, resulting in some specific ships being reg'd as just SHIP. Grabbed the freeform values instead.

The results can be found in this sheet: https://docs.google.com/spreadsheets/d/1qnHCJrv2Ld0EOE5HTQbTLqYBjsq3KW2PZCeaa-HDQao/edit#gid=0

Scraped from the Biography, Writing, and Freestanding documents in the last data dump.

Created one sheet for the combined set and one for each individual set of documents (Biography, Writing, Freestanding).

Only about 194 unique instances
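
For anyone repeating the scrape, a minimal sketch of the counting step is below. It is not the script that produced the sheet. It assumes the documents are XML with elements named `RS` carrying an optional `REG` attribute, and it tallies the freeform element text alongside the REG values; the function name `count_rs_values` and the sample markup are made up for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_rs_values(xml_text):
    """Return (REG counts, freeform-text counts) for every RS element.

    Assumption: elements are named RS and the attribute is named REG.
    """
    root = ET.fromstring(xml_text)
    reg_counts = Counter()
    text_counts = Counter()
    for rs in root.iter("RS"):
        reg = rs.get("REG")
        if reg:
            reg_counts[reg.strip()] += 1
        # Gather all text inside the element and collapse runs of whitespace.
        text = " ".join("".join(rs.itertext()).split())
        if text:
            text_counts[text] += 1
    return reg_counts, text_counts


# Invented sample, only to show the shape of the input.
sample = """<DOC>
  <P>She sailed on the <RS REG="SHIP">Great Eastern</RS> and later the
  <RS REG="SHIP">Great  Eastern</RS> again, then a <RS>yawl</RS>.</P>
</DOC>"""

reg_counts, text_counts = count_rs_values(sample)
for value, n in text_counts.most_common():
    print(n, value)
```

Counting the freeform text separately makes the problem above visible: the REG counter collapses every ship into `SHIP`, while the text counter keeps the individual names.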

lemaka commented 6 years ago

Tentative decision: pick out famous ships and leave the rest as strings (identify a few, the ones that occur more than once, and type the rest). Consolidate terms where possible, e.g. yawl and steamer could both be ships.
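
One possible reading of that decision, sketched in Python. The threshold of two occurrences comes from the comment; the `TYPE_OF` table, the `triage` function, and the sample counts are hypothetical, and only yawl and steamer are taken from the thread.

```python
# Hypothetical consolidation table: generic terms mapped to a shared type.
TYPE_OF = {"yawl": "ship", "steamer": "ship"}


def triage(counts, min_count=2):
    """Split scraped values into ones to identify and ones to type.

    Values seen at least min_count times are candidates for identification;
    the rest get a consolidated type if one is known, else stay plain strings.
    """
    identify = sorted(v for v, n in counts.items() if n >= min_count)
    rest = {
        v: TYPE_OF.get(v.lower(), "string")
        for v, n in counts.items()
        if n < min_count
    }
    return identify, rest


# Invented counts, only to show the behaviour.
counts = {"Great Eastern": 3, "yawl": 1, "steamer": 1, "a hired boat": 1}
identify, rest = triage(counts)
print(identify)
print(rest)
```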

lemaka commented 6 years ago

@lemaka pass on to Thomas next Tuesday

alliyya commented 5 years ago

Note: We're holding off on this till Writing extraction is complete.