UHaifa-IS / whgazetteer-mehdie

World Historical Gazetteer - MEHDIE version
http://whgazetteer.org
BSD 3-Clause "New" or "Revised" License

Recommendation for Hebrew, German and Arabic person name fields for template and parsing #185

Open sinairusinek opened 6 months ago

sinairusinek commented 4 months ago

Linked People Delimited v0.4 (LP-Delimited)

Inspired by Linked Places Delimited v0.4 (LP-Delimited), this format aims to structure data on people for person matching tasks on the MEHDIE platform.

A comment about languages: Any textual field below can have a language specification attached to it, e.g. @de or @yi. If none is given, MEHDIE will default to Hebrew for Hebrew script, Arabic for Arabic script and English for Latin script.
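The language rule can be sketched in Python as follows. This is a minimal sketch, not the MEHDIE implementation: it assumes the tag is a trailing `@xx` suffix on the value (as the `@de` and `@yi` examples suggest), and the helper names and the two-letter default codes are hypothetical.

```python
import re
import unicodedata

# Assumed tag syntax: a trailing "@xx" or "@xxx" code, as in "@de" and "@yi".
LANG_TAG = re.compile(r"^(?P<text>.*?)\s*@(?P<lang>[A-Za-z]{2,3})$")

# Script defaults from the comment above; the codes themselves are assumptions.
SCRIPT_DEFAULTS = {"HEBREW": "he", "ARABIC": "ar", "LATIN": "en"}


def default_language(text):
    """Return the default language for the first letter in a recognised script."""
    for ch in text:
        if ch.isalpha():
            # Unicode character names start with the script, e.g. "HEBREW LETTER RESH".
            script = unicodedata.name(ch, "").split(" ")[0]
            if script in SCRIPT_DEFAULTS:
                return SCRIPT_DEFAULTS[script]
    return None


def split_language(value):
    """Split 'text@lang' into (text, lang), falling back to the script default."""
    value = value.strip()
    match = LANG_TAG.match(value)
    if match:
        return match.group("text"), match.group("lang").lower()
    return value, default_language(value)
```

For example, `split_language("Maimonides")` falls back to English because the value is in Latin script, while an explicit tag such as `@de` always takes precedence over the script default.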

Fields

## required ##

id Contributor's internal identifier. This must stay consistent throughout the accessioning workflow, including subsequent updates.

name Any name can come here, as it appears in the sources. Multiple names are separated by semicolons.

title_source

Label or short citation for the source of the name, in any style; e.g. 'Efraim Lev : Jewish Medical Practitioners in the Medieval Muslim World; A Collective Biography.'

## encouraged ##

nami_uri Permalink URI for the source of the name, if available.

primary-name e.g. "Ibn-Jumai", "Maimonides". A name that does not necessarily conform to other fields but is a common way of referring to a person.

given-name e.g. "Moses"

middle-name e.g. "Hillel" in the expression "Haim Hillel Ben-Sasson"

maiden-surname e.g. Goldstein in the expression "Sara Karp (nee Goldstein)" or "Sara Karp (geb. Goldstein)"

surname e.g. Shalit

patronymic e.g. Ben-Moshe, Ibn-Musa. In case of avonymics a patronym can also include names of forefathers: e.g. "ibn Ḥasan ibn Ifrāʾīm ibn Yaʿqūb ibn Ismāʿīl ibn Jumayʿ" in the case of https://usaybia.net/person/1017

acronym e.g. "רמב״ם"

teknonym e.g. "Abu-Ibrahim"

professional-name e.g. Al-Tabib. Use only when the name is used as an actual designation of the profession of the individual, and not when it is a surname (even if it might refer to the profession in previous generations).

nisba e.g. al-Isrāʾīlī in https://usaybia.net/person/1017

alternative-name e.g. Ibn Jamīʿ in https://usaybia.net/person/1017. Include here also epithets such as "Al-Shaykh al-Muwaffaq Shams al-Riyāsa"

qualifier e.g. "jr." or "the third".

appellation e.g. "Lord", "Ha-Cohen". Use only when the name is used as a designation of an actual status (not when 'Hacohen' is merely a surname).

sallutation e.g. Dr., Prof., Bei.

attestation_date Use the date of the source from which the name is taken.

birth_date

death_date

All of these must be written in ISO 8601 form (YYYY-MM-DD), omitting month and/or day where appropriate. BCE years must be written as a negative integer, e.g. -320 for 320 BCE.

fl Expresses the century or century span in which the person flourished.
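A date value in this form could be validated along the following lines. This is only a sketch under stated assumptions: the function name is hypothetical, years are assumed to have at most four digits, and no calendar check (such as days per month) is attempted.

```python
import re

# Year (optionally negative for BCE), then optional month, then optional day.
DATE_RE = re.compile(r"^(-?\d{1,4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def parse_partial_date(value):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (year, month, day).

    Omitted parts are returned as None. A leading minus marks a BCE year.
    """
    match = DATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a (partial) ISO 8601 date: {value!r}")
    year = int(match.group(1))
    month = int(match.group(2)) if match.group(2) else None
    day = int(match.group(3)) if match.group(3) else None
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value!r}")
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range: {value!r}")
    return year, month, day
```

With this sketch, `parse_partial_date("-320")` yields `(-320, None, None)`, so a year-only BCE value and a full date can be handled by the same code path.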

matches One or more URIs for matching record(s) in person name authority resources, semicolon-delimited. E.g.: https://usaybia.net/person/1202. Interpreted as SKOS:closeMatch, which is "used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications" and which includes its sub-property SKOS:exactMatch.
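Reading a semicolon-delimited cell such as matches might look like the sketch below. The helper names are hypothetical, and the check that every entry is an http(s) URI is an assumption rather than a stated rule of the format. Note that a URI which itself contains a semicolon would be split incorrectly by this approach.

```python
from urllib.parse import urlsplit


def split_values(cell):
    """Split a semicolon-delimited cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def parse_matches(cell):
    """Return the URIs in a 'matches' cell, rejecting entries that are not http(s) URIs."""
    uris = split_values(cell)
    for uri in uris:
        parts = urlsplit(uri)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URI: {uri!r}")
    return uris
```

The same `split_values` helper would also serve for the name field, where multiple names are separated by semicolons.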

description

A short text description of the person


sinairusinek commented 4 months ago

Notes for the algorithm: Matching should be done as a calculation of separate matches between the elements. The same methods we use for the matching of place names (transliteration, fuzzy, etc.) can be applied between the same literal fields, and a different threshold should be set for the name and the alternative name fields.
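As a rough illustration of field-by-field matching with per-field thresholds, consider the sketch below. It is not the matching code used for place names on the platform: `difflib.SequenceMatcher` from the Python standard library stands in for the fuzzy and transliteration-aware methods, and the threshold values and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds: the free-form fields get their own values.
THRESHOLDS = {"name": 0.85, "alternative-name": 0.75}
DEFAULT_THRESHOLD = 0.80


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in for the real fuzzy matcher."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def match_fields(record_a, record_b):
    """Compare same-named fields of two records separately.

    Returns {field: (score, passed)} for every field that is non-empty in both.
    """
    results = {}
    for field in record_a.keys() & record_b.keys():
        a, b = record_a[field], record_b[field]
        if not a or not b:
            continue  # nothing to compare for this element
        score = similarity(a, b)
        threshold = THRESHOLDS.get(field, DEFAULT_THRESHOLD)
        results[field] = (score, score >= threshold)
    return results
```

Keeping the per-field results separate, rather than collapsing them into one score immediately, leaves room to weight the elements differently when the overall match is calculated.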

At the next stages: (1) using an external service to parse the name and alternative name fields; (2) calculating the proximity of the date information fields.

sinairusinek commented 4 months ago

Ideally, each field can include multiple values separated by ; but I assume this can impact computation, so I am leaving this open for now.

sinairusinek commented 4 months ago

[samples](https://docs.google.com/spreadsheets/d/10rvo4x96gNkL_19xrtjfxKIMSYUW8-nXkmQ3PukswGY/edit?usp=sharing) (in the first tab): just a few samples to exemplify the values in each possible column.

tomersagi commented 4 months ago