eljeffeg / SmartCopy

Chrome extension for copying genealogical data into Geni.com.

Adding support for additional sites: Open Archives & Genealogy Online #94

Status: Open. coret opened this issue 3 years ago.

coret commented 3 years ago

I'd like to recommend two sites for inclusion in SmartCopy: Open Archives (Dutch and Belgian archival sources, 270M profiles) and Genealogy Online (online family trees, 60M profiles).

Both websites are multilingual and use microdata to semantically tag information. Open Archives also has a well-documented API.

If there's more information on how to add a site to SmartCopy, I (the founder of both sites) am willing to help.

eljeffeg commented 3 years ago

Nice, I'll take a look when I have a moment. I haven't created any docs on how to add additional sites - it just sort of grew and I haven't gone back to make it dev friendly.

Regarding your site, funny coincidence: I just saw one of your records today in a MyHeritage match, but unfortunately the links were dead. It was the marriage record for Charlotte Lacasse, Jan 21 1800, St-Charles De Bellechasse, Qc.

Here are the links:
https://www.genealogieonline.nl/en/les-celtes-base-1/I2668634.php
https://www.genealogieonline.nl/en/les-celtes-base-1/I2101343.php

coret commented 3 years ago

@jeffg2k, Genealogy Online is a platform where genealogists can publish their genealogical data (and images), and they can also unpublish it again. The list of removed publications is available to MyHeritage (as it is to Ancestry), but they possibly don't update their matches that often.

coret commented 3 years ago

Are there sources already included in SmartCopy that use microdata? Those could form the basis of a collection .js file for Open Archives (records with archival data) and Genealogy Online (family trees), since both websites use microdata. This means less HTML parsing and regex matching, because the data is semantically marked up.
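To make the "less parsing" point concrete, here is a minimal sketch of reading microdata by attribute instead of by page layout. It is written in Python with the standard-library `html.parser` only so it runs stand-alone; a real SmartCopy collection would do the equivalent in JavaScript against the page DOM. `MicrodataParser` is a hypothetical name, the HTML fragment is made up (not actual Genealogy Online or Open Archives markup), and the reader is deliberately flat: nested items are not kept apart.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose itemprop value comes from a URL attribute, not their text.
URL_ATTRS = {"a": "href", "area": "href", "link": "href", "img": "src",
             "audio": "src", "video": "src", "source": "src",
             "embed": "src", "iframe": "src", "track": "src"}


class MicrodataParser(HTMLParser):
    """Hypothetical flat microdata reader: itemprop name -> list of values.

    Simplification: nested items (itemscope inside itemscope) are not kept
    apart, so a nested item's properties land in the same dict.
    """

    def __init__(self):
        super().__init__()
        self.itemtypes = []  # itemtype URLs in document order
        self.props = {}      # itemprop name -> list of string values
        self._stack = []     # open elements: (tag, text-valued itemprop or None, text chunks)

    def _add(self, names, value):
        # An itemprop attribute may list several space-separated names.
        for name in names.split():
            self.props.setdefault(name, []).append(value.strip())

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("itemtype"):
            self.itemtypes.append(a["itemtype"])
        prop = a.get("itemprop")
        text_prop = None
        if prop:
            if tag == "meta":
                self._add(prop, a.get("content") or "")
            elif tag in URL_ATTRS:
                self._add(prop, a.get(URL_ATTRS[tag]) or "")
            elif tag == "time" and a.get("datetime"):
                self._add(prop, a["datetime"])
            elif "itemscope" not in a:
                text_prop = prop  # value is the element's text content
        if tag not in VOID_TAGS:
            self._stack.append((tag, text_prop, []))

    def handle_data(self, data):
        # Text belongs to every enclosing text-valued property.
        for _tag, prop, chunks in self._stack:
            if prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag (tolerates sloppy HTML).
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                while len(self._stack) > i:
                    _tag, prop, chunks = self._stack.pop()
                    if prop:
                        self._add(prop, " ".join("".join(chunks).split()))
                return


page = """<div itemscope itemtype="https://schema.org/Person">
  <h1 itemprop="name">Jan <b>Jansen</b></h1>
  <meta itemprop="gender" content="male">
  <time itemprop="birthDate" datetime="1800-01-21">21 January 1800</time>
</div>"""
parser = MicrodataParser()
parser.feed(page)
parser.close()
print(parser.itemtypes)  # ['https://schema.org/Person']
print(parser.props)      # {'name': ['Jan Jansen'], 'gender': ['male'], 'birthDate': ['1800-01-21']}
```

The extractor only keys on `itemprop`, `itemtype`, and a few value attributes, so a site redesign that keeps the microdata intact would not break it, which is the advantage over scraping by CSS classes or regexes.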

To easily see the semantically enriched data of an Open Archives page (as Google sees it), see https://search.google.com/structured-data/testing-tool/u/0/?hl=nl#url=https%3A%2F%2Fwww.openarch.nl%2Felo%3Abada0d37-3a2d-02ca-cf36-446ed4359927%2Fen (click on http://historical-data.org/HistoricalRecord in the right pane).

The URL structure of Open Archives is: https://www.openarch.nl/{record collection}:{record guid}{optional language: /en|/de|/fr}
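A collection file needs to recognise record pages before parsing them. The sketch below splits an Open Archives URL into the three parts of the pattern above. It is Python for stand-alone testing; `parse_openarch_url` and `OpenArchRecord` are hypothetical names, and two details are assumptions not stated in the pattern: that GUIDs consist of hex digits and dashes (as in the example URL), and that the bare `openarch.nl` host should be accepted alongside `www`.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class OpenArchRecord(NamedTuple):
    collection: str          # e.g. "elo"
    guid: str                # record GUID
    language: Optional[str]  # "en", "de", "fr", or None for the default


# {record collection}:{record guid}{optional /en|/de|/fr}
# Assumption: GUIDs are hex digits and dashes, as in the example URL.
_OPENARCH_PATH = re.compile(
    r"^/(?P<collection>[^/:]+):(?P<guid>[0-9A-Fa-f-]+)(?:/(?P<lang>en|de|fr))?/?$"
)


def parse_openarch_url(url: str) -> Optional[OpenArchRecord]:
    """Return the parts of an Open Archives record URL, or None if it isn't one."""
    parts = urlsplit(url)
    # Assumption: the bare domain is accepted as well as www.
    if parts.hostname not in ("www.openarch.nl", "openarch.nl"):
        return None
    match = _OPENARCH_PATH.match(parts.path)
    if match is None:
        return None
    return OpenArchRecord(match["collection"], match["guid"], match["lang"])


print(parse_openarch_url(
    "https://www.openarch.nl/elo:bada0d37-3a2d-02ca-cf36-446ed4359927/en"))
# OpenArchRecord(collection='elo', guid='bada0d37-3a2d-02ca-cf36-446ed4359927', language='en')
```

Returning `None` for non-matching URLs lets the caller fall through to other site handlers instead of raising.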

For the same semantic insight into a Genealogy Online page, see https://search.google.com/structured-data/testing-tool/u/0/?hl=nl#url=https%3A%2F%2Fwww.genealogieonline.nl%2Fen%2Fkwartierstaat-hans-flipse%2FI2263.php (click on https://schema.org/Person in the right pane).

The URL structure of Genealogy Online is: https://www.genealogieonline.nl/{optional language: en/|de/|fr/}{publication_uri}/{person_uri}
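The same idea applies to Genealogy Online person pages. This Python sketch matches the pattern above; `parse_genealogieonline_url` and `GenealogyOnlinePerson` are hypothetical names, accepting the bare `genealogieonline.nl` host is an assumption, and `person_uri` is returned as-is (for example `I2263.php`) rather than guessing how the ID inside it is built.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class GenealogyOnlinePerson(NamedTuple):
    language: Optional[str]  # "en", "de", "fr", or None for the default
    publication: str         # publication_uri, e.g. "kwartierstaat-hans-flipse"
    person: str              # person_uri, e.g. "I2263.php"


# /{optional en/|de/|fr/}{publication_uri}/{person_uri}
_GO_PATH = re.compile(
    r"^/(?:(?P<lang>en|de|fr)/)?(?P<publication>[^/]+)/(?P<person>[^/]+)$"
)


def parse_genealogieonline_url(url: str) -> Optional[GenealogyOnlinePerson]:
    """Return the parts of a Genealogy Online person URL, or None."""
    parts = urlsplit(url)
    # Assumption: the bare domain is accepted as well as www.
    if parts.hostname not in ("www.genealogieonline.nl", "genealogieonline.nl"):
        return None
    match = _GO_PATH.match(parts.path)
    if match is None:
        return None
    return GenealogyOnlinePerson(match["lang"], match["publication"], match["person"])


print(parse_genealogieonline_url(
    "https://www.genealogieonline.nl/en/kwartierstaat-hans-flipse/I2263.php"))
# GenealogyOnlinePerson(language='en', publication='kwartierstaat-hans-flipse', person='I2263.php')
```

Because the language prefix is optional, a publication literally named `en`, `de`, or `fr` would be ambiguous; the regex prefers reading the first segment as a language when a third segment follows.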