ContentMine / journal-scrapers

Journal scraper definitions for the ContentMine framework

Extract authors as split firstname, lastname, others where possible #11

Open blahah opened 10 years ago

blahah commented 10 years ago

If the source provides access to structured author names, include them:

"author": {
  "name": "Bob G Dylan",
  "firstname": "Bob",
  "lastname": "Dylan",
  "others": "G",
  "email": "bobby.g.dylan@fake.com",
  "institution": ""
}

cc @mitar

mitar commented 10 years ago

The main point is that the scraper should extract data in the rawest form available from the source and not try to concatenate, combine, or otherwise process it. That should be done at later stages, where you might have more information available (for example, all known authors in the database, against which you could cross-check).
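
To illustrate the split between scraping and later processing, here is a minimal sketch of what a downstream step could look like. It is not part of the ContentMine framework: the function name `display_name` and the fallback to an already-scraped `name` field are assumptions made for this example, and the field names are taken from the JSON above.

```python
from typing import Optional


def display_name(author: dict) -> Optional[str]:
    """Hypothetical post-processing helper: join raw name parts into one string.

    The scraper is assumed to have emitted the parts untouched; the
    combination happens here, after scraping.
    """
    parts = [author.get("firstname"), author.get("others"), author.get("lastname")]
    # Drop missing or blank parts before joining.
    parts = [p.strip() for p in parts if p and p.strip()]
    if parts:
        return " ".join(parts)
    # Fall back to whatever full name the source exposed, if any.
    return author.get("name")


raw = {
    "firstname": "Bob",
    "lastname": "Dylan",
    "others": "G",
    "email": "bobby.g.dylan@fake.com",
    "institution": "",
}
print(display_name(raw))  # Bob G Dylan
```

Keeping this logic out of the scraper means the raw parts stay available for cross-checking against other records later.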