front-matter / commonmeta

Crossref Published dates represented inconsistently across work types #6

Open gbilder opened 1 month ago

gbilder commented 1 month ago

Published date parsing seems inconsistent across different Crossref content types:

For example, JournalArticles always return the Published date as a YYYY-MM-DD string.

However, other content types also seem to include trailing time info. Here are examples:

In each case the published date seems to include trailing time info as well.

For example, with the last item in the above list:

commonmeta convert --from crossref "10.1109/vlsic.2004.1346548"

results in:

{
  "id": "https://doi.org/10.1109/vlsic.2004.1346548",
  "type": "ProceedingsArticle",
  "container": {
    "type": "Proceedings",
    "title": "2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525)"
  },
  "contributors": [
    {
      "type": "Person",
      "givenName": "J.W.",
      "familyName": "Lee",
      "contributorRoles": [
        "Author"
      ]
    },
    {
      "type": "Person",
      "familyName": "Daihyun Lim",
      "contributorRoles": [
        "Author"
      ]
    },
    {
      "type": "Person",
      "givenName": "B.",
      "familyName": "Gassend",
      "contributorRoles": [
        "Author"
      ]
    },
    {
      "type": "Person",
      "givenName": "G.E.",
      "familyName": "Suh",
      "contributorRoles": [
        "Author"
      ]
    },
    {
      "type": "Person",
      "givenName": "M.",
      "familyName": "van Dijk",
      "contributorRoles": [
        "Author"
      ]
    },
    {
      "type": "Person",
      "givenName": "S.",
      "familyName": "Devadas",
      "contributorRoles": [
        "Author"
      ]
    }
  ],
  "date": {
    "published": "2004-10-26T09:47:24Z"
  },
  "identifiers": [
    {
      "identifier": "https://doi.org/10.1109/vlsic.2004.1346548",
      "identifierType": "DOI"
    }
  ],
  "license": {},
  "provider": "Crossref",
  "publisher": {
    "id": "https://api.crossref.org/members/263",
    "name": "Widerkehr and Associates"
  },
  "references": [
    {
      "key": "ref4",
      "id": "https://doi.org/10.1145/586110.586132"
    },
    {
      "key": "ref3",
      "id": "https://doi.org/10.1109/9780470544365"
    },
    {
      "key": "ref6",
      "title": "IC Identification Circuit Using Device Mismatch",
      "publicationYear": "2000"
    },
    {
      "key": "ref5",
      "id": "https://doi.org/10.1109/csac.2002.1176287"
    },
    {
      "key": "ref2",
      "title": "Identification and Authentication of Integrated Circuits",
      "publicationYear": "2003"
    },
    {
      "key": "ref1",
      "publicationYear": "2001"
    }
  ],
  "titles": [
    {
      "title": "A technique to build a secret key in integrated circuits for identification and authentication applications"
    }
  ],
  "url": "http://ieeexplore.ieee.org/document/1346548/"
}

What is very odd is that the date in the JSON above ("2004-10-26T09:47:24Z") seems to be extracted from the :

2004-10-26T09:47:24Z

Which is the date the record was created, not the date the item was published.

http://api.crossref.org/works/10.1109/vlsic.2004.1346548/transform/application/vnd.crossref.unixsd+xml
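
To make the distinction concrete, here is a minimal Python sketch of how a publication date could be selected without ever falling back to the record-creation timestamp. It assumes the record has the shape of a Crossref REST API work message, where `published`, `published-print`, `published-online` and `issued` carry `date-parts` and `created` carries a `date-time`. The function names `date_parts_to_string` and `published_date` are hypothetical, and this is not commonmeta's actual implementation. In the sample record, the `created` timestamp is the one reported above, while the `published` value is an illustrative placeholder rather than the real metadata.

```python
from typing import Optional


def date_parts_to_string(date_parts: list) -> Optional[str]:
    """Render [year], [year, month] or [year, month, day] as a zero-padded
    string at the same granularity, e.g. [2004, 3] becomes "2004-03"."""
    rendered = []
    for part, width in zip(date_parts or [], (4, 2, 2)):
        if part is None:  # Crossref uses null for unknown parts
            break
        rendered.append(str(int(part)).zfill(width))
    return "-".join(rendered) or None


def published_date(message: dict) -> Optional[str]:
    """Return the publication date of a work record, or None.

    Only publication-related fields are consulted; `created` (the date the
    record was registered) is deliberately never used as a fallback."""
    for key in ("published", "published-print", "published-online", "issued"):
        parts = (message.get(key) or {}).get("date-parts") or [[]]
        rendered = date_parts_to_string(parts[0])
        if rendered:
            return rendered
    return None


# Sample record: `created` matches the timestamp in this issue;
# `published` is a hypothetical partial date used only for illustration.
sample = {
    "created": {"date-parts": [[2004, 10, 26]], "date-time": "2004-10-26T09:47:24Z"},
    "published": {"date-parts": [[2004]]},
}

print(published_date(sample))  # 2004
```

With this approach a partial date stays partial, and a record that has no publication date yields `None` instead of silently inheriting the creation timestamp.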

gbilder commented 1 month ago

The issue is simply one of granularity: some dates (e.g., "published") can be partial, and the original metadata doesn't include a time component. However, it would be good if all dates (partial or otherwise) were represented consistently, even in string form. So, for example: