gbif / content-crawler

Crawls CMS and articles from Mendeley into ElasticSearch indexes
Apache License 2.0

Support derivedDatasets #33

Closed: dnoesgaard closed 3 years ago

dnoesgaard commented 3 years ago

WARN [2021-01-21 08:57:45,586+0000] [main] org.gbif.content.crawl.mendeley.ElasticSearchIndexHandler: Document ID "41f5188f-274a-33cc-92de-b7a55232bb46" has a not-found DOI 10.15468/dd.sjqtxk

The crawler should be able to process citations of derivedDatasets (e.g. 10.15468/dd.sjqtxk) and decorate literature items with the corresponding fields (pretty much the same behaviour as for a download, I suppose?)

gbifDerivedDataset (or similar)
gbifDatasetKey
publishingOrganizationKey

I think that should do it?
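
To make the request concrete, here is a rough sketch of what the decoration step might look like. It is not the crawler's actual code. It assumes derived-dataset DOIs can be recognised by the `10.15468/dd.` prefix seen in the example above, and that some lookup returns the contributing datasets for such a DOI. `DerivedDatasetDecorator`, `DatasetRef` and `DerivedDatasetLookup` are hypothetical names invented for illustration; only the three field names come from this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of content-crawler.
final class DerivedDatasetDecorator {

  // Assumption: prefix inferred from the example DOI 10.15468/dd.sjqtxk.
  static final String DERIVED_DATASET_PREFIX = "10.15468/dd.";

  /** Hypothetical value type: one contributing dataset and its publisher. */
  static final class DatasetRef {
    final String datasetKey;
    final String publishingOrganizationKey;

    DatasetRef(String datasetKey, String publishingOrganizationKey) {
      this.datasetKey = datasetKey;
      this.publishingOrganizationKey = publishingOrganizationKey;
    }
  }

  /** Hypothetical lookup; a real crawler would presumably call a GBIF service here. */
  interface DerivedDatasetLookup {
    List<DatasetRef> findDatasets(String doi);
  }

  /** DOIs are case-insensitive, so compare in lower case. */
  static boolean isDerivedDatasetDoi(String doi) {
    return doi != null
        && doi.toLowerCase(Locale.ROOT).startsWith(DERIVED_DATASET_PREFIX);
  }

  /** Returns a copy of the literature item with the three fields added, if applicable. */
  static Map<String, Object> decorate(
      Map<String, Object> document, String doi, DerivedDatasetLookup lookup) {
    Map<String, Object> result = new HashMap<>(document);
    if (!isDerivedDatasetDoi(doi)) {
      return result; // not a derived dataset citation, leave the item untouched
    }
    // Sets drop duplicates while keeping the order returned by the lookup.
    Set<String> datasetKeys = new LinkedHashSet<>();
    Set<String> organizationKeys = new LinkedHashSet<>();
    for (DatasetRef ref : lookup.findDatasets(doi)) {
      datasetKeys.add(ref.datasetKey);
      organizationKeys.add(ref.publishingOrganizationKey);
    }
    result.put("gbifDerivedDataset", List.of(doi));
    result.put("gbifDatasetKey", new ArrayList<>(datasetKeys));
    result.put("publishingOrganizationKey", new ArrayList<>(organizationKeys));
    return result;
  }
}
```

Called with the DOI from the warning and a lookup that returns two datasets from the same publisher, this would add one `gbifDerivedDataset` entry, two `gbifDatasetKey` entries and a single `publishingOrganizationKey` entry to the item.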

MortenHofft commented 3 years ago

Sounds good to me. So, to reiterate, we have a derivedDatasetDoi in Mendeley.

dnoesgaard commented 3 years ago

As we're starting to see derivedDataset records cited, it would be nice to get this implemented in the crawler, so these citations actually count.