MIT-LCP / physionet-build

The new PhysioNet platform.
https://physionet.org/
BSD 3-Clause "New" or "Revised" License

Indexing content in Google Dataset search #147

Closed: tompollard closed this issue 5 years ago

tompollard commented 6 years ago

Google have a new dataset search tool (thanks for the pointer, Shawn). See: https://www.nature.com/articles/d41586-018-06201-x

We need to look into how to get our content indexed by the service. Searching for ECG gives hits on Kaggle, Zenodo, Figshare, etc., but nothing on PhysioNet as far as I can see: https://toolbox.google.com/datasetsearch/search?query=ecg&docid=6HQnrmAwaWmBclj4AAAAAA%3D%3D

tompollard commented 6 years ago

Some background on how to get content indexed is at: https://developers.google.com/search/docs/data-types/dataset

tompollard commented 5 years ago

Schema.org metadata is now embedded in project pages (view the page source of https://physionet.org/content/eicu-crd/2.0/ to see it), and projects are indexed in Google Dataset Search: https://toolbox.google.com/datasetsearch (e.g. try searching for mimic-iii or eicu).
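For reference, Google's dataset guidelines expect the metadata as a Schema.org `Dataset` object, typically embedded as JSON-LD in a `<script type="application/ld+json">` tag. Below is a minimal sketch of building such a tag, assuming a handful of common properties (`name`, `description`, `url`, `version`, `license`, `keywords`). The helper names `dataset_jsonld` and `render_script_tag` are hypothetical and this is not the code PhysioNet actually uses; the description in the usage example is a placeholder.

```python
import json


def dataset_jsonld(name, description, url, version=None,
                   license_url=None, keywords=None):
    """Build a Schema.org Dataset object (hypothetical helper, illustrative only)."""
    data = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,                # required by Google's guidelines
        "description": description,  # required by Google's guidelines
        "url": url,
    }
    # Optional properties are only added when provided.
    if version is not None:
        data["version"] = version
    if license_url is not None:
        data["license"] = license_url
    if keywords:
        data["keywords"] = list(keywords)
    return data


def render_script_tag(data):
    """Serialize the object into a JSON-LD script tag for a page's <head>."""
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape, so parsers still read it back as "/".
    payload = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    meta = dataset_jsonld(
        name="eICU Collaborative Research Database",
        description="Placeholder description of the project.",
        url="https://physionet.org/content/eicu-crd/2.0/",
        version="2.0",
    )
    print(render_script_tag(meta))
```

Validating the rendered page with Google's Rich Results Test (or the Schema.org validator) is a sensible check before expecting the crawler to pick it up.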

tompollard commented 4 years ago

Just dropping this here for reference:

Google Dataset Search has released a snapshot of its search index metadata on Kaggle (https://www.kaggle.com/googleai/dataset-search-metadata-for-datasets).

There is a starter notebook at: https://www.kaggle.com/kerneler/starter-dataset-search-metadata-for-e09382eb-3 that makes it straightforward to explore the dataset.

There are 156 entries harvested from PhysioNet Schema.org metadata (attached, generated using the notebook above), which should reflect the number of projects published on the census date (17 August 2020).
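Counting the PhysioNet entries in the snapshot amounts to filtering records by the host of their source URL. Here is a minimal standard-library sketch, assuming each record is a dict with a `url` field; that column name is a hypothetical stand-in, so check the actual schema of the Kaggle files (the starter notebook shows it) before relying on it.

```python
from urllib.parse import urlparse


def count_entries_for_domain(rows, domain="physionet.org", url_field="url"):
    """Count records whose URL host is `domain` or one of its subdomains.

    `url_field` is a hypothetical column name, not confirmed from the dataset.
    """
    count = 0
    for row in rows:
        url = row.get(url_field) or ""  # tolerate missing or empty values
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            count += 1
    return count


if __name__ == "__main__":
    sample = [
        {"url": "https://physionet.org/content/eicu-crd/2.0/"},
        {"url": "https://www.kaggle.com/some-dataset"},
    ]
    print(count_entries_for_domain(sample))
```

With a CSV export, `csv.DictReader` yields rows in exactly this dict form, so the same function can be applied directly to the file.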

There is an associated blog post at: https://ai.googleblog.com/2020/08/an-analysis-of-online-datasets-using.html