globalbiodata / inventory_2022

Public repository for the biodata resource inventory performed in 2022.
MIT License

Query epmc #13

Closed schackartk closed 2 years ago

schackartk commented 2 years ago

Overview

This merge adds a script for running the initial EuropePMC query and any later updated queries. It also adds two notebooks: one that runs the pipeline to reproduce the original results, and one for running an updated query.

query_epmc.py

This new script queries EuropePMC given a query string and a publication date range. It is used both to reproduce the original results and to perform updated queries. As output, it produces a CSV file of the paper IDs, titles, and abstracts. It also writes a text file containing the date on which the query was run; this is useful for future updates, since that date can be supplied as the beginning of the new date range.
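For reviewers who want a picture of the flow, here is a minimal sketch of a date-bounded EuropePMC query written with only the standard library. It is not the code in `query_epmc.py`. The function names (`build_query`, `fetch_records`, `write_outputs`), the CSV column names, and the output file layout are all assumptions made for illustration. The sketch relies on the public EuropePMC REST search endpoint, its `FIRST_PDATE` date-range syntax, and `cursorMark` paging.

```python
import csv
import json
import urllib.parse
import urllib.request
from datetime import date

EPMC_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(query, from_date, to_date):
    """Restrict a query to a first-publication date range (hypothetical helper)."""
    return f"({query}) AND (FIRST_PDATE:[{from_date} TO {to_date}])"


def fetch_records(query, page_size=1000):
    """Yield (id, title, abstract) for each hit, following cursorMark paging."""
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "query": query,
            "format": "json",
            "resultType": "core",  # "core" results include the abstract text
            "pageSize": page_size,
            "cursorMark": cursor,
        })
        with urllib.request.urlopen(f"{EPMC_URL}?{params}") as resp:
            payload = json.load(resp)
        results = payload.get("resultList", {}).get("result", [])
        for hit in results:
            yield (
                hit.get("id", ""),
                hit.get("title", ""),
                hit.get("abstractText", ""),
            )
        next_cursor = payload.get("nextCursorMark")
        # Stop on an empty page or when the cursor no longer advances.
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor


def write_outputs(records, csv_path, date_path, run_date=None):
    """Write the results CSV and a text file holding the query date."""
    run_date = run_date or date.today()
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "title", "abstract"])  # assumed column names
        writer.writerows(records)
    with open(date_path, "w", encoding="utf-8") as fh:
        fh.write(run_date.isoformat() + "\n")
```

An update run would then read the date back from the text file and pass it as `from_date`, for example `write_outputs(fetch_records(build_query(q, last_date, today)), "out.csv", "last_query_date.txt")`, where the file names are placeholders.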

This step is not integrated into the pipelines yet.

running_pipeline.ipynb

This is a Jupyter notebook meant to be run on Google Colab. It is not necessary, but is meant to make reproducing the results easier. It mounts Google Drive and runs the full reproduction pipeline.
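As a rough illustration of the mounting step, a Colab cell along these lines would attach Google Drive before the pipeline runs. The helper name `mount_drive` and the fallback behaviour outside Colab are assumptions for this sketch, not taken from the notebook. `google.colab.drive.mount` is the standard Colab call.

```python
from pathlib import Path


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running on Colab; return None elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None
    drive.mount(mount_point)
    # Colab exposes the user's files under the "MyDrive" folder.
    return Path(mount_point) / "MyDrive"


drive_root = mount_drive()
```

Returning `None` outside Colab lets the same cell run locally, where the pipeline can fall back to a local working directory.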

updating_inventory.ipynb

This is a Jupyter notebook for periodically updating the inventory. It is not functional yet, but gives an outline of how that process will look.