clevercanary / data-browser

Apache License 2.0

Update NCPI's list of studies programmatically #649

Closed NoopDog closed 1 year ago

NoopDog commented 1 year ago

The NCPI Portal uses a list of study ids for each platform at dashboard-source-ncpi.tsv. This list was imported manually from the anvilproject.org repository utils directory where it is updated each time the application is started.

The lists for CRDC, BDC, AnVIL and KF can each be updated in a different way.

For this ticket we want to create a script in the files/ncpi-catalog directory that:

  1. Reads in the dashboard-source-ncpi.tsv file.
  2. Calls the CRDC or other APIs to retrieve the list of studies for that platform.
  3. Updates the list of CRDC studies in the dashboard-source-ncpi.tsv file to match the new studies.
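The three steps above could be sketched roughly as follows. This is only an illustration: the two-column layout (platform, then identifier), the StudyRow type and the function names are assumptions, not the actual layout of dashboard-source-ncpi.tsv or any existing code. It also assumes an add-only update, where rows already in the file are never removed. File reading and the API call are left to the caller so the merge logic stays easy to test.

```typescript
// Hypothetical row shape: the real columns in dashboard-source-ncpi.tsv may differ.
interface StudyRow {
  platform: string;
  identifier: string;
}

// Split TSV text into its header line and data rows (a header row is assumed).
function parseTsv(text: string): { header: string; rows: StudyRow[] } {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = lines[0] ?? "";
  const rows = lines.slice(1).map((line) => {
    const [platform = "", identifier = ""] = line.split("\t");
    return { platform, identifier };
  });
  return { header, rows };
}

// Append fetched ids that are not yet listed for the platform.
// Existing rows are kept untouched (add-only is an assumption).
function addNewStudies(
  rows: StudyRow[],
  platform: string,
  fetchedIds: string[]
): { rows: StudyRow[]; added: string[] } {
  const known = new Set(
    rows.filter((row) => row.platform === platform).map((row) => row.identifier)
  );
  const added = Array.from(new Set(fetchedIds)).filter((id) => !known.has(id));
  const newRows = added.map((identifier) => ({ platform, identifier }));
  return { rows: rows.concat(newRows), added };
}

// Serialize rows back to TSV, reusing the original header line.
function toTsv(header: string, rows: StudyRow[]): string {
  const body = rows.map((row) => `${row.platform}\t${row.identifier}`);
  return [header].concat(body).join("\n") + "\n";
}
```

A script would read the file, call addNewStudies with the ids returned by the platform API, and write the result of toTsv back to the same path.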

There is JavaScript code in the AnVIL portal repo that can be used as an example of how to call the CRDC API and parse its response. This code will need to be converted to TypeScript. Note that we currently only list the CRDC studies that have a dbGaP phsId associated with them. The JS code should show how this filtering is done.
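As a rough idea of what the phsId filter might look like in TypeScript: the CrdcStudy shape and its dbGapAccession field are hypothetical stand-ins, since the real CRDC response format is defined by the JS code in the AnVIL portal repo and is not reproduced here. The only firm assumption is that dbGaP study accessions start with "phs" followed by six digits, optionally followed by version and participant-set suffixes such as ".v2.p1".

```typescript
// Hypothetical record shape: field names in the real CRDC response may differ.
interface CrdcStudy {
  name: string;
  dbGapAccession?: string | null;
}

// dbGaP study accessions look like "phs000123", optionally followed by ".v2.p1".
const PHS_PATTERN = /^phs\d{6}/;

// Return the bare phs id (no version suffix), or undefined if the value is not one.
function getPhsId(accession: string | null | undefined): string | undefined {
  if (!accession) return undefined;
  const match = accession.trim().match(PHS_PATTERN);
  return match ? match[0] : undefined;
}

// Keep only studies with a phs id, returning unique ids in sorted order.
function listPhsIds(studies: CrdcStudy[]): string[] {
  const ids = new Set<string>();
  for (const study of studies) {
    const phsId = getPhsId(study.dbGapAccession);
    if (phsId) ids.add(phsId);
  }
  return Array.from(ids).sort();
}
```

Stripping the version suffix keeps "phs000123.v1.p1" and "phs000123.v2.p1" from being counted as two studies; whether the TSV should store versioned or bare ids is a decision for the script author.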

Getting Started / Background

We now have a /files directory that contains scripts for preparing the data for the static explorer instances. These are the instances that do not use a back-end server and instead do all of the filtering on the client (e.g. ncpi-catalog, anvil-catalog). The scripts in these directories run before the application is run or built and generate files in the "out" directory under each project name.

For example, to run the ncpi-catalog browser locally you will need to:

  1. Make sure you have Node.js 16.15.1 configured as the current Node.js version.
  2. Pull the latest from main.
  3. Navigate into explorer/files/ and run npm ci.
  4. Manually create an out directory under explorer/files/ncpi-catalog, e.g. explorer/files/ncpi-catalog/out.
  5. From explorer/files, run npm run build-ncpi-db to expand the dashboard-source-ncpi.tsv into two files, ncpi-platform-studies.json and ncpi-platforms.json.
  6. From /explorer, run npm run dev:ncpi-catalog. This should read in the JSON files and make the application available on localhost:3000/explore.

Note that we will also make scripts to update the other three platforms' studies, but for now let's focus on getting the CRDC one going.

BDC

BDC can be done in a very similar manner to CRDC. The AnVILProject portal code has examples in the same files as CRDC for the BDC API.

AnVIL

The AnVIL studies list is in files/anvil-catalog/out/anvil-studies.json. This file can be read in, and we can use any study that has a dbGapId. The anvil-studies.json file is created when npm run build-anvil-db is run from the /files directory.
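A minimal sketch of the AnVIL case, assuming anvil-studies.json is a JSON array of objects with an optional string dbGapId field. That layout is a guess based on the description above, and the function name is hypothetical. The function takes the file contents as a string so that reading the file stays with the caller.

```typescript
// Assumed shape of one entry in anvil-studies.json; other fields are ignored.
interface AnvilStudy {
  dbGapId?: string | null;
}

// Return the unique, non-empty dbGapId values found in the studies file.
function getAnvilDbGapIds(jsonText: string): string[] {
  const studies = JSON.parse(jsonText) as AnvilStudy[];
  const ids = new Set<string>();
  for (const study of studies) {
    const id = study.dbGapId;
    if (typeof id === "string" && id.trim() !== "") {
      ids.add(id.trim());
    }
  }
  return Array.from(ids);
}
```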

KF

KF has an API, but the API call requires a manual authentication step. @jpaten, you did this manually once a while back. You need to manually call https://kf-api-fhir-service.kidsfirstdrc.org/ResearchStudy?_total=accurate, log in, then download the page to a file (in out, so it won't get checked into GitHub). Then have the script import the studies from the KF download.
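Importing from the downloaded file might look something like the sketch below. It relies on the standard FHIR Bundle layout, where each item in entry carries a resource. Where KF records the dbGaP phs id is an assumption: this version simply scans each ResearchStudy's identifier values for a phs-style accession, and the function name is hypothetical.

```typescript
// Minimal subset of the FHIR types needed here.
interface FhirIdentifier {
  system?: string;
  value?: string;
}
interface FhirResource {
  resourceType: string;
  identifier?: FhirIdentifier[];
}
interface FhirBundle {
  resourceType: string;
  entry?: { resource?: FhirResource }[];
}

// Collect phs ids from the ResearchStudy resources in a downloaded Bundle.
// Assumption: the phs accession appears in one of the identifier values.
function getKfPhsIds(bundleJson: string): string[] {
  const bundle = JSON.parse(bundleJson) as FhirBundle;
  const ids = new Set<string>();
  for (const entry of bundle.entry ?? []) {
    const resource = entry.resource;
    if (!resource || resource.resourceType !== "ResearchStudy") continue;
    for (const identifier of resource.identifier ?? []) {
      const match = (identifier.value ?? "").match(/phs\d{6}/);
      if (match) ids.add(match[0]);
    }
  }
  return Array.from(ids).sort();
}
```

FHIR servers usually page search results, so a single downloaded page may not contain every study. It is worth comparing the number of entries against the Bundle's total before trusting the import.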

Definition of Done

  1. We have a new TypeScript script in the package.json under /files that can be run to update the dashboard-source-ncpi.tsv file with new items from CRDC.
  2. We have a new TypeScript script in the package.json under /files that can be run to update the dashboard-source-ncpi.tsv file with new items from BDC.
  3. We have a new TypeScript script in the package.json under /files that can be run to update the dashboard-source-ncpi.tsv file with new items from AnVIL.
  4. We have a new TypeScript script in the package.json under /files that can be run to update the dashboard-source-ncpi.tsv file with new items from KF.
  5. Each script prints out the dbGaP ids of any new studies and the count of new studies, if any.
  6. When we then run build-ncpi-db, it uses any new studies from the CRDC API.
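For item 5, the reporting step could be as small as the following. The function name and message wording are illustrative only. It builds the text separately from printing it, so the output can be checked without capturing the console.

```typescript
// Build a short report: the count of new studies followed by one id per line.
function formatNewStudies(platform: string, newIds: string[]): string {
  if (newIds.length === 0) {
    return `${platform}: no new studies`;
  }
  const lines = [`${platform}: ${newIds.length} new study id(s)`];
  for (const id of newIds) {
    lines.push(`  ${id}`);
  }
  return lines.join("\n");
}

// Example call a script might make after updating the TSV.
console.log(formatNewStudies("CRDC", ["phs000003", "phs000004"]));
```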

NoopDog commented 1 year ago

@jpaten, please add the BDC, AnVIL and KF scripts to this same ticket. The overview above will be updated with additional info for these tasks.