callahantiff / PheKnowLator

PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
https://github.com/callahantiff/PheKnowLator/wiki
Apache License 2.0

CI/CD Pipeline: Ensuring Builds Use Most Current Data #90

Open · callahantiff opened this issue 3 years ago

callahantiff commented 3 years ago

TASK

Currently, the build downloads data via builds/data_to_download.txt, which is a list of URLs. While this works for about 90% of the data we currently use, a few data providers include explicit versions in their URLs. This means that unless we update this text file, we are not guaranteed to get the most current data. Additionally, some of the downloads rely on running a query against a data provider's API. This should always return the most up-to-date data, but we should verify this as well.
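As a rough illustration of how the explicitly versioned URLs could be found automatically, this sketch scans the download list and flags URLs that contain version-like path segments. It assumes builds/data_to_download.txt holds one URL per line and that lines starting with `#` are comments (both assumptions). `VERSION_PATTERN` and `find_versioned_urls` are hypothetical names, and the regular expression is a heuristic that would need tuning against the real file; none of this is part of PheKnowLator.

```python
import re
from typing import Iterable, List

# Hypothetical heuristic for version-like URL segments (not from PheKnowLator).
VERSION_PATTERN = re.compile(
    r"(?:^|[/_.-])"                  # start of the URL or a separator character
    r"(?:v(?:ersion)?\d+(?:\.\d+)*"  # v2, v2.3, version10
    r"|release[-_]?\d+"              # release-104, release104
    r"|\d{4}-\d{2}-\d{2}"            # 2021-05-01
    r"|\d{8})"                       # 20210501
    r"(?=$|[/_.-])",                 # followed by a separator or the end of the URL
    re.IGNORECASE,
)


def find_versioned_urls(lines: Iterable[str]) -> List[str]:
    """Return the URLs that appear to pin an explicit version."""
    flagged = []
    for line in lines:
        url = line.strip()
        # Skip blank lines and lines assumed to be comments.
        if not url or url.startswith("#"):
            continue
        if VERSION_PATTERN.search(url):
            flagged.append(url)
    return flagged
```

Passing the lines of builds/data_to_download.txt (for example, an open file handle) to `find_versioned_urls` would give a starting checklist of versioned resources. Because the pattern is a heuristic, expect some false positives and misses.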

The following resources include explicit versions in the URLs and will need updates to resolve the aforementioned problem:

The following resources are generated from querying an API:



TODO

cthoyt commented 3 years ago

Check out the bioversions project; I'm working on similar stuff to solve this problem... unfortunately the state of versioned biomedical data is just as lacking as most other things 🤡

callahantiff commented 3 years ago

@cthoyt - brilliant, yes! Will definitely work on this for upcoming releases. Thanks for pointing this out!

cthoyt commented 3 years ago

@callahantiff please let me know if there are any resources you're using that aren't supported by bioversions already and I will add them. The syntax to get the current version for one is:

```python
import bioversions
# 'resource name' is a placeholder for the resource whose current version you want
version_string = bioversions.get_version('resource name')
```
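One way the bioversions lookup above could plug into the build is sketched here. Versioned URLs are stored as templates with a `{version}` placeholder and filled in at build time, and any pinned versions can be checked against the current ones. `URL_TEMPLATES`, `resolve_urls`, and `find_stale` are hypothetical names, and the template URL is illustrative only. The version lookup is passed in as a callable, so the logic can be exercised without network access.

```python
from typing import Callable, Dict, Tuple

# Hypothetical templates: each versioned resource gets a "{version}" placeholder
# instead of a hard-coded release. Names and URLs are illustrative only.
URL_TEMPLATES: Dict[str, str] = {
    "example_resource": "https://example.org/pub/release-{version}/data.tsv",
}


def resolve_urls(templates: Dict[str, str],
                 get_version: Callable[[str], str]) -> Dict[str, str]:
    """Fill each URL template with the version reported by get_version(name)."""
    return {name: template.format(version=get_version(name))
            for name, template in templates.items()}


def find_stale(pinned: Dict[str, str],
               get_version: Callable[[str], str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (pinned, current)} for each resource whose pin is out of date."""
    stale = {}
    for name, pinned_version in pinned.items():
        current = get_version(name)
        if current != pinned_version:
            stale[name] = (pinned_version, current)
    return stale
```

In a real build the callable would be `bioversions.get_version`, e.g. `resolve_urls(URL_TEMPLATES, bioversions.get_version)`, provided each template key is a name bioversions recognizes. The returned version string may not match the format a provider uses in its URLs, so some resources would likely need a per-resource formatting step.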