BojarLab / glycowork

Package for processing and analyzing glycans and their role in biology.
https://Bojarlab.github.io/glycowork
MIT License
56 stars 11 forks source link

Add tests for standardized semantics #19

Closed cthoyt closed 2 years ago

cthoyt commented 2 years ago

As I noted on Twitter, it looks like the semantics of the tissue_id (and probably some other columns) are not standardized. More precisely, it appears that some entries are written using the compact URI (CURIE) syntax in the form <prefix>:<local unique identifier> and others are written in the form of just <local unique identifier>. This PR does the following:

  1. Adds a test to check that all of the entries in this column are valid, based on the Bioregistry (https://github.com/biopragmatics/bioregistry; a resource that keeps track of all of the prefixes and patterns for different namespaces)
  2. Adds a tox.ini file so these tests can be run reproducibly
  3. Updates the GitHub Action to run the tests via tox

To-Do

Bribak commented 2 years ago

Thanks so much! (both for your time & your interest) Just sanity-checking: did you run nbdev_install_git_hooks after getting the repo? We have it, admittedly a bit hidden, described in the contributing.md document but it will cause endless CI issues with nbdev if I merge this without the git hooks.

On the dev branch we by the way have now fixed the specific issue you mentioned (and I'll merge it into master later as well) but it would be super-valuable to have this in our checks

cthoyt commented 2 years ago

It just occurred to me that your tests are hidden inside the package itself at https://github.com/BojarLab/glycowork/tree/master/glycowork/tests. There are a few reasons why this is bad outlined in this excellent blog post https://hynek.me/articles/testing-packaging. So I didn't try running the hooks because I actually had assumed there just weren't any tests 🙈 sorry! I'll give that a shot later

Bribak commented 2 years ago

The tests inside the package are a bit deprecated (which is bad in itself) and so are not currently used (#futurework). But we do have tests for the documentation notebooks from nbdev, that's where the hooks come into play:-)