tabulate table of key metrics to optimize

andrewsu commented 3 years ago

This is my crude analysis of results.csv (which I tried and failed to incorporate into the Jupyter notebook):

In terms of regression testing, I think the cells highlighted in red and green are the ones we want to track. Obviously the numbers in green will improve based on (at least):

- expansion of content in existing data sources in BTE
- addition of new data sources in BTE
- addition of new explanatory metapaths (src/query_templates/)

And I assume reducing the numbers in red will involve a more detailed analysis of why certain queries failed.
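
A minimal sketch of how the tracked numbers could be tabulated, assuming results.csv has one row per queried pair and a column recording the outcome. The column name `status`, its values, and the sample rows are all hypothetical; the real file may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for results.csv: the column names and the
# status values below are assumptions, not the real file layout.
SAMPLE_CSV = """pair_id,status
1,explained
2,failed
3,explained
4,no_result
"""


def tabulate_status(csv_file):
    """Count how many rows fall into each outcome category."""
    reader = csv.DictReader(csv_file)
    return Counter(row["status"] for row in reader)


counts = tabulate_status(io.StringIO(SAMPLE_CSV))
total = sum(counts.values())

# Print one line per category: name, count, share of all rows.
for status, n in sorted(counts.items()):
    print(f"{status:<12}{n:>4}{n / total:>8.0%}")
```

To run this against the real file, pass `open("results.csv", newline="")` instead of the in-memory sample and adjust the column name.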

After we get these metrics computed in a notebook, it would be great to track them over time (perhaps using GitHub Actions or Jenkins?)...

kevinxin90 commented 3 years ago

Most of the pairs (99%) in the red category are there because the CSV table doesn't provide both a GARD ID and a UNII ID, so we cannot make the query. That's why they show as failed.
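
One way to keep these cases apart from queries that genuinely failed is a pre-filter on the input rows. This is only a sketch: the column names `gard_id` and `unii_id` are hypothetical, and the rows are made-up placeholders rather than real identifiers.

```python
def missing_ids(row, required=("gard_id", "unii_id")):
    """Return the required ID columns that are absent or blank in a row."""
    return [col for col in required if not (row.get(col) or "").strip()]


# Made-up placeholder rows; real rows would come from csv.DictReader
# on the input table, with whatever column names that table uses.
rows = [
    {"gard_id": "GARD-EXAMPLE-1", "unii_id": "UNII-EXAMPLE-1"},
    {"gard_id": "", "unii_id": "UNII-EXAMPLE-2"},
    {"gard_id": "GARD-EXAMPLE-3", "unii_id": None},
]

# Split rows into those that can be queried and those that cannot.
queryable = [r for r in rows if not missing_ids(r)]
skipped = [(r, missing_ids(r)) for r in rows if missing_ids(r)]

print(f"{len(queryable)} queryable, {len(skipped)} skipped for missing IDs")
```

Reporting the skipped pairs separately would leave the red numbers covering only queries that were actually attempted.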

kevinxin90 commented 3 years ago

GitHub Actions would be hard, since each job will probably take about 2 hrs to run and GitHub Actions limits the execution time of each job. We would have to try Jenkins for that, or run a local instance of GitHub Actions.

andrewsu commented 3 years ago

Just a note that if we move this repo to one of our other organizations (or create a new organization), we can get 3000 Actions minutes / month for free given our educational discount...

kevinxin90 commented 3 years ago

Great. Let's move it to the BioThings Org then. I will get the GitHub Actions set up right away. We can run it as a cron job on a weekly basis.
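
As a rough check that a weekly run fits the free allowance mentioned earlier in this thread, here is a back-of-the-envelope calculation. It assumes a single job of about 2 hrs per run, billed at one minute per minute of runtime, and takes five runs as the most that can land in one calendar month; none of these figures are measured.

```python
RUN_MINUTES = 2 * 60        # ~2 hrs per run, the estimate from this thread
MAX_RUNS_PER_MONTH = 5      # a weekly job fires at most 5 times in a month
FREE_MINUTES = 3000         # monthly allowance quoted in this thread

# Worst-case monthly usage under the assumptions above.
used = RUN_MINUTES * MAX_RUNS_PER_MONTH
print(f"worst case: {used} of {FREE_MINUTES} minutes ({used / FREE_MINUTES:.0%})")
```

Under these assumptions the weekly schedule uses a small share of the allowance, so the per-job time limit is the more relevant constraint.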

biothings / bte_regression_test

tabulate table of key metrics to optimize #5