Open andrewsu opened 3 years ago
Most of the pairs (99%) that fall into the Red category do so because the CSV table doesn't provide both a GARD ID and a UNII ID, so we cannot make the query. That's why they show as failed.
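To keep those rows apart from genuine query failures, a rough sketch like the one below might help. The column names `gard_id` and `unii_id` and the helper `split_by_missing_ids` are assumptions for illustration only; the actual headers in the CSV may differ, and the sample values are placeholders, not real identifiers.

```python
import csv
import io


def split_by_missing_ids(rows, id_columns=("gard_id", "unii_id")):
    """Split rows into (queryable, missing_ids).

    A row is queryable only if every column in id_columns is present
    and non-empty. The column names are hypothetical.
    """
    queryable, missing_ids = [], []
    for row in rows:
        if all((row.get(col) or "").strip() for col in id_columns):
            queryable.append(row)
        else:
            missing_ids.append(row)
    return queryable, missing_ids


# Made-up sample standing in for the real CSV table.
sample = io.StringIO("gard_id,unii_id\nG1,U1\nG2,\n,U3\n")
queryable, missing_ids = split_by_missing_ids(csv.DictReader(sample))
print(len(queryable), "queryable;", len(missing_ids), "missing an ID")
```

Rows in `missing_ids` could then be reported separately, so the Red count reflects only queries that were actually attempted.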
GitHub Actions would be hard, since each job will probably take 2 hours to run and GitHub Actions has a limit on execution time for each job. We would have to try Jenkins for that, or run a local instance of GitHub Actions.
Just a note that if we move this repo to one of our other organizations (or create a new organization), we can get 3000 Action minutes / month for free given our educational discount...
Great. Let's move it to the BioThings org then. I will get the GitHub Actions set up right away. We can do it as a cron job on a weekly basis.
This is my crude analysis of results.csv (which I attempted and failed to incorporate into the Jupyter notebook):
In terms of regression testing, I think the cells highlighted in red and green are the ones we want to track. Obviously the numbers in green will improve based on (at least):
src/query_templates/)
And I assume reducing the numbers in red will involve a more detailed analysis of why certain queries failed.
After we get these metrics computed in a notebook, it would be great to track them over time (perhaps using GitHub Actions or Jenkins?)...
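One way the tracking could look, as a minimal sketch: each scheduled run appends its per-category counts to a history file, which a notebook can plot later. The function name `append_metrics`, the history file layout, and the category labels are all assumptions, not something the repo already has.

```python
import csv
import datetime
import os
from collections import Counter


def append_metrics(categories, history_path, today=None):
    """Append one row per category (date, category, count) to a history CSV.

    `categories` is an iterable of labels, one per tested pair
    (e.g. "red" or "green"). The header is written only when the
    history file does not exist yet.
    """
    counts = Counter(categories)
    today = today or datetime.date.today().isoformat()
    new_file = not os.path.exists(history_path)
    with open(history_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["date", "category", "count"])
        for category, count in sorted(counts.items()):
            writer.writerow([today, category, count])
    return counts
```

A weekly job (GitHub Actions or Jenkins) could call something like `append_metrics(labels, "metrics_history.csv")` after the test run and commit or archive the file; the file name here is hypothetical.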