OpenTreeOfLife / germinator

miscellaneous scripts and data for concerns that span more than one of the Open Tree code repositories: integration tests, system statistics, etc.
BSD 2-Clause "Simplified" License

Feedback issues could contain markup for machine readable tests #150

Open jar398 opened 4 years ago

jar398 commented 4 years ago

Just some idle thoughts after pondering @hyanwong's avalanche of feedback issues:

I've been thinking about this kind of thing for a while.

If a feedback issue says something is wrong in the tree, it would be nice if the submitter could provide a concise statement that is not true now but would become true if the complaint were addressed.

E.g. if the issue says taxon A is misplaced, the markup could be a triple ((A,B),C) - "A should be closer to B than it is to C". Similarly for other highly structurable issues, e.g. false positive / false negative extinctness or synonymies and so on.
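To make the triple idea concrete, here is a minimal sketch of how an assertion like ((A,B),C) could be checked against a rooted tree. It assumes the tree is available as a simple child-to-parent dictionary; that representation, the function names, and the toy trees are all hypothetical and not taken from any Open Tree code.

```python
def ancestors(parent, node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, x, y):
    """Most recent common ancestor of x and y in a child -> parent map."""
    seen = set(ancestors(parent, x))
    for node in ancestors(parent, y):
        if node in seen:
            return node
    raise ValueError(f"{x!r} and {y!r} share no ancestor")


def triple_holds(parent, a, b, c):
    """True if the tree displays ((a,b),c): a is closer to b than to c.

    Both MRCAs lie on the path from a to the root, so comparing their
    depths tells us which one is nested inside the other.
    """
    depth_ab = len(ancestors(parent, mrca(parent, a, b)))
    depth_ac = len(ancestors(parent, mrca(parent, a, c)))
    return depth_ab > depth_ac


# Hypothetical toy trees (assumptions for illustration only).
# "fixed": A and B are siblings under X; C hangs off the root.
fixed_tree = {"A": "X", "B": "X", "X": "root", "C": "root"}
# "broken": A is misplaced next to C under Y; B hangs off the root.
broken_tree = {"A": "Y", "C": "Y", "Y": "root", "B": "root"}
```

With these toy trees, `triple_holds(fixed_tree, "A", "B", "C")` is true and `triple_holds(broken_tree, "A", "B", "C")` is false, which is the "not true now, true once addressed" behaviour described above.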

For triage one could simply sample all submitted issues so far to see which categories are most highly represented.

I don't know if we're talking about some kind of UI (that seems like overkill) or just special markup that users can manually enter in issues or their replies, which a script could then pick up by scanning the issue text. A scan of all issues, and maybe issue comments, would not be hard to do with regular expressions. These statements could then be made actionable, either when the taxonomy is constructed or when the big tree is constructed or (ideally) both. Maybe a bot could even submit a new issue comment when an issue changes state from broken to fixed or vice versa.
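As a rough sketch of the scanning step, the snippet below pulls triple assertions out of issue text with a regular expression and reports which assertions flipped between two test runs. The `tree-test:` line prefix is a made-up markup convention, and the function names are hypothetical. No syntax has actually been agreed on in this thread.

```python
import re

# Hypothetical markup: a line of the form "tree-test: ((A,B),C)".
TRIPLE_RE = re.compile(
    r"^\s*tree-test:\s*"
    r"\(\(\s*([^(),]+?)\s*,\s*([^(),]+?)\s*\)\s*,\s*([^(),]+?)\s*\)\s*$",
    re.MULTILINE,
)


def find_triples(text):
    """Return every (a, b, c) triple marked up in an issue body or comment."""
    return [match.groups() for match in TRIPLE_RE.finditer(text)]


def state_changes(previous, current):
    """Compare two {assertion: passed?} maps from successive runs.

    Returns {assertion: "fixed" | "broken"} for assertions whose result
    flipped; assertions that are new or unchanged are left out.
    """
    changes = {}
    for key, now_ok in current.items():
        was_ok = previous.get(key)
        if was_ok is None or was_ok == now_ok:
            continue
        changes[key] = "fixed" if now_ok else "broken"
    return changes


issue_body = """Homo looks misplaced in the current tree.
tree-test: ((Homo, Pan), Gorilla)
Thanks!"""

triples = find_triples(issue_body)
```

A bot could feed the result of `state_changes` into whatever mechanism posts issue comments; that part is left out here because it depends on how the tests end up being run.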

jimallman commented 4 years ago

I really like this idea. As you point out, any testable assertions need to be in a form that is easy to use and easy to "proofread", so that we avoid miscommunication. Hmmm.

hyanwong commented 4 years ago

Yes! This would be a great way to implement testing of known problems. Special markup seems a good plan. It would be worth looking through the taxonomy issues to see what sorts of issues tend to be reported, and therefore what assertion syntax would work.

It would also be useful to know whether an issue is due to, e.g., a particular upstream provider. I'm not sure how we could tell this, though.