ReproNim / testkraken

Generalized regression testing of scientific workflows

extending test spec to fit AFNI use cases #71

Open · djarecka opened 5 years ago

djarecka commented 5 years ago

The summary of the discussion with @leej3 about the spec that would be more useful for AFNI testing: https://docs.google.com/document/d/17DHlnNzKl6rAhC-NlLRi0IKAbUodR3wtATcOXvwvmiU/edit?usp=sharing

Please comment here or in the doc.

leej3 commented 5 years ago

The AFNI team wants to develop their tests in tcsh. I have attempted to create a specification that would allow overlap with testkraken.

We should discuss...

https://docs.google.com/document/d/13P4S6ZhF6K0Ho7wTqeKXHp-hged8EXTbElFQthW3Fg0/edit?usp=sharing

satra commented 5 years ago

@leej3 - seems like coming up with a set of working examples prior to november hack may be a good target and then to discuss with the broader afni crowd at the hack.

leej3 commented 5 years ago

When you say working examples do you mean a selection of yaml files executed in test-kraken?

Also, I'm wondering whether we can tentatively agree on a schema: even if it's a straw-man, I'd like to target it as I fill in the various examples that span our usage needs (or indeed, do you already have a schema/object structure in mind?). Specific details I was wondering about for the testkraken YAML specification:

- how might one specify dependencies between tests
- the best way to specify environment variables
- the tree mentioned in the spec doc, and whether it would facilitate inheriting environments/variables from closer to the root

satra commented 5 years ago

do you mean a selection of yaml files executed in test-kraken?

yes or some shim between yaml and testkraken

can we tentatively agree on a schema: even if its a straw-man

sure - go for it. whatever you and @djarecka find reasonable.

how might one specify dependencies between tests

this would be closer to a dataflow framework that specifies how outputs from a prior test go into a subsequent test. this would be equivalent to a workflow specification. one possibility is to consider CWL. the other is simply to declare dependencies between tests directly.
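for concreteness, a hypothetical sketch of how such a dependency could be declared in a YAML spec (the keys "tests", "needs", and "inputs_from" below are illustrative, not an existing testkraken schema):

```yaml
# Hypothetical spec; keys are illustrative, not testkraken's actual schema.
tests:
  - name: align_anat
    command: [tcsh, tests/align_anat.tcsh]
  - name: volreg
    command: [tcsh, tests/volreg.tcsh]
    needs: [align_anat]           # run only if align_anat succeeded
    inputs_from:
      anat_aligned: align_anat    # consume an output of the prior test
```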

the best way to specify environment variables

i would use that as part of the environment specification (software/libraries + environment variables). for now i think we are using the neurodocker spec, which does have a way to add environment variables. now, if these are specific to the test side rather than the container side, we can override or add environment variables on the test side.
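for illustration, a hypothetical layout for the container-side vs test-side split (key names here are a sketch, not a confirmed testkraken or neurodocker schema):

```yaml
# Hypothetical layout; key names are illustrative, not a confirmed schema.
env:
  base: {image: "ubuntu:18.04"}       # container-side, neurodocker-style spec
  env_vars:
    AFNI_NIFTI_TYPE_WARN: "NO"        # baked into the image
tests:
  - name: volreg
    command: [tcsh, tests/volreg.tcsh]
    env_vars:
      OMP_NUM_THREADS: "1"            # test-side addition/override
```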

You mentioned a tree in the spec doc. Would this be a tree object to represent collections of tests within Python? Would this facilitate inheriting environments/variables etc. from closer to the root of the tree?

we can use the inheritance principle when feasible, but it could be that all tests run in all environments unless otherwise restricted (just like we restrict certain versions of libraries in python setuptools)
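a hypothetical sketch of that inheritance, where children pick up root settings unless they restrict or override them (names are illustrative only):

```yaml
# Hypothetical tree; children inherit root settings unless they override them.
root:
  envs: [afni_latest, afni_20.1]      # default: every test runs in every env
  env_vars: {OMP_NUM_THREADS: "1"}
  children:
    - name: quick_checks
      tests: [help_messages, version_check]
    - name: long_pipelines
      envs: [afni_latest]             # restricted to one env, setuptools-style
      tests: [full_preproc]
```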

djarecka commented 5 years ago

I will come back to this tomorrow, but I have some questions.

leej3 commented 5 years ago

Ok, sounds like a good goal.

this would be closer to a dataflow framework

Yes. I'm happy to explore the use of pydra for this. A lot of our tests will take a while to run, so I think specifying dependencies between tests and skipping tests when their dependencies fail would be very useful. It will require a cost-benefit analysis, though. Having some clear examples of this being used will aid discussion. Overall, it would be nice to create a list at some point (perhaps at the hack) of the advantages of such an approach over the alternatives.

sounds good regarding environment variables and tree inheritance.

Is it necessary to have scientific_workflow -> test1 -> test2?

I think it might just be a terminology thing. We can explore what terms we want to use. Using your terms I think it would be more along the lines of

scientific_workflow_1 -> scientific_workflow_2

And scientific_workflow_2 would only run if the former succeeded (with success possibly assessed by some extra commands run to check the output, not just the exit status).
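For illustration, a hypothetical way this chaining might be written, with success assessed by a custom check command rather than the exit status alone (the keys below are illustrative, not an agreed schema):

```yaml
# Hypothetical chaining; "check_output" and "needs" are illustrative keys.
workflows:
  - name: scientific_workflow_1
    command: [tcsh, run_workflow_1.tcsh]
    check_output: [python, checks/compare_to_reference.py]  # success is more than exit status
  - name: scientific_workflow_2
    command: [tcsh, run_workflow_2.tcsh]
    needs: [scientific_workflow_1]    # skipped if workflow 1 or its check failed
```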