Closed brianmcmichael closed 8 years ago
I think this might be a bug, as I thought the original intention of Status.completed? meant completed or failed.
Status methods are being generated dynamically here: https://github.com/AweSim-OSC/osc-machete/blob/release/1.0/lib/osc/machete/status.rb#L31-L45
Additional handling will be needed to make something like "completed?" behave the way we want it to.
I think #completed?
should mean the batch job has finished.
And we use #passed?
and #failed?
for whether the job results pass or fail the user's validations.
Begs the question, should we have a completed?
status value? Not having one would break all existing apps. Having one introduces problems...
Because we then want to treat completed
as two things:
passed
, failed
(a cached status value of completed
would be the simulation completed but has not been validated)completed?
to determine whether the status value is completed
, passed
, or failed
My thoughts:
completed
value distinct from failed
and passed
?done?
similar to active?
If completed
is a status value and completed?
returns true for both completed
and passed
then we have this problem:
s1 = Status.completed
s2 = Status.passed
s1.completed? # => true
s2.completed? # => true
s1 == s2 # => false
I guess this is okay:
s1 = Status.completed
s2 = Status.passed
s3 = Status.failed
s1.completed? # => true
s2.completed? # => true
s1.validated? # => false
s2.validated? # => true
s3.validated? # => true
# the idiosyncracy
s1 == Status.completed # => true
s2 == Status.completed # => false
s3 == Status.completed # => false
s1.completed? # => true
s2.completed? # => true
s3.completed? # => true
s1.active? # => false
s1.to_s # => C
s2.to_s # => P
s3.to_s # => F
Is there any other potential problem with this or does this look acceptable?
So, if you want to know if its in the completed but not validated state you would need to do one of these:
if s.completed? && ! s.validated?
# needs validation
end
if s.completed? && ! (s.passed? || s.failed?)
# needs validation
end
Are there other ways? Am I missing something?
I think there should be a way to opt-out of the validation step. completed
is still a sensible default. Passed and Failed can be fine-grained options, but since validation is presumably going to be based on creating regexes that scan output files, we don't want to force that on a developer.
I am not sure where you are putting those methods Eric. I thought you said the status value object can only have a single status?
Jeremy, those methods are on every instance. See the current functionality:
irb(main):001:0> f = OSC::Machete::Status.failed
=> Failed
irb(main):002:0> f.completed?
=> false
irb(main):003:0> f.queued?
=> false
irb(main):004:0> f.failed?
=> true
irb(main):005:0> f.active?
=> false
irb(main):006:0>
But f
is "failed". An instance can only be one status value. Thats what I meant.
Yes, I understand the current functionality. But what you showed above contradicts current functionality and I was curious as to what route you were wanting to take.
@nickjer @brianmcmichael since we would be adding a passed
value, what do we do with existing simulations that are going to have their validated simulations as being labeled as completed
and not passed
? For example, every simulation that has been run in Container Fill Sim is labeled as being completed
if the results are valid.
Or another way to put this, what value are we adding to our developers by adding passed
and is it worth the tradeoff?
Offline decision was made to:
completed
to passed
completed?
which returns true for both passed
and failed
C
will be the character representation of passed
for serialization backwards compatibilityI am fine with this.
I will formally state that I don't like passed as a default because of the presumption of validity.
If the developer didn't make validations, then it is assumed the developer trusts all results from his batch job.
Naturally, I trust all results from my batch jobs, but this places a heavy burden on the developer.
For example, with LS-Dyna, the RTFM way to know if a job has failed is to check the output for the string "E r r o r". A job can fail for reasons unrelated to the solver, however, and if a job returns with a system error, for example, suppose a module got switched, that prevents the solver from running, the jobs will always appear to have "Passed".
I don't like that.
The string for StatusValue.passed
can probably be "Completed" or "Successfully Completed". And even for the case you mentioned above it should still be up to the developer. He should not only test for failure, but also test for results.
Otherwise it will confuse the user if he sees some jobs "Successfully Completed", other jobs "Completed", and even still other jobs as "Failed". And seeing the "Completed" jobs missing results.
Then should we add a StatusValue.error
for the case where jobs do not pass error validation but also do not pass results validation?
This is considering that checking for errors is the documented process, and results validation is based on developer assumption.
StatusValue.failed should be used by the developer in the case of any type of job failure.
A job can fail for reasons unrelated to the solver, however, and if a job returns with a system error, for example, suppose a module got switched, that prevents the solver from running, the jobs will always appear to have "Passed".
If the job returns a with a system error, then wouldn't the results not be present? The responsibility of inspecting the simulation directory for expected output files and results files should be the developer's.
But again, this presupposes that an app developer has the ability validate. ESP's working with custom solvers may not have the tidy and predicatable outputs of more mature software like Abaqus. And LSTC specificly states that errors can be detected by a specific string, but they do not explicitly provide the check for a positive run.
Requiring validation also puts an initial barrier to entry on an app because then the app developer, who may not have experience with the HPC outputs, will need to come up with a way to detect whether a job actually did pass or not. If a job with dependencies is interrupted or killed by an admin, we're going to be providing a status of "Passed" to the developer.
passed
is added https://github.com/AweSim-OSC/osc-machete/pull/84
Right now it does not, so when I call Status.completed? on a job that has been marked as failed, false is returned.
I propose that the definition of completed be extended to include failed, or a new option be added that includes the subset of (failed && completed) jobs.