MAAP-Project / maap-documentation


Document mapping of maap vs hysds job states #384

Closed rtapella closed 3 months ago

rtapella commented 4 months ago

Somewhere in job-monitoring docs, put the hysds <-> maap status mapping. https://esa-nasa-maap.slack.com/archives/CLQB3B620/p1704304165363909?thread_ts=1704238761.453229&cid=CLQB3B620

MAAP - HySDS
Accepted - job-queued
Running - job-started
Success - job-completed
Failed - job-offline or job-failed

HySDS state not valid/used in MAAP: job-deduped
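For reference, the mapping listed above could be written as a simple lookup table. This is only a sketch: the names `HYSDS_TO_MAAP` and `to_maap_status` are made up for illustration and are not part of maap-py or HySDS, and returning `None` for an unmapped state such as job-deduped is an assumption.

```python
from typing import Optional

# HySDS job state -> MAAP status, copied from the list in this comment.
# HYSDS_TO_MAAP is a hypothetical name, not an existing maap-py constant.
HYSDS_TO_MAAP = {
    "job-queued": "Accepted",
    "job-started": "Running",
    "job-completed": "Success",
    "job-failed": "Failed",
    "job-offline": "Failed",
}


def to_maap_status(hysds_state: str) -> Optional[str]:
    """Return the MAAP status for a HySDS state.

    Returns None for states with no MAAP equivalent (e.g. job-deduped);
    that choice is an assumption made for this sketch.
    """
    return HYSDS_TO_MAAP.get(hysds_state)


print(to_maap_status("job-started"))
print(to_maap_status("job-deduped"))
```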

chuckwondo commented 4 months ago

@rtapella, instead of updating the docs, can we just get the Jobs UI updated to show the statuses as Accepted, Running, Success, and Failed?

rtapella commented 3 months ago

ah so... I think I misunderstood.

HySDS has job-started (etc.)
Jobs UI has job-started
maap.py / WPS has Running

Is that right? The jobs UI is actually passing through HySDS status instead of using the WPS term (a standard term)?

chuckwondo commented 3 months ago

That appears to be the case.

marjo-luc commented 3 months ago

There are also the job-offline and job-deduped statuses. How should those be mapped?

chuckwondo commented 3 months ago

@marjo-luc would you clarify precisely what they indicate?

marjo-luc commented 3 months ago

@chuckwondo
job-offline: worker node is offline
job-deduped: job that was submitted is identical to existing job

https://hysds-core.atlassian.net/wiki/spaces/HYS/pages/527761452/Job+Workflow+in+HySDS

chuckwondo commented 3 months ago

Thanks @marjo-luc. That helps, although perhaps further explanation would help regarding what "offline" means. Will offline jobs ever become "online" again? Perhaps these 2 statuses don't really matter so much and can be considered "Deleted"?

rtapella commented 3 months ago

It's possible that HySDS has some states that aren't "OGC compliant" and maybe those would just be passed through directly from the ADES?
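A rough sketch of what that pass-through could look like: states with a known equivalent are translated, and anything else is returned unchanged. The names `KNOWN_STATUS` and `display_status` are hypothetical, and this is not how the ADES or the Jobs UI is actually implemented.

```python
# Hypothetical lookup of HySDS states that have an agreed MAAP/WPS term.
KNOWN_STATUS = {
    "job-queued": "Accepted",
    "job-started": "Running",
    "job-completed": "Success",
    "job-failed": "Failed",
}


def display_status(hysds_state: str) -> str:
    """Translate known states; pass any other HySDS state through as-is."""
    return KNOWN_STATUS.get(hysds_state, hysds_state)


print(display_status("job-started"))
print(display_status("job-deduped"))
```

With this approach a state like job-deduped would still reach the user under its HySDS name instead of being dropped or forced into one of the standard terms.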

rtapella commented 3 months ago

'deduped' doesn't sound like a status as much as a flag on a 'complete' job (one with another real status). but probably just got jammed in for expediency. if we have some sort of marker/flag for jobs we could use that for 'deduped' and use one of the other statuses for the status/progress.

chuckwondo commented 3 months ago

I'm a bit concerned about "deduping" because we have a case where we want to run jobs with identical inputs many (~30-50) times in order to collect average performance statistics.

The description of deduped is this:

Jobs that have identical parameters, or if the same job was already successfully completed, are deduped and no further processing occurs.

Does that mean we won't be able to submit batches of identical jobs and have them all executed? Will all but one of them be deduped?

cc: @wildintellect (re: get-dem)

marjo-luc commented 3 months ago

@chuckwondo I think the concern you bring up is why we turned the dedupe feature off on MAAP.

@rtapella Yes -- I'm sure HySDS has job statuses that are not strictly OGC-compliant. We don't necessarily need to force MAAP to adopt them, I just want to make sure they are mapped to something that makes sense to users -- for example, I think we can safely map job-offline to job-failed.

rtapella commented 3 months ago

Do you mean map job-offline to Failed?

rtapella commented 3 months ago

to do: also document this here: https://docs.maap-project.org/en/latest/system_reference_guide/jobs_maappy.html#Monitor-a-Job

marjo-luc commented 3 months ago

I meant we could interpret the HySDS status of job-offline as being equivalent to the HySDS status of job-failed.

rtapella commented 3 months ago

gotcha

rtapella commented 3 months ago

https://github.com/MAAP-Project/maap-documentation/pull/386

rtapella commented 3 months ago

Well, I put the hysds <-> maap mapping in a PR (see above). We should also update the Jobs UI to use the MAAP terminology as per the discussion above.

rtapella commented 3 months ago

I merged this into the develop branch of maap-documentation (including job-revoked)