actionml / harness

Harness is a Machine Learning/AI Server with plugins for many algorithms including the Universal Recommender
Apache License 2.0

Administrator operations error reporting #296

Open pferrel opened 3 years ago

pferrel commented 3 years ago

Can we store the error status for engine add, update, and delete so that a later status call reports errors that currently appear only in the logs? With the etcd async engine ops, these possible errors are reported only in the logs.

pferrel commented 3 years ago

The engine ops are now async with respect to the REST calls. This is somewhat like how we handle hctl import ... and hctl train ... as "jobs". However, engine ops are tied to the Administrator, not to a specific engine, so they may have a different lifecycle and status info. If we stored some info alongside other jobs, errors that previously went only to the server logs could be kept in a form that can be retrieved in an hctl status engines ... call.
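
To make the idea concrete, here is a minimal sketch of recording the outcome of each async engine op so that a later status call can return the error. It is written in Python for brevity and is not Harness code: AdminOpRecord, AdminOpStore, the status strings, and the in-memory dict are all hypothetical, and a real version would presumably persist the records wherever other job info is kept.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class AdminOpRecord:
    """Hypothetical record of one Administrator-level engine op."""
    engine_id: str
    op: str  # "add", "update", or "delete"
    status: str = "queued"  # hypothetical status names, not Harness's
    error: Optional[str] = None
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class AdminOpStore:
    """Hypothetical in-memory store; stands in for a persistent job store."""

    def __init__(self) -> None:
        self._records: Dict[str, List[AdminOpRecord]] = {}

    def run(self, engine_id: str, op: str,
            action: Callable[[], None]) -> AdminOpRecord:
        """Run an engine op and keep its outcome instead of only logging it."""
        record = AdminOpRecord(engine_id, op, status="executing")
        self._records.setdefault(engine_id, []).append(record)
        try:
            action()
            record.status = "successful"
        except Exception as exc:
            # Capture what would otherwise appear only in the server log.
            record.status = "failed"
            record.error = f"{type(exc).__name__}: {exc}"
        record.updated_at = datetime.now(timezone.utc)
        return record

    def status(self, engine_id: str) -> List[AdminOpRecord]:
        """Return the recorded ops for an engine, oldest first."""
        return list(self._records.get(engine_id, []))


def failing_add() -> None:
    # Stand-in for an engine add that fails after the REST call returned.
    raise ValueError("bad config")


store = AdminOpStore()
record = store.run("engine_1", "add", failing_add)
print(record.status, record.error)  # prints "failed ValueError: bad config"
```

A status request for engine_1 could then include the stored error text next to the engine's other status info, so the failure is visible without reading the server logs.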

pferrel commented 3 years ago

This only applies to Harness 1.0+.