Separate function logs from the other logs

akihikokuroda commented 4 weeks ago

What is the expected enhancement?

Functions may be provided by a party other than the one who executes them. It may be better to separate the function's log from the rest of the log.

akihikokuroda commented 4 weeks ago

Here is an idea to implement this.

Log entries from the function must be tagged (marked), for example [Function: <function name>] contents [End Function], either by adding the tag manually or by using a helper function provided in the Serverless SDK.

Then

The scheduler filters the log entries, stores them in a new function_log item of the Job, and a new gateway API returns that item

OR

The gateway filters the log and provides APIs that return either the function log or the rest of the log
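
A minimal sketch of the filtering step that both options share, assuming the tag format proposed above. The helper name `split_logs` and the sample log text are hypothetical and only illustrate the idea; they are not part of the existing scheduler or gateway.

```python
import re

# Tag format taken from the proposal above (an assumption, not an existing convention):
#   [Function: <function name>] contents [End Function]
FUNCTION_BLOCK = re.compile(
    r"\[Function: (?P<name>[^\]]+)\](?P<body>.*?)\[End Function\]",
    re.DOTALL,
)


def split_logs(raw_log: str) -> tuple[str, str]:
    """Return (function_log, other_log) extracted from one raw log string."""
    # Collect the contents of every tagged block.
    function_parts = [m.group("body").strip() for m in FUNCTION_BLOCK.finditer(raw_log)]
    # Whatever remains after removing the tagged blocks is the "other" log.
    other_log = FUNCTION_BLOCK.sub("", raw_log)
    return "\n".join(function_parts), other_log


raw = "starting job\n[Function: my-fn] step 1 done [End Function]\njob finished\n"
function_log, other_log = split_logs(raw)
```

Whether this runs in the scheduler or in the gateway only changes where the two strings end up being stored or served.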

WDYT @Tansito @psschwei @IceKhan13

akihikokuroda commented 4 weeks ago

Do we need to emit the function log somewhere?

psschwei commented 4 weeks ago

One could make the argument that end users just need the function output/result and logs should only be available to the provider. That would probably require providers to generate something to return when jobs fail (so would shift some ownership to them), but it would also remove the need for us to filter logs (reducing our responsibility a bit, which I also like).

Tansito commented 4 weeks ago

Yeah, we were discussing some good approaches to this and, given the deadlines that we have, our proposal would be:

Logs from Provider Functions are going to be accessible only to providers

So for this, the proposal I was thinking of is something similar to what Paul is saying:

A user calls the /job/:id/logs endpoint
- if the user is a provider, we return the logs (we know that a user is a provider because a provider has an admin_group assigned)
- if not, we return a message
A provider can obtain job IDs for its functions through: /function/:id/jobs
- This way providers can discover jobs that are being executed by their functions.
We need to settle on a structure for the result field, something like:
- Format: { status: "ERROR_CODE_PRE_DEFINED", result: "" }
- This way, as Paul said, providers can manage their errors and show the user what they want
In the Runner, provide a logger to be used by the providers (maybe the one from ray).

WDYT?

akihikokuroda commented 4 weeks ago

> A user calls the /job/:id/logs endpoint: if the user is a provider we return the logs (we know that a user is a provider because a provider has an admin_group assigned); if not we return a message

Our log is a string returned by ray.job_submission.JobSubmissionClient.get_job_logs. It is saved into Job.logs and retrieved via api/v1/job/<job id>/logs. So this API would be changed to return the log only to the provider.

> A provider can obtain job IDs for its functions through: /function/:id/jobs

Here is the relationship between jobs and functions:

middleware job id <1 - 1> function <1 - m> runtime job id

These may be useful APIs for the provider:

- retrieve middleware job ids of Function executions to get the logs of execution
- retrieve runtime job ids of Middleware job

> We need to settle on a structure for the result field, something like: Format: { status: "ERROR_CODE_PRE_DEFINED", result: "" }

The result is the return value from the Runner.run function. It must be a JSON string now. We can probably put some recommendations in the SDK docs.
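
As a sketch of what such a recommendation could look like, a provider function might wrap its return value in the proposed structure before handing it back as a JSON string. The helper `make_result` is hypothetical, and the status value is just the placeholder from the proposal; no predefined list of codes exists in this thread.

```python
import json


def make_result(status: str, result: str = "") -> str:
    """Serialize a result in the proposed {status, result} shape as a JSON string."""
    return json.dumps({"status": status, "result": result})


# Placeholder status taken from the proposal; real codes would need to be agreed on.
payload = make_result("ERROR_CODE_PRE_DEFINED")
decoded = json.loads(payload)
```

Keeping the structure flat means the end user can check `status` first and only then interpret `result`, without needing access to the provider's logs.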

> In the Runner, provide a logger to be used by the providers (maybe the one from ray).

We can pre-configure a logger in main.tmpl to write to a local file and push its contents to a new job item (like job.function_logs) at the end of execution.

Tansito commented 4 weeks ago

> So this api is changed to return the log only for the provider.

Just to clarify this point, Aki. We are going to continue supporting the current logic. For Qiskit Functions created by users, the user will continue to have access to their logs. The difference is that we will now check whether the job comes from a Qiskit Function shared by a provider, and in that case only the provider will be able to read those logs.
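
The rule described above can be sketched as a small decision function. Everything here is an illustrative assumption: the `Job` and `User` dataclasses, the `provider_admin_group` field, the author check for user-created functions, and the fallback message are stand-ins and do not mirror the gateway's actual models.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

NO_ACCESS_MESSAGE = "No available logs"  # hypothetical message text


@dataclass
class Job:
    author: str
    logs: str
    # None means the function was created by the user, not shared by a provider.
    provider_admin_group: Optional[str] = None


@dataclass
class User:
    username: str
    groups: Set[str] = field(default_factory=set)


def logs_for(user: User, job: Job) -> str:
    """Return the job logs, or a message when the user may not read them."""
    if job.provider_admin_group is None:
        # User-created function: keep the current behaviour (assumed: author reads logs).
        return job.logs if user.username == job.author else NO_ACCESS_MESSAGE
    if job.provider_admin_group in user.groups:
        # Provider function: only members of the provider's admin group read logs.
        return job.logs
    return NO_ACCESS_MESSAGE
```

With this shape, the user who ran a provider function gets the message rather than the logs, while the same user keeps full access to logs from their own functions.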

> These may be useful apis for the provider
>
> retrieve middleware job ids of Function executions to get the logs of execution
>
> retrieve runtime job ids of Middleware job

Exactly. What we are trying to do with this is offer a way to analyze some executions. We could also add some filters to the endpoint, such as the status, in case the provider wants to analyze FAILED jobs.

Yes
I didn't think about this use case, but I think it can make sense, yep. We can discuss it with @pandasa123

> The result is the return value from the Runner.run function. It must be a json string now. We probably put some recommendations in the SDK doc.

Starting with a recommendation could work, yep.

> We can pre-configure logger to output to a local file in main.tmpl and push it to a new job item (like job.function_logs) at the end of execution.

I would like to start with something simple first. Just replacing our current print in the runner with a logger is more than enough in this case.
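
A minimal sketch of that simple first step, using Python's standard `logging` module rather than the Ray logger mentioned earlier (that swap is an assumption made to keep the example self-contained). The helper name `get_logger`, the logger name "function", and the format string are hypothetical.

```python
import logging
import sys


def get_logger(name: str = "function") -> logging.Logger:
    """Return a logger that writes to stdout, configured only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Writing to stdout keeps the messages in the same stream print() used.
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


logger = get_logger()
logger.info("running function")  # replaces a bare print("running function")
```

Because the output still goes to stdout, it would end up in the same job log as before; only the call site changes.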

akihikokuroda commented 4 weeks ago

OK. The required changes right now are:

- Change the job.logs() API to check the user and, if the job is executing a provider function, change the output accordingly
- Provide a new API to "retrieve middleware job ids of Function executions to get the logs of execution"

Tansito commented 4 weeks ago

Yep, basically that!

akihikokuroda commented 3 weeks ago

> provide new api doing "retrieve middleware job ids of Function executions to get the logs of execution"

For this, a new API /api/v1/programs/<program id>/get_jobs is added to the gateway.

Where should this API be called from in the client?

- add get_jobs(function: QiskitFunction) -> List[job_id: str] to the ServerlessClient
- add get_jobs() -> List[job_id: str] to QiskitFunction

any opinion? @psschwei @Tansito @IceKhan13

Tansito commented 3 weeks ago

I was thinking of a workflow like:

function = serverless.get("my-first-pattern")
function.get_jobs()
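
A self-contained sketch of how that workflow could fit together on the client side. `ClientSketch` and `FunctionSketch` are hypothetical stand-ins for ServerlessClient and QiskitFunction, and the in-memory dictionary replaces the call to the proposed /api/v1/programs/<program id>/get_jobs endpoint; none of this is the actual client code.

```python
from dataclasses import dataclass
from typing import Dict, List


class ClientSketch:
    """Stand-in for ServerlessClient; a dict replaces the gateway request."""

    def __init__(self, jobs_by_function: Dict[str, List[str]]):
        self._jobs_by_function = jobs_by_function

    def get(self, title: str) -> "FunctionSketch":
        return FunctionSketch(title=title, client=self)

    def get_jobs(self, function: "FunctionSketch") -> List[str]:
        # In the proposal this would call the new gateway get_jobs endpoint.
        return list(self._jobs_by_function.get(function.title, []))


@dataclass
class FunctionSketch:
    """Stand-in for QiskitFunction; delegates to the client that created it."""

    title: str
    client: ClientSketch

    def get_jobs(self) -> List[str]:
        return self.client.get_jobs(self)


serverless = ClientSketch({"my-first-pattern": ["job-1", "job-2"]})
function = serverless.get("my-first-pattern")
job_ids = function.get_jobs()
```

Having the function object delegate to the client keeps a single place where the gateway request is made, while still allowing the shorter `function.get_jobs()` call.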

akihikokuroda commented 3 weeks ago

OK. It seems good. Thanks!

Qiskit / qiskit-serverless

Separate function logs from the other logs #1400
