archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Approve transfer API endpoint fails #786

Open cole opened 5 years ago

cole commented 5 years ago

Expected behaviour

Requests to the dashboard approve transfer API endpoint succeed.

Current behaviour

When the MCPServer is too busy to finish the approve transfer job within the timeout window (30 seconds in 1.9.2+, 5 seconds before that), the endpoint returns an error: "Unable to start the transfer. Deadline exceeded".

It seems that we need a long timeout here only because we are waiting for the job to complete, whereas the API endpoint should be able to return as soon as the approval has been submitted.
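To illustrate the "return once the approval is submitted" idea, here is a minimal sketch. It is not how MCPServer or the dashboard is implemented: the in-process queue, the `worker` loop, and the `approve_transfer` function are all hypothetical stand-ins for the real job system.

```python
import queue
import threading

# Hypothetical in-process queue standing in for the real job system.
approvals = queue.Queue()
results = {}


def worker():
    """Process queued approvals in the background until a None sentinel arrives."""
    while True:
        transfer_id = approvals.get()
        if transfer_id is None:
            break
        # Stand-in for the slow approval job.
        results[transfer_id] = "approved"
        approvals.task_done()


def approve_transfer(transfer_id):
    """Record the approval request and return at once, without waiting for the job."""
    approvals.put(transfer_id)
    return {"status": "accepted", "transfer": transfer_id}
```

With this shape the caller only waits for the request to be queued, so a busy worker delays the job but not the API response. The client would then need some other way to learn the job's outcome, which this sketch does not cover.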

Steps to reproduce

This is an intermittent problem that requires a busy MCPServer to reproduce. Roughly:

  1. Run 10+ transfers at once
  2. Try to approve another transfer via API request

It may take some time to reproduce.

Your environment (version of Archivematica, OS version, etc)

1.9.1 / CentOS 7



cole commented 5 years ago

A short-term fix for this issue has been put in place for #624.

joel-simpson commented 5 years ago

Increasing the timeouts is obviously one approach to mitigating this issue, but that approach comes with disadvantages. I don't know what the best timeout duration is, but there will always need to be a timeout defined so you can never really get away from the need to return this sort of error.

Do you think @cole the error message being returned by the API in this case is sufficient for a client to be able to know how to handle it?

For example, if we are using the client in automation tools to approve a transfer, and this error is returned, I think we'd want automation tools to recognize MCP Server is "busy" and the correct thing to do is retry the same request later.

Perhaps if the error message was more like: "Unable to approve transfer. Service busy. Please retry later" the user of the API endpoint would be able to design their client to handle the situation better.

(in which case maybe we need another issue in Automation tools to highlight the lack of that 'retry' capability)
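As a rough sketch of what that retry capability could look like on the client side, assuming the client can tell a "busy" response apart from other failures. `ServiceBusyError` and `approve_with_retry` are hypothetical names, not part of Automation tools or the Archivematica API.

```python
import time


class ServiceBusyError(Exception):
    """Hypothetical error raised by a client when the API reports it is busy."""


def approve_with_retry(approve, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call approve() until it succeeds or the attempts run out.

    `approve` is any zero-argument callable that raises ServiceBusyError
    when the server says it is busy (assumed client behaviour).
    """
    for attempt in range(attempts):
        try:
            return approve()
        except ServiceBusyError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

The attempt count and delays here are placeholders. Errors other than "busy" are deliberately not retried, since repeating the request would not help with those.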

cole commented 5 years ago

> Do you think @cole the error message being returned by the API in this case is sufficient for a client to be able to know how to handle it?

Not really, no. We could return a more useful error message (as you suggested), and maybe set a Retry-After HTTP header (although we'd also need to make the client respect that).
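For the client side of that, a minimal sketch of how a Retry-After value could be turned into a wait time. The header may carry either a number of seconds or an HTTP date. The function name `retry_after_seconds` and the 30 second fallback are assumptions for illustration, not existing client behaviour.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=30.0):
    """Convert a Retry-After header value into a delay in seconds.

    Falls back to `default` (an assumed value) when the header is
    missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        # Delay given directly as a number of seconds.
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without an offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A client could sleep for the returned number of seconds before sending the approve request again.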