OCR-D / ocrd-webapi-implementation


Workspace endpoint broken in Operandi live VM instance #21

Closed: MehmedGIT closed this issue 1 year ago

MehmedGIT commented 1 year ago

There is a silent error in the POST /workspace method that kills the entire server inside the live VM instance:

22:45:46.189 INFO ocrd_webapi.managers.workspace_manager - Using the existing workspacess base directory: /tmp/ocrd-webapi-data/workspaces
22:45:46.190 INFO ocrd_webapi.managers.workflow_manager - Using the existing workflowss base directory: /tmp/ocrd-webapi-data/workflows
22:45:47.359 INFO ocrd_webapi.managers.workflow_manager - Detected Nextflow version: 22.10.6.5843
22:45:47.376 INFO operandi_server.server - Connecting RMQPublisher to RabbitMQ server: localhost:5672/
INFO:     Started server process [12608]
INFO:     Waiting for application startup.
22:45:47.396 INFO operandi_server.server - Operandi server url: http://0.0.0.0:8000
22:45:47.396 INFO ocrd_webapi.database - MongoDB Name: ocrd-webapi-db
22:45:47.396 INFO ocrd_webapi.database - MongoDB URL: mongodb://localhost:27018
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     91.9.122.213:44178 - "POST /workflow HTTP/1.1" 200 OK
INFO:     91.9.122.213:44194 - "POST /workflow HTTP/1.1" 200 OK
22:45:51.673 INFO root - Skipping <bound method Profile.validate_payload_manifests_allowed of <bagit_profile.Profile object at 0x7f7ba5bb13c0>> introduced in version (1, 3, 0) (version validated: (1, 2, 0))
22:45:51.674 INFO root - Skipping <bound method Profile.validate_tag_manifests_allowed of <bagit_profile.Profile object at 0x7f7ba5bb13c0>> introduced in version (1, 3, 0) (version validated: (1, 2, 0))
22:45:51.684 INFO bagit - Verifying checksum for file /tmp/ocrd-bagit-zgdwhs2d/data/mets.xml
22:45:51.684 INFO bagit - Verifying checksum for file /tmp/ocrd-bagit-zgdwhs2d/data/OCR-D-IMG/madeUpId-2.jpg
22:45:51.686 INFO bagit - Verifying checksum for file /tmp/ocrd-bagit-zgdwhs2d/manifest-sha512.txt
22:45:51.686 INFO bagit - Verifying checksum for file /tmp/ocrd-bagit-zgdwhs2d/bag-info.txt
22:45:51.687 INFO bagit - Verifying checksum for file /tmp/ocrd-bagit-zgdwhs2d/bagit.txt
22:45:51.696 INFO ocrd.workspace_bagger - Spilling /tmp/ocrd-webapi-data/workspaces/9f0606a4-f181-4afc-92dc-ddca1247e5a6.zip to /tmp/ocrd-webapi-data/workspaces/9f0606a4-f181-4afc-92dc-ddca1247e5a6
INFO:     91.9.122.213:44202 - "POST /workspace HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     Waiting for application shutdown.
22:45:51.843 INFO operandi_server.server - The Operandi Server is shutting down.
INFO:     Application shutdown complete.
INFO:     Finished server process [12608]

This is surprising because it seems to work fine on my local machine. I think it's time to put more effort into the robustness of the WebAPI and to improve logging so that errors like this can be detected.

joschrew commented 1 year ago

Hi, I can confirm that it cannot be reproduced locally. I started the WebAPI locally from the command line, with MongoDB and RabbitMQ running in Docker. Then I used the POST /workspace endpoint to push the example workspace: curl -X POST http://localhost:8000/workspace -H 'content-type: multipart/form-data' -F workspace=@example_ws.ocrd.zip. That worked without a problem.
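For reproducing the upload from a Python test script instead of curl, the following is a minimal stdlib sketch of an equivalent request. It assumes the endpoint only needs a single multipart field named `workspace` carrying the zip, as in the curl command above; the helper name `build_workspace_upload` is hypothetical and not part of the WebAPI.

```python
import urllib.request
import uuid
from pathlib import Path


def build_workspace_upload(url: str, zip_path: Path) -> urllib.request.Request:
    """Hypothetical helper: build a POST request roughly equivalent to
    curl -X POST <url> -F workspace=@<zip_path>
    """
    boundary = uuid.uuid4().hex
    payload = zip_path.read_bytes()
    # A single multipart part; the field name "workspace" mirrors curl's -F workspace=@...
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="workspace"; filename="{zip_path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + payload + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it is then `urllib.request.urlopen(build_workspace_upload("http://localhost:8000/workspace", Path("example_ws.ocrd.zip")))`. Building the body by hand keeps the script free of third-party dependencies; a client library such as `requests` would do the same with less code.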

MehmedGIT commented 1 year ago

It was resolved in v0.8.3. The problem was that the validate() method of OcrdZipValidator uses two processes for validation by default, which seems to be problematic from uvicorn's perspective.
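To illustrate the difference between validating in-process and handing the work to a process pool, here is a hypothetical stdlib-only stand-in for BagIt checksum verification. It does not use ocrd or bagit, and the function names are made up; it only mirrors the idea of a `processes` setting, where `processes=1` keeps all hashing inside the server process and `processes>1` starts worker processes, the pattern that appears to have upset uvicorn here.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def _sha512(path: Path) -> str:
    """Hash one file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir: Path, processes: int = 1) -> bool:
    """Hypothetical check of every manifest-sha512.txt entry against the file on disk.

    processes=1 hashes in the calling process (no child processes);
    processes>1 starts a ProcessPoolExecutor with that many workers.
    """
    entries = []
    for line in (bag_dir / "manifest-sha512.txt").read_text().splitlines():
        if line.strip():
            expected, rel_path = line.split(maxsplit=1)
            entries.append((expected, bag_dir / rel_path))

    paths = [path for _, path in entries]
    if processes > 1:
        with ProcessPoolExecutor(max_workers=processes) as pool:
            actual = list(pool.map(_sha512, paths))
    else:
        actual = [_sha512(path) for path in paths]

    return all(exp == act for (exp, _), act in zip(entries, actual))
```

Inside a web server, calling such a routine with `processes=1` trades some speed on large bags for not spawning children from the server process.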