codex-storage / nim-codex

Decentralized Durability Engine
Apache License 2.0

[BUG] Codex crashes on interrupted upload #685

Open gmega opened 8 months ago

gmega commented 8 months ago

Describe the bug: Codex crashes if I start uploading a large file to it and then interrupt my curl request with CTRL+C before it completes.

To Reproduce

  1. Launch codex.

  2. Do:

    dd if=/dev/urandom of=./data.bin bs=200M count=1
    curl -iv -XPOST "http://localhost:8080/api/codex/v1/data" --data @data.bin

    As the file is being uploaded, interrupt curl with CTRL+C. Codex will crash with:

    TRC 2024-01-25 16:01:20.500-03:00 CatchableError exception                   topics="codex node" tid=585029 exc="Stream boundary is not reached yet"
    /home/giuliano/Work/Status/nim-codex/codex.nim(131) codex
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncloop.nim(263) poll
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(318) futureContinue
    /home/giuliano/Work/Status/nim-codex/codex/chunker.nim(95) reader
    [[reraised from:
    /home/giuliano/Work/Status/nim-codex/codex.nim(131) codex
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncloop.nim(263) poll
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(318) futureContinue
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncmacro2.nim(213) reader
    ]]
    [[reraised from:
    /home/giuliano/Work/Status/nim-codex/codex.nim(131) codex
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncloop.nim(263) poll
    /home/giuliano/Work/Status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
    ]]
    Error: unhandled exception: Stream boundary is not reached yet [Defect]

Expected behavior: Codex should handle the aborted upload gracefully, either deleting all blocks that belong to the failed upload or accepting the ones that got uploaded.
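
The reproduction steps and the expected behavior above can be combined into a scripted check. The following is a minimal sketch, not part of the original report: it assumes GNU `timeout` is available, that the caller supplies the PID of the running node, and that the node runs as the same user as the script. The function names `node_alive` and `interrupted_upload_check` are hypothetical, and "survived" only means the process still exists. It says nothing about whether partially uploaded blocks were cleaned up.

```shell
#!/bin/sh
# Sketch only. Assumptions not taken from the report: the node's PID is passed
# in by the caller, the node runs as the same user as this script, and
# "survived" just means the process still exists afterwards.

CODEX_URL="${CODEX_URL:-http://localhost:8080/api/codex/v1/data}"

# Succeeds if a process with the given PID still exists.
node_alive() {
  kill -0 "$1" 2>/dev/null
}

# Usage: interrupted_upload_check <codex_pid> [file] [seconds_before_abort]
interrupted_upload_check() {
  codex_pid="$1"
  file="${2:-./data.bin}"
  delay="${3:-2}"

  # Same request as in the report; `timeout` stands in for pressing CTRL+C.
  # Its exit status is ignored because the aborted upload is expected to fail.
  timeout "$delay" curl -s -XPOST "$CODEX_URL" --data @"$file" >/dev/null 2>&1 || true

  sleep 1  # give the node a moment to react to the dropped connection
  if node_alive "$codex_pid"; then
    echo "node survived"
  else
    echo "node crashed"
    return 1
  fi
}
```

After generating `data.bin` as in step 2, calling `interrupted_upload_check <pid-of-codex>` prints `node survived` or `node crashed`. The delay may need tuning so that the upload is still in flight when it is aborted.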


dryajov commented 8 months ago

Error: unhandled exception: Stream boundary is not reached yet [Defect]

Why is this a defect? It's not a Codex crash; Chronos is for some reason aborting the process on a malformed stream, which isn't sound.

veaceslavdoina commented 8 months ago

Looks similar to https://github.com/codex-storage/nim-codex/issues/527, but it no longer reproduces on codexstorage/nim-codex:latest-dist-tests (fd3c56).

However, I observed some weird but reproducible behaviour:

Docker - not crashed

# Run
docker run \
  -e CODEX_API_BINDADDR=0.0.0.0 \
  -e CODEX_LOG_LEVEL=TRACE \
  -p 8080:8080 \
  codexstorage/nim-codex:latest-dist-tests

# Generate
dd if=/dev/urandom of=upload.bin bs=1000M count=1

# Test
timeout 5 curl -X POST localhost:8080/api/codex/v1/data \
  -H "Content-Type: application/octet-stream" \
  -H "Expect: 100-continue" \
  -T upload.bin

Binary - crashed

# Run
codex --log-level=TRACE

# Generate
dd if=/dev/urandom of=upload.bin bs=1000M count=1

# Test
timeout 5 curl -X POST localhost:8080/api/codex/v1/data \
  -H "Content-Type: application/octet-stream" \
  -H "Expect: 100-continue" \
  -T upload.bin

2-towns commented 2 months ago

I am facing this problem as well. My use-case is that I'm trying to upload a file from the UI and when I cancel the request, the client crashes with this error:

ERR 2024-07-30 19:23:08.468+02:00 Unhandled exception in async proc, aborting topics="codex" tid=41748 msg="Stream boundary is not reached yet"

https://github.com/user-attachments/assets/b1ad6482-84fe-4dcf-9873-3141529b87af

gmega commented 2 months ago

Alright thanks @2-towns, gonna bump this up so we work on it.

2-towns commented 2 months ago

I did a git pull this morning and the node is not crashing anymore. Instead, I am seeing the following log:

DBG 2024-07-31 12:06:35.308+02:00 Critical error occured while sending response topics="codex" tid=90717 meth=POST peer=127.0.0.1:57958 uri=/api/codex/v1/data code="400 Bad Request" error_msg="Unable to send response"

vpavlin commented 1 day ago

I'm seeing the following:

TRC 2024-10-02 10:14:52.685+02:00 Stream created                             topics="libp2p lpstream" tid=938714 s=66fd00fc07c555d6584bf4b5 objName=AsyncStreamWrapper dir=In
INF 2024-10-02 10:14:52.685+02:00 Storing data                               topics="codex node" tid=938714
TRC 2024-10-02 10:14:52.685+02:00 Reading bytes from reader                  topics="libp2p asyncstreamwrapper" tid=938714 bytes=65536
TRC 2024-10-02 10:14:52.906+02:00 CatchableError exception                   topics="codex node" tid=938714 exc="Incomplete chunk received"
ERR 2024-10-02 10:14:52.906+02:00 Unhandled exception in async proc, aborting topics="codex" tid=938714 msg="Incomplete chunk received"

I have a simple Go program which downloads a file over HTTP and passes the stream/buffer to a Codex REST API request to upload the data. When the program is killed I get the error above. It would be good if such an event did not crash the whole node :)

Version: master (264bfa17f54b7973432d000446a725a4b5a6134a) System: ubuntu 23.10
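
The scenario described above (a stream of unknown length forwarded to the upload endpoint and then killed) can be approximated from the shell. This is a rough sketch and not the Go program itself: random data piped through stdin stands in for the HTTP download, GNU `timeout` stands in for killing the program, and the function name `streamed_upload_abort` is hypothetical. The message "Incomplete chunk received" suggests the body was sent with chunked transfer encoding, which is also what curl uses for `-T -`. That link is an inference and is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch only. Assumption: random data piped through stdin stands in for the
# HTTP download that the Go program forwards to Codex.

CODEX_URL="${CODEX_URL:-http://localhost:8080/api/codex/v1/data}"

# Usage: streamed_upload_abort [seconds_before_abort] [MiB_to_stream]
streamed_upload_abort() {
  delay="${1:-2}"
  mib="${2:-1000}"

  # `-T -` uploads stdin; the body length is unknown up front, so curl uses
  # chunked transfer encoding over HTTP/1.1. `timeout` kills curl mid-stream.
  dd if=/dev/urandom bs=1M count="$mib" 2>/dev/null |
    timeout "$delay" curl -s -X POST "$CODEX_URL" \
      -H "Content-Type: application/octet-stream" \
      -T - >/dev/null 2>&1
  status=$?

  # 124 is the status GNU timeout reports when it had to kill the command.
  echo "upload ended with status $status"
}
```

Running `streamed_upload_abort 2` against a local node and then checking whether the process is still alive would show whether an aborted streamed upload takes the node down.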

dryajov commented 1 day ago

Does it crash as in the first case that @gmega documented, or does it just print the error and continue running? If it's the former, could you provide the stacktrace as well?