TrueBlocks / trueblocks-core

The main repository for the TrueBlocks system
https://trueblocks.io
GNU General Public License v3.0
1.04k stars 194 forks

Issues running chifra init --all #3606

Closed dreadedhamish closed 1 month ago

dreadedhamish commented 2 months ago

This documents what I thought was rate-limiting but might be another issue.

Running "chifra init --all", things went smoothly until about 60% had downloaded. Then errors like these started while some downloads continued, and later every download produced these errors:

WARN[21-04|13:49:19.052] Failed download 005827115-005832546 (will retry)
EROR[21-04|13:49:27.046] write to disc error [error copying 006635375-006643357 file in writeBytesToDisc: [stream error: stream ID 17; PROTOCOL_ERROR; received from peer]]

This continued for about 12 hours (a very rough number) before downloads started running again. Another GitHub issue suggested this might be rate-limiting, but here are the logs from when downloads started running again:

EROR[21-04|14:52:31.706] write to disc error [error copying 000864346-001076302 file in writeBytesToDisc: [stream error: stream ID 1; PROTOCOL_ERROR; received from peer]]
EROR[21-04|14:52:35.707] write to disc error [error copying 000590511-000864345 file in writeBytesToDisc: [stream error: stream ID 3; PROTOCOL_ERROR; received from peer]]
EROR[21-04|14:52:39.707] write to disc error [error copying 000000001-000590510 file in writeBytesToDisc: [stream error: stream ID 5; PROTOCOL_ERROR; received from peer]]
INFO[21-04|14:52:43.709] Finished download of index 000000000-000000000 ( 5547 of 8842 62.7% [sleep: 4000.00ms])
INFO[21-04|14:52:47.709] Completed initializing index files.
WARN[21-04|14:52:47.709] Retrying 3295 downloads
INFO[21-04|14:52:47.709] Retrying 3295 bloom(s)
INFO[21-04|14:53:00.943] Finished download of index 017487480-017490587 ( 5548 of 8842 62.7%)
INFO[21-04|14:53:03.036] Finished download of index 016697847-016700000 ( 5549 of 8842 62.8%)
INFO[21-04|14:53:05.025] Finished download of index 015991281-015994530 ( 5550 of 8842 62.8%)
INFO[21-04|14:53:05.084] Finished download of index 015984929-015988291 ( 5551 of 8842 62.8%)

100% errors until the initial run completed and the failed downloads were retried, then 100% success. I think it's unlikely that any rate limit was lifted at precisely the moment the initial run completed and the retries commenced.

danrmiller commented 2 months ago

I had the same.

tjayrush commented 1 month ago

Does anyone have any suggestions for this? @dszlachta ?

In some places in our code, we back off progressively: if it fails once, we sleep for X seconds; if it fails twice, 2X seconds; three times, 4X; and so on. I think this code simply keeps retrying.

As for why it started working once it finished and restarted the retries, perhaps it was reusing the same connection or something and opened a new connection when it restarted? Not sure.

tjayrush commented 1 month ago

Simplest progressive backoff from chat:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func callAPI(url string) (*http.Response, error) {
    client := &http.Client{}
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    return resp, nil
}

func main() {
    url := "https://api.example.com/data"
    maxRetries := 5
    backoff := time.Second

    for attempt := 1; attempt <= maxRetries; attempt++ {
        resp, err := callAPI(url)
        if err != nil {
            // Transport-level errors are not retried in this simple version.
            fmt.Println("Error calling API:", err)
            return
        }
        // Close the body now; a defer inside the loop would keep every
        // response open until main returns.
        status := resp.StatusCode
        resp.Body.Close()

        if status == http.StatusOK {
            fmt.Println("API call successful")
            return
        } else if status == http.StatusTooManyRequests {
            fmt.Printf("Rate limited. Attempt %d/%d. Retrying in %s...\n", attempt, maxRetries, backoff)
            time.Sleep(backoff)
            backoff *= 2 // double the wait after each rate-limited attempt
        } else {
            fmt.Println("API call failed with status:", status)
            return
        }
    }

    fmt.Println("Max retries reached. Exiting.")
}

tjayrush commented 1 month ago

This will be re-opened. Closing for now.