benthosdev / benthos

Fancy stream processing made operationally mundane
https://www.benthos.dev
MIT License

Download of zip file fails #2513

Closed GeorgeGkinis closed 1 month ago

GeorgeGkinis commented 1 month ago

I am trying to download historical data from Binance:

input:
  generate:
    mapping: root = {"url":"https://data.binance.vision/data/spot/daily/trades/1INCHUSDT/1INCHUSDT-trades-2023-03-20.zip"}
    interval: 0
    count: 1
pipeline:
  processors:
    - http:
        url: ${! json("url")}
        dump_request_log_level: TRACE
output:
  file:
    path: downloaded.zip
    codec: all-bytes

The output is:

benthos -c download_simple.yml 
INFO Running main config from specified file       @service=benthos benthos_version=4.26.0 path=download_simple.yml
INFO Listening for HTTP requests at: http://0.0.0.0:4195  @service=benthos
INFO Launching a benthos instance, use CTRL+C to close  @service=benthos
ERRO HTTP request to '${! json("url")}' failed: https://data.binance.vision/data/spot/daily/trades/1INCHUSDT/1INCHUSDT-trades-2023-03-20.zip: HTTP request returned unexpected response code (403): 403 Forbidden, Error: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"><TITLE>ERROR: The request could not be satisfied</TITLE></HEAD><BODY><H1>403 ERROR</H1><H2>The request could not be satisfied.</H2><HR noshade size="1px">This distribution is not configured to allow the HTTP request method that was used for this request. The distribution supports only cachable requests.We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.<BR clear="all">If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.<BR clear="all"><HR noshade size="1px"><PRE>Generated by cloudfront (CloudFront)Request ID: hY0UN-UvRBJmmJOwb1olBpWcMS0eTTDyLbf6qIJXY4s1N3Yd8nUsoQ==</PRE><ADDRESS></ADDRESS></BODY></HTML>  @service=benthos label="" path=root.pipeline.processors.0
INFO Pipeline has terminated. Shutting down the service  @service=benthos

The strange thing is that the following works from the same terminal session on the same PC:

CURL:

curl --output download.zip https://data.binance.vision/data/spot/daily/trades/1INCHUSDT/1INCHUSDT-trades-2023-03-20.zip && ll download.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  255k  100  255k    0     0   118k      0  0:00:02  0:00:02 --:--:--  118k
-rw-rw-r-- 1 q q 261558 avril 14 13:31 download.zip

GO:

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    url := "https://data.binance.vision/data/spot/daily/trades/1INCHUSDT/1INCHUSDT-trades-2023-03-20.zip"
    err := DownloadFile("1INCHUSDT-trades-2023-03-20.zip", url)
    if err != nil {
        panic(err)
    }
    println("Download completed!")
}

// DownloadFile downloads the file at url and stores it locally at filepath.
func DownloadFile(filepath string, url string) error {
    // http.Get issues a GET request with no body.
    response, err := http.Get(url)
    if err != nil {
        return err
    }
    defer response.Body.Close()

    // Fail on non-200 responses so an error page is not saved as the file.
    if response.StatusCode != http.StatusOK {
        return fmt.Errorf("unexpected status: %s", response.Status)
    }

    out, err := os.Create(filepath)
    if err != nil {
        return err
    }
    defer out.Close()

    _, err = io.Copy(out, response.Body)
    return err
}

go run download.go
Download completed!

Any ideas why this only happens in benthos?

mihaitodor commented 1 month ago

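Hey @GeorgeGkinis, there are 2 issues in there:

1. The http processor uses the POST verb by default, and the 403 response says the distribution does not allow that request method.
2. The http processor sends the message contents as the request body, so the message needs to be emptied and the URL moved into metadata first.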

This should work:

input:
  generate:
    mapping: root = {"url":"https://data.binance.vision/data/spot/daily/trades/1INCHUSDT/1INCHUSDT-trades-2023-03-20.zip"}
    interval: 0
    count: 1
pipeline:
  processors:
    - mapping: |
        root = ""
        meta url = this.url
    - http:
        verb: GET
        url: ${! meta("url")}

output:
  file:
    path: downloaded.zip
    codec: all-bytes