goccy / bigquery-emulator

BigQuery emulator server implemented in Go
MIT License

HTTP Location header seems to be using 0.0.0.0 as hostname for redirects #160

Open sbv-csis opened 1 year ago

sbv-csis commented 1 year ago

When trying to use load_from_table against a bigquery-emulator process running in Docker, it seems a redirect is issued to an address that is not reachable remotely, i.e. from other containers on the same Docker network. Here I've linked a bigquery-emulator container (named localquery, reachable on the Docker network at 172.18.0.3) with another container in docker-compose. The initial POST goes through, but following the redirect fails:

DEBUG urllib3.connectionpool:connectionpool.py:456 http://localquery:9050 "POST /upload/bigquery/v2/projects/test-project-bla/jobs?uploadType=resumable HTTP/1.1" 200 883
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (1): 0.0.0.0:9050
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (2): 0.0.0.0:9050
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (3): 0.0.0.0:9050
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (4): 0.0.0.0:9050
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (5): 0.0.0.0:9050
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (6): 0.0.0.0:9050
DEBUG urllib3.connectionpool:connectionpool.py:228 Starting new HTTP connection (7): 0.0.0.0:9050

I don't know whether a path-only redirect would work generally, or at all:

diff --git a/server/handler.go b/server/handler.go
index a18af66..3452acd 100644
--- a/server/handler.go
+++ b/server/handler.go
@@ -268,8 +268,7 @@ func (h *uploadHandler) serveResumable(w http.ResponseWriter, r *http.Request) {
        w.Header().Add(
                "Location",
                fmt.Sprintf(
-                       "%s/upload/bigquery/v2/projects/%s/jobs?uploadType=resumable&upload_id=%s",
-                       addr,
+                       "/upload/bigquery/v2/projects/%s/jobs?uploadType=resumable&upload_id=%s",
                        project.ID,
                        job.JobReference.JobId,
                ),
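Whether a path-only value works depends on the client: one that resolves the header against the URL it just requested stays on localquery:9050, while one that uses the header verbatim has no host to connect to. A minimal sketch of that resolution step with Go's net/url follows; resolveLocation is a hypothetical helper and upload_id=abc is a placeholder, neither comes from the emulator or the client libraries.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLocation (hypothetical helper) resolves a Location header value
// against the URL of the request that produced it, following the standard
// reference-resolution rules implemented by url.ResolveReference.
func resolveLocation(requestURL, location string) (string, error) {
	base, err := url.Parse(requestURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(location)
	if err != nil {
		return "", err
	}
	// A path-only reference inherits scheme and host from the base URL.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	got, err := resolveLocation(
		"http://localquery:9050/upload/bigquery/v2/projects/test-project-bla/jobs?uploadType=resumable",
		"/upload/bigquery/v2/projects/test-project-bla/jobs?uploadType=resumable&upload_id=abc", // placeholder upload_id
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```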
sbv-csis commented 1 year ago

OK, that wasn't enough.


diff --git a/server/handler.go b/server/handler.go
index a18af66..a4de0ea 100644
--- a/server/handler.go
+++ b/server/handler.go
@@ -260,7 +260,7 @@ func (h *uploadHandler) serveResumable(w http.ResponseWriter, r *http.Request) {
                errorResponse(ctx, w, errInternalError(err.Error()))
                return
        }
-       addr := server.httpServer.Addr
+       addr := server.hostname
        if !strings.HasPrefix(addr, "http") {
                addr = "http://" + addr
        }

diff --git a/server/server.go b/server/server.go
index 22affa8..48e2b87 100644
--- a/server/server.go
+++ b/server/server.go
@@ -23,6 +23,7 @@ import (
 type Server struct {
        Handler      http.Handler
        storage      Storage
+       hostname     string
        db           *sql.DB
        loggerConfig *zap.Config
        logger       *zap.Logger
@@ -35,7 +36,7 @@ type Server struct {
 }

 func New(storage Storage) (*Server, error) {
-       server := &Server{storage: storage}
+       server := &Server{storage: storage, hostname: "0.0.0.0:9050"}
        if storage == TempStorage {
                f, err := os.CreateTemp("", "")
                if err != nil {
@@ -175,6 +176,10 @@ func (s *Server) SetLogLevel(level LogLevel) error {
        return nil
 }

+func (s *Server) SetHostname(hostname string) {
+       s.hostname = hostname
+}
+

This moves the process along a bit. I'm now running into unique constraint issues on tables, but I figure that's a separate issue (internalError: failed to exec INSERT INTO tables(id,projectID,datasetID,metadata) VALUES (@id,@projectid,@datasetid,@metadata): UNIQUE constraint failed: tables.projectID, tables.datasetID, tables.id)

prismec commented 11 months ago

The returned Location header also contains the port, which causes further problems if, for example, the emulator runs in a container and the host port is mapped differently.

As a result, the emulator cannot be used as a Testcontainer (see https://java.testcontainers.org/modules/gcloud/#bigquery), because the ports on the host machine are allocated dynamically.

Example

docker ps
CONTAINER ID   IMAGE                                   COMMAND                  CREATED              STATUS              PORTS                                              NAMES
882d1db6950d   ghcr.io/goccy/bigquery-emulator:0.4.3   "/bin/bigquery-emula…"   About a minute ago   Up About a minute   0.0.0.0:58200->9050/tcp, 0.0.0.0:58201->9060/tcp   funny_ellis

A call to http://localhost:58200/upload/bigquery/v2/projects/test-project/jobs results in the response header Location: http://0.0.0.0:9050/upload/bigquery/v2/projects/test-project/jobs?uploadType=resumable&upload_id=167ec27f-dc87-4f69-bca3-bc81a4b18484, so the BigQuery client libraries try to connect to port 9050 instead of 58200.

I am not sure this problem can be solved by the emulator at all. It would somehow need to get the port mappings from Docker and modify the returned Location header.

richardware commented 10 months ago

@prismec This issue was also raised for the fake-gcs-server project: https://github.com/fsouza/fake-gcs-server/issues/1281

It seems the solution, in that project at least, is to pass -external-url as a parameter when starting the emulator, e.g. -external-url https://127.0.0.1:9050. This value is then used when constructing the Location header:

https://github.com/fsouza/fake-gcs-server/blob/87dfef0e58609b2768fd6493091d78cd56b2e0d7/fakestorage/server.go#L440-L449

@goccy I wonder if similar configuration could be added for this project?
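A rough sketch of what such configuration could look like here, using the standard flag package. The -external-url flag does not exist in bigquery-emulator; the flag name is borrowed from fake-gcs-server, and parseExternalURL and locationBase are hypothetical helpers.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseExternalURL reads a hypothetical -external-url flag from args.
func parseExternalURL(args []string) (string, error) {
	fs := flag.NewFlagSet("emulator", flag.ContinueOnError)
	externalURL := fs.String("external-url", "", "base URL clients use to reach the server")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// Drop trailing slashes so the path can be appended directly.
	return strings.TrimRight(*externalURL, "/"), nil
}

// locationBase prefers the configured external URL and falls back to the
// listen address, mirroring the "http://" prefixing in serveResumable.
func locationBase(externalURL, listenAddr string) string {
	if externalURL != "" {
		return externalURL
	}
	if !strings.HasPrefix(listenAddr, "http") {
		return "http://" + listenAddr
	}
	return listenAddr
}

func main() {
	ext, err := parseExternalURL([]string{"-external-url", "http://localhost:58200"})
	if err != nil {
		panic(err)
	}
	fmt.Printf(
		"%s/upload/bigquery/v2/projects/%s/jobs?uploadType=resumable&upload_id=%s\n",
		locationBase(ext, "0.0.0.0:9050"), "test-project", "job-1", // placeholder IDs
	)
}
```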

eddumelendez commented 10 months ago

Another alternative is to read an environment variable, BQ_LOCATION_URL, at line 265 and, if it is empty, fall back to server.httpServer.Addr.

https://github.com/goccy/bigquery-emulator/blob/8ccde288d63846122e085433e0a03edfadba361c/server/handler.go#L265-L278