Closed azat-alimov-db closed 11 months ago
It seems like this is happening when the crawler uses the proxy to connect to the api. We probably need a different variable name for that case.
This is what we set as the proxy settings:
Could you try removing the HTTP_PROXY variable? It should be the one used for connections from the crawler to the api.
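For reference, a minimal sketch of how the proxy variables could be laid out in the crawler container spec, keeping HTTPS_PROXY for outbound GitHub traffic while leaving crawler-to-api calls direct (the proxy host and NO_PROXY entries here are illustrative, and NO_PROXY is a general convention rather than a confirmed Monocle setting):

```yaml
# Hypothetical crawler container env: keep the proxy for outbound
# HTTPS traffic, but do not set HTTP_PROXY so crawler -> api calls
# go direct; NO_PROXY exempts internal hosts as an extra safeguard.
env:
  - name: HTTPS_PROXY
    value: http://proxy.internal.example:8080
  # HTTP_PROXY intentionally omitted
  - name: NO_PROXY
    value: api,localhost,127.0.0.1
```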
Ok, I tried that; the connection can be established now, but I'm getting SSL errors:
2023-12-12 21:55:59 WARNING Monocle.Effects:526: network error {"index":"test","crawler":"coder","stream":"Projects","count":7,"limit":7,"loc":"api.github.com:443/graphql","failed":"InternalException ProtocolError \"error:0A000086:SSL routines::certificate verify failed\""}
Any hints on configuring SSL certs for the crawler (the proxy replaces the SSL cert with our org-signed certificate), or maybe a way to run the crawler in insecure mode?
Alright, thanks.
SSL is implemented by openssl, so setting SSL_CERT_FILE should work.
gotcha, thank you. I'll work on that tomorrow, since I will need to update a deployment yaml and mount the ssl certs somewhere as a secret
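A minimal sketch of what that deployment change could look like, assuming the certificate lives in a Secret named org-ca-cert (all names and paths here are illustrative, not taken from the actual deployment):

```yaml
# Illustrative Deployment fragment: mount an org CA certificate from a
# Secret and point openssl at it via SSL_CERT_FILE.
spec:
  containers:
    - name: crawler
      env:
        - name: SSL_CERT_FILE
          value: /etc/pki/tls/certs/org-ca.crt
      volumeMounts:
        - name: org-ca
          mountPath: /etc/pki/tls/certs
          readOnly: true
  volumes:
    - name: org-ca
      secret:
        secretName: org-ca-cert
        items:
          - key: ca.crt
            path: org-ca.crt
```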
The related change is merged. New container image should be published soon. https://github.com/change-metrics/monocle/actions/runs/7199715334
Hello,
I added a certificate to a deployment and set the env var to:
- name: SSL_CERT_FILE
  value: /etc/pki/tls/certs/db-server-ca-6.cer
Then I tested with curl, and the connection works fine via the proxy:
bash-4.2$ curl -v -o /dev/null https://api.github.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* About to connect() to proxy *** port 8080 (#0)
* Trying 10.245.32.5...
* Connected to *** (10.245.32.5) port 8080 (#0)
* Establish HTTP proxy tunnel to api.github.com:443
> CONNECT api.github.com:443 HTTP/1.1
> Host: api.github.com:443
> User-Agent: curl/7.29.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 200 Connection established
<
* Proxy replied OK to CONNECT request
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/db-server-ca-6.cer
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
***
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: api.github.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 13 Dec 2023 19:44:29 GMT
< ETag: W/"4f825cc84e1c733059d46e76e6df9db557ae5254f9625dfe8e1b09499c449438"
< Vary: Accept, Accept-Encoding, Accept, X-Requested-With
< Server: GitHub.com
< Connection: Keep-Alive
< Content-Type: application/json; charset=utf-8
< Accept-Ranges: bytes
< Cache-Control: public, max-age=60, s-maxage=60
< Content-Length: 2262
< Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
< X-Frame-Options: deny
< X-RateLimit-Used: 1
< X-XSS-Protection: 0
< X-RateLimit-Limit: 60
< X-RateLimit-Reset: 1702500276
< X-GitHub-Media-Type: github.v3; format=json
< X-GitHub-Request-Id: 5F6A:3D26CA:1FEADE:204AFE:657A09A4
< X-RateLimit-Resource: core
< X-RateLimit-Remaining: 59
< X-Content-Type-Options: nosniff
< Content-Security-Policy: default-src 'none'
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset
< x-github-api-version-selected: 2022-11-28
<
{ [data not shown]
100 2262 100 2262 0 0 5835 0 --:--:-- --:--:-- --:--:-- 5844
* Connection #0 to host *** left intact
But the crawler still receives the following:
"2023-12-13 18:46:49 WARNING Macroscope.Main:317: Skipping due to an unexpected exception {"index":"test","crawler":"coder","err":"HttpExceptionRequest Request {\n host = \"api.github.com\"\n port = 443\n secure = True\n requestHeaders = [(\"Authorization\",\"<REDACTED>\"),(\"User-Agent\",\"change-metrics/monocle\"),(\"Content-Type\",\"application/json\")]\n path = \"/graphql\"\n queryString = \"\"\n method = \"POST\"\n proxy = Nothing\n rawBody = False\n redirectCount = 10\n responseTimeout = ResponseTimeoutDefault\n requestVersion = HTTP/1.1\n proxySecureMode = ProxySecureWithConnect\n}\n (InternalException ProtocolError \"error:0A000086:SSL routines::certificate verify failed\")"}"
Is it possible to set it to insecure mode?
I'd appreciate any further suggestions.
Perhaps you can try setting TLS_NO_VERIFY to 1
ah, looks like it is TLS_NO_VERIFY variable, as per: https://github.com/change-metrics/monocle/blob/659e4c319b3b6c37777ae692952c7250448e7319/src/Monocle/Client.hs#L47C28-L47C41
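Based on the linked source, setting it on the crawler container could look like the fragment below. Note this disables certificate verification entirely, so it is only advisable for testing:

```yaml
# Disable TLS certificate verification for the crawler (testing only).
env:
  - name: TLS_NO_VERIFY
    value: "1"
```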
Any idea why I get a "Network error" from the web UI (api) when trying to access it via a browser?
The logs of the api service are not throwing any suspicious errors; moreover, they show that I received a 200:
[13/Dec/2023:20:17:46 +0000] "GET / HTTP/1.1" 200 - "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36""
I've exposed the service via Cloud Load Balancer on GCP GKE, with LoadBalancer service type:
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: api
    app.kubernetes.io/component: api
    app.kubernetes.io/part-of: monocle
  name: api-external
  annotations:
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  ports:
    - name: http-rest-api
      port: 8080
      targetPort: 8080
  selector:
    app.kubernetes.io/name: api
status:
  loadBalancer: {}
Have you tried setting COMPOSE_MONOCLE_PUBLIC_URL?
yep, I set that for the api and crawler, but I'm still getting the same "Network error" message
Oops, I meant MONOCLE_PUBLIC_URL. This should be the URL you are using to access the web UI, and it is only needed for the api container. It defaults to localhost, so if you look in your browser's network inspect tab, you should see that the network error happens because the client tries to connect to localhost.
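For example, if the web UI is reached at http://100.88.10.138:8080 (the load balancer address mentioned elsewhere in this thread), the api container env could look like:

```yaml
# api container: advertise the externally reachable URL so the web
# client calls the load balancer instead of localhost.
env:
  - name: MONOCLE_PUBLIC_URL
    value: http://100.88.10.138:8080
```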
Sorry, I can't give you screenshots, but looking into Chrome developer tools, I see the following for the "about" request:
General:
  Request URL: http://localhost:8080/api/2/about
  Referrer Policy: strict-origin-when-cross-origin
Request Headers:
  Accept: */*
  Access-Control-Request-Headers: content-type
  Access-Control-Request-Method: POST
  Origin: http://100.88.10.138:8080
  Sec-Fetch-Mode: cors
  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
Awesome, that did the trick. Thank you very much @TristanCacqueray ! Let me play around with that great tool.
Feel free to close this issue.
You're welcome, have fun!
Hello,
Thank you for helping out with my question about proxy settings for the crawler. Now, when I try to run a test indexing, I'm getting the following error:
2023-12-12 21:01:48 WARNING Macroscope.Main:317: Skipping due to an unexpected exception {"index":"test","crawler":"coder","err":"Decoding of CommitInfoRequest {commitInfoRequestIndex = \"test\", commitInfoRequestCrawler = \"coder\", commitInfoRequestEntity = Enumerated {enumerated = Right EntityTypeENTITY_TYPE_ORGANIZATION}, commitInfoRequestOffset = 0} failed with: \"Error in $: Failed reading: not a valid json value at '<!DOCTYPEhtmlPUBLIC-W3CDTDXHTML1.0TransitionalENhttp:www.w3.orgTRxhtml1DTDxhtm'\"\nCallStack (from HasCallStack):\n error, called at src/Relude/Debug.hs:289:11 in relude-1.2.0.0-Jiwa4gfuZvkK1snRof3V:Relude.Debug\n error, called at src/Monocle/Client.hs:107:17 in monocle-0.1.10.0-1juCsBb4vJ35WvYo0D138g:Monocle.Client"}
Here is the config:
workspaces:
Any idea what that could mean? I'd appreciate any hints.