Closed mspublic closed 1 month ago
Do you get any error message from the output of g3fcgen?
No errors in g3fcgen stdout/err.
I did a local test with curl -x 127.0.0.1:8080 https://login.microsoftonline.com --insecure -v
and didn't found any problems.
It's random on when it happens. But it will typically happens somewhere in the flow logging into office365/word online
It looks like its failing out before g3fcgen is able to get the certificate. The error timestamp in g3proxy is earlier than the debug output of g3fcgen.
Aug 15 09:50:12.050 DEBG Worker#0 got certificate for host usc-word-edit.officeapps.live.com <g3fcgen::backend(mod.rs:166)>
Aug 15 09:50:35.087 DEBG Worker#0 got certificate for host usc-word-edit.officeapps.live.com <g3fcgen::backend(mod.rs:166)>
Aug 15 09:50:12.023 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=33986, task_id=ac9a29ff5b1511ef8e87ef3b91191f4d, upstream=usc-word-edit.officeapps.live.com:443, no fake cert generated: failed to get fake upstream certificate
Aug 15 09:50:12.023 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=33986, task_id=ac9a29ff5b1511ef8e88ef3b91191f4d, upstream=usc-word-edit.officeapps.live.com:443, no fake cert generated: failed to get fake upstream certificate
Aug 15 09:50:12.023 INFO c_rd_bytes=678, c_wr_bytes=0, client_addr=127.0.0.1:63759, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:63763, next_peer_addr=52.108.9.12:443, pid=33986, r_rd_bytes=6892, r_wr_bytes=675, ready_time=27.504ms, reason=InterceptionError, server_addr=127.0.0.1:4128, server_name=mitm, server_type=HttpProxy, stage=Relaying, start_at=2024-08-15T14:50:11.558343Z, task_id=ac9a29ff5b1511ef8e87ef3b91191f4d, task_type=TcpConnect, tcp_connect_spend=12.218ms, tcp_connect_tries=1, total_time=464.848ms, upstream=usc-word-edit.officeapps.live.com:443, wait_time=121.000µs, tls_modern interception error: tls: no fake cert generated: failed to get fake upstream certificate
Aug 15 09:50:12.023 INFO c_rd_bytes=678, c_wr_bytes=0, client_addr=127.0.0.1:63760, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:63762, next_peer_addr=52.108.9.12:443, pid=33986, r_rd_bytes=6892, r_wr_bytes=675, ready_time=11.138ms, reason=InterceptionError, server_addr=127.0.0.1:4128, server_name=mitm, server_type=HttpProxy, stage=Relaying, start_at=2024-08-15T14:50:11.572303Z, task_id=ac9a29ff5b1511ef8e88ef3b91191f4d, task_type=TcpConnect, tcp_connect_spend=9.988ms, tcp_connect_tries=1, total_time=450.889ms, upstream=usc-word-edit.officeapps.live.com:443, wait_time=102.291µs, tls_modern interception error: tls: no fake cert generated: failed to get fake upstream certificate
Aug 15 09:50:12.023 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=33986, task_id=ac9a29ff5b1511ef8e86ef3b91191f4d, upstream=usc-word-edit.officeapps.live.com:443, no fake cert generated: failed to get fake upstream certificate
Can you try by setting query_wait_timeout: 4s
in tls_cert_generator
config?
Or you can add more working threads to g3fcgen by setting worker: {}
in it's main.yaml.
I set both and it still fails immediately. Doesn't seem to wait for timeout.
Aug 15 10:38:35.650 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=34813, task_id=6eeccf7f5b1c11ef9d648a8e6879accb, upstream=www.google.com:443, no fake cert generated: failed to get fake upstream certificate
Aug 15 10:38:35.650 INFO c_rd_bytes=659, c_wr_bytes=0, client_addr=127.0.0.1:65345, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:65346, next_peer_addr=142.250.114.99:443, pid=34813, r_rd_bytes=4104, r_wr_bytes=581, ready_time=23.381ms, reason=InterceptionError, server_addr=127.0.0.1:4128, server_name=mitm, server_type=HttpProxy, stage=Relaying, start_at=2024-08-15T15:38:34.806586Z, task_id=6eeccf7f5b1c11ef9d648a8e6879accb, task_type=TcpConnect, tcp_connect_spend=9.081ms, tcp_connect_tries=1, total_time=844.278ms, upstream=www.google.com:443, wait_time=308.625µs, tls_modern interception error: tls: no fake cert generated: failed to get fake upstream certificate
Also as an aside - should I be using worker or runtime to define the thread numbers? In the example it had runtime. But seems that may not be right. I tester with both without success.
runtime:
thread_number: 2
I also tested with:
worker:
thread_number: 2
I set both and it still fails immediately. Doesn't seem to wait for timeout.
Aug 15 10:38:35.650 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=34813, task_id=6eeccf7f5b1c11ef9d648a8e6879accb, upstream=www.google.com:443, no fake cert generated: failed to get fake upstream certificate Aug 15 10:38:35.650 INFO c_rd_bytes=659, c_wr_bytes=0, client_addr=127.0.0.1:65345, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:65346, next_peer_addr=142.250.114.99:443, pid=34813, r_rd_bytes=4104, r_wr_bytes=581, ready_time=23.381ms, reason=InterceptionError, server_addr=127.0.0.1:4128, server_name=mitm, server_type=HttpProxy, stage=Relaying, start_at=2024-08-15T15:38:34.806586Z, task_id=6eeccf7f5b1c11ef9d648a8e6879accb, task_type=TcpConnect, tcp_connect_spend=9.081ms, tcp_connect_tries=1, total_time=844.278ms, upstream=www.google.com:443, wait_time=308.625µs, tls_modern interception error: tls: no fake cert generated: failed to get fake upstream certificate
You can capture the traffic on udp port 2999 to check the response latency
Also as an aside - should I be using worker or runtime to define the thread numbers? In the example it had runtime. But seems that may not be right. I tester with both without success.
runtime: thread_number: 2
I also tested with:
worker: thread_number: 2
the first one will set the thread number for handle of udp traffic. the second one will set thread number for cert generation
Looks like it does successfully send it back within .16 seconds. G3proxy displays an error within .5 seconds after that.
Looks like if I set cache_request_timeout and query_wait_timeout it significantly reduces the frequency of the error. Though I do tend to see more "Connection reset by peer" errors when those are set.
query_wait_timeout: 4s
cache_request_timeout: 4s
Aug 15 12:46:03.079 INFO c_rd_bytes=0, c_wr_bytes=0, client_addr=127.0.0.1:60461, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:60485, next_peer_addr=23.198.7.12:443, pid=38637, r_rd_bytes=0, r_wr_bytes=0, ready_time=6.840ms, reason=ClientTcpReadFailed, server_addr=127.0.0.1:4128, server_name=mitm, server_type=HttpProxy, stage=Relaying, start_at=2024-08-15T17:46:03.072489Z, task_id=3e1595ff5b2e11ef9a65c3d4441663e6, task_type=TcpConnect, tcp_connect_spend=6.619ms, tcp_connect_tries=1, total_time=6.978ms, upstream=res-1.cdn.office.net:443, wait_time=92.375µs, tcp read from client: Os { code: 54, kind: ConnectionReset, message: "Connection reset by peer" }
I will add more trace level logs today to see if there are other non-timeout problems
Looks like if I set cache_request_timeout and query_wait_timeout it significantly reduces the frequency of the error. Though I do tend to see more "Connection reset by peer" errors when those are set.
query_wait_timeout: 4s cache_request_timeout: 4s
Aug 15 12:46:03.079 INFO c_rd_bytes=0, c_wr_bytes=0, client_addr=127.0.0.1:60461, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:60485, next_peer_addr=23.198.7.12:443, pid=38637, r_rd_bytes=0, r_wr_bytes=0, ready_time=6.840ms, reason=ClientTcpReadFailed, server_addr=127.0.0.1:4128, server_name=mitm, server_type=HttpProxy, stage=Relaying, start_at=2024-08-15T17:46:03.072489Z, task_id=3e1595ff5b2e11ef9a65c3d4441663e6, task_type=TcpConnect, tcp_connect_spend=6.619ms, tcp_connect_tries=1, total_time=6.978ms, upstream=res-1.cdn.office.net:443, wait_time=92.375µs, tcp read from client: Os { code: 54, kind: ConnectionReset, message: "Connection reset by peer" }
you can search the same task_id in the log to see in which protocol state the client closed the connection by sending a tcp reset
Thanks for adding additional debug info. It seems that g3fcgen is able to get the cert but g3proxy is having issues getting it from there. Could it be related to the cache? Here is the latest:
g3fcgen:
Aug 19 10:08:06.573 DEBG Worker#0 got certificate for host usc-word-edit.officeapps.live.com <g3fcgen::backend(mod.rs:166)>
Aug 19 10:08:06.573 DEBG usc-word-edit.officeapps.live.com - duration: 99526625ns, rsp size: 3854 <g3fcgen(lib.rs:139)>
Here is g3proxy related task_id
Aug 19 10:08:06.438 INFO auditor_name=default, daemon_name=, depth=1, log_type=Inspect, pid=36930, protocol=tls_modern, source=stream inspection, task_id=d701257f5e3c11ef89e79e4f7c53d357, ()
Aug 19 10:08:10.481 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=36930, task_id=d701257f5e3c11ef89e79e4f7c53d357, upstream=usc-word-edit.officeapps.live.com:443, no fake cert generated: failed to get fake upstream certificate
Aug 19 10:08:10.500 INFO c_rd_bytes=681, c_wr_bytes=0, client_addr=127.0.0.1:51108, daemon_name=, escaper=default, log_type=Task, next_bound_addr=192.168.50.94:51110, next_peer_addr=52.108.8.12:443, pid=36930, r_rd_bytes=6928, r_wr_bytes=711, ready_time=15.552ms, reason=InterceptionError, server_addr=127.0.0.1:4128, server_name=http-proxy, server_type=HttpProxy, stage=Relaying, start_at=2024-08-19T15:08:06.422443Z, task_id=d701257f5e3c11ef89e79e4f7c53d357, task_type=TcpConnect, tcp_connect_spend=9.545ms, tcp_connect_tries=1, total_time=4.059s, upstream=usc-word-edit.officeapps.live.com:443, wait_time=113.500µs, tls_modern interception error: tls: no fake cert generated: failed to get fake upstream certificate
Do you see any cert generation related logs between Aug 19 10:08:06.438
and Aug 19 10:08:10.481
?
I was able to reproduce on a different site. Looks like I missed a debug log
Aug 19 11:36:18.433 DEBG cert generation query timeout for b1t-chidc2.zemanta.com <g3_cert_agent::query(query.rs:146)>
Aug 19 11:36:18.433 INFO auditor_name=default, daemon_name=, depth=1, intercept_type=TlsHandshake, log_type=Intercept, pid=40017, task_id=26e5e97f5e4911efb8f89b435b7523e6, upstream=b1t-chidc2.zemanta.com:443, no fake cert generated: failed to get fake upstream certificate
Aug 19 11:36:14.762 DEBG Worker#1 got certificate for host b1t-chidc2.zemanta.com <g3fcgen::backend(mod.rs:166)>
Aug 19 11:36:14.763 DEBG b1t-chidc2.zemanta.com - duration: 334989000ns, rsp size: 2969 <g3fcgen(lib.rs:139)>
01072f6a117f050d2cfaf15a17b565adb1ffd2b0 this commit should fix the problem
Looks good! Thanks!
I have noticed for certain sites there are issues getting the upstream cert/the browser does not connect. If I let it sit for a minute then refresh it works.
I am not sure the exact reason for this. But a potential workaround would be to fallback to generating a basic cert just with the SNI.