Open htuch opened 4 years ago
+1 for removal. I was against adding in the first place.
+1 for removal
Yes, OpenCensus uses curl in its zipkin exporter (to push traces to zipkin).
Having an indirect networking API instead of curl in libraries would also help compiling them to Wasm binaries for sandboxing. The API would be implemented via Wasm ABI.
@g-easy would it be possible to modify the Zipkin exporter to work with some more generic HTTP interface that Envoy can supply?
As per https://github.com/envoyproxy/envoy/issues/9958 at some point OpenCensus should be replaced by the new OpenTracing + OpenCensus project called OpenTelemetry. One would hope that OpenTelemetry doesn't require curl
For opencensus, since the "issue" is with the Zipkin span exporter implementation (opencensus_exporter_zipkin
). I think we could own Envoy-specific implementation of that in our tree (I can try to own it if it is desired)? At minimum:
class ZipkinSpanExporterHandler : public ::opencensus::trace::exporter::SpanExporter::Handler,
public Http::AsyncClient::Callbacks {
...
};
Which basically "retrieves" the spans stored by opencensus-cpp
worker thread implementation and send that to Envoy's thread which has access to a dispatcher (I think server's dispatcher is OK to use here). As a "bonus", we also need to serialize the opencensus::trace::exporter::SpanData
to Zipkin's v2 via ProtobufWkt::Struct
(instead of rapidjson employed by the current opencensus-cpp
Zipkin span exporter implementation).
However, this will "force" the configuration to require a defined collector cluster to let Envoy's async-client to work (I think this is also relevant to the approach for the "aws" utility). Even the dynamic forward proxy requires a cluster definition. I wonder if we can/are allowed to somehow construct a cluster without defining in the config, but build it on the fly when it is required (https://github.com/envoyproxy/envoy/pull/3479 cc. @ramaraochavali).
@moderation, when we are needed to "upgrade" to OpenTelemetry, if the approach is the same (i.e. data is gathered in worker thread managed by the "tracer" library) and we want to support Zipkin exporter, we still need to provide an Envoy's async-client-friendly implementation for that.
One would hope that OpenTelemetry doesn't require curl
This is tracked in https://github.com/open-telemetry/opentelemetry-cpp/issues/6
https://github.com/envoyproxy/envoy/issues/11816#issuecomment-653838495 sounds pretty great to me.
Note that opencensus's existing background thread calls into Handler::Export() periodically.
@mattklein123 what is the best way of resolving URI for async client calls without explicitly defining a cluster for it? Is https://github.com/envoyproxy/envoy/pull/3479 the way to do it? Need your advice on this. Thank you!
@dio yeah we would probably need to add new functionality to AsyncClient to optionally expose the dynamic forward proxy functionality.
PR13063 is another example, which picks up the fix of
CVE-2020-8231: libcurl: wrong connect-only connection
As Envoy does NOT use the affected CURLOPT_CONNECT_ONLY toggle, this change simply satisfies overly-simplistic audits.
As part of my EnvoyCon talk, I looked at all CVEs for our dependencies since 2018. Curl is by far the worst here, with 14 CVEs and 23 releases since 2018. Further evidence that we really want to make some progress here.
Bumping priority given the high CVE rate (almost always false positives as well).
OpenTelemetry went 1.0.0 is Feb - https://medium.com/opentelemetry/opentelemetry-specification-v1-0-0-tracing-edition-72dd08936978. Looks the Zipkin exporter still needs Curl
I was thinking we could deprecate OpenCensus in favor of OT but if Curl is still present it doesn't look like it will help us. Hopefully the OT experts can provide some more context
You can assign this to me. It looks like it's mostly the AWS filters.
I was looking at this over the weekend and it's not as trivial as I first thought to rip this out of the AWS filters. To use the HTTP async client, we need a target cluster instead of just an IP like we have now with libcurl. This means that the cluster configs for users of the filters are going to need to include a target cluster for the metadata fetches similar to what the jwt authn filter does with the example_jwks_cluster
.
I can do it, but it'll change how the AWS filters get configured and cause failures with the old configs. @abaptiste do you (or other AppMesh folks) have an opinion here?
Is there a way to use dynamic forward clusters to avoid having to change the API surface?
Is there a way to use dynamic forward clusters to avoid having to change the API surface?
It looks like we're still going to need to have additional configuration, but it's for the DNS cache to use and dynamic forward proxy cluster config.
The frustrating part here is that these filters are reading full URIs out of env variables and feeding them into curl. Otherwise, we'd have been able to just update the cluster manager with static cluster configs and just use the HTTP async client like everywhere else.
It's not clear to me why these filters are relying on environment variables instead of the Envoy config.
Its because http[s]_proxy vars are a very well accepted convention, and it makes the stock envoy config more mutable. That doesn't make it ideal, but you raised the question why, and I think we can and should revisit the earlier design decisions
@tonya11en are you still working on this?
@zirain no, this turned out to be more work than I was expecting to do in my free time :(. Feel free to pick it up and I'll happily help review.
@lavignes as the other AWS filter owner, would you be able to take a look? If not we may need to bump the aws filter to contrib/
If we decide to move it to contrib due to not able to provide fix at the right time, I can take up the action item to move it to contrib.
Awesome, thank you!
We are planning to move ahead with removing libcurl dependency from AWS extensions. I have put a small doc to indicate the approach and consequences.
https://docs.google.com/document/d/1MxREfLX4BONhOCNr7dvHwdr1_kArLb3OOSvhb-b02oQ
I look forward to all of your opinions on this. I am mainly seeing your thoughts on
Option #2, don’t need explicitly defined cluster config
) mentioned in the doc is a workable solution. Is it still in process?
Yes, working on this. At testing stage now. I will start with couple of PRs in next few weeks. Sorry for the delay.
Sorry to bug - do you think it will land? I've ended up here via detective work trying to find out why IRSA wasn't working as I expected! (... and I've learnt a lot about aws in the process)
I am still working on the changes. I haven't got enough time to dedicate to this task.
In the current state I am able to make http calls to fetch the credentials but don't know how it can be fetched in main thread and updated on all the worker thread. https://github.com/envoyproxy/envoy/compare/main...suniltheta:envoy:cm_ref_2
Also, the change is no where close to the version where I would be able to raise the PR.
If you want to make IRSA work, then I can point you towards a patch https://github.com/envoyproxy/envoy/pull/23408. But unfortunate that still uses curl.
Thank you so much for the quick response. I would offer to help but this is miles away from my knowledge area...
I'm sure you thought of this, but just for my own knwoledge - since the lambda filter is still beta and already uses curl, is there a downside of incorporating the patch in the interim, while libcurl is being expunged? The patch didn't seem to be increasing the problem, but my reading of the patch is pretty basic
Made some progress. Still working on improving the unit testing. I will be raise the PRs shortly after finalizing the PR breakup.
/assign
thanks for the update! if you're able to publish an image somewhere i can get at from a helm chart then can give it a spin
@suniltheta Any update on this?
Currently in progress in https://github.com/envoyproxy/envoy/pull/29880.
@ceastman-r7 are you using any particular filter?
This was more of a concern of vulnerability scans flagging curl. for example:
CVE-2023-38545 | curl:7.81.0-1ubuntu1.13 | HIGH | SOCKS5 heap buffer overflow |
---|
CVE-2023-38545 curl:7.81.0-1ubuntu1.13 HIGH SOCKS5 heap buffer overflow
which I was removing curl from the istio-pilot operating system image but another istio developer questioned if that mattered since curl was built into envoy.
... questioned if that mattered since curl was built into envoy.
it doesnt matter to envoy - as said its compiled in
not sure where curl is coming from - we dont seem to have it any of our images
my advice would be to use the distroless base so you dont have to worry about extraneous deps, not sure how istio pilot is built - for some reason i thought they did use a distroless base
Currently we depend on
libcurl
for URL fetching in AWS extension common utils https://github.com/envoyproxy/envoy/blob/89d6c6c6fa202aa7a01e85d2d887b56c8a3268ad/source/extensions/common/aws/utility.cc#L97 and I think OpenCensus (@g-easy can you confirm?)The use of curl is largely redundant, since Envoy itself can do HTTP fetch. In addition, Curl does not have a compatible threading and observability model with Envoy. The recent disclosures of CVE-2020-8169 and CVE-2020-8177 provide an example of why we should eliminate this from our trusted compute base.
Opening this tracking ticket to discuss further whether we can remove this dep.