`csaf_checker` fails on `tibco.com`

tschmidtb51 commented 1 year ago

The csaf_checker fails on tibco.com and www.tibco.com, but they provide a PMD at https://www.tibco.com/.well-known/csaf/provider-metadata.json. We need to investigate why.

JanHoefelmeyer commented 1 year ago

Trying to access the PMD (e.g. via curl https://www.tibco.com/.well-known/csaf/provider-metadata.json) returns a 403 (Forbidden). If the checker is not allowed to access the provider-metadata.json, it can't evaluate its contents, so the check fails.

tschmidtb51 commented 1 year ago

Thanks for the hint. I emailed them.

tibcodenny commented 1 year ago

You will likely encounter this on a number of sites as Akamai does not allow the default user agent used by curl. See here.

When using curl to access files that may be hosted by Akamai, it is recommended that you use an empty user agent:

curl -A '' https://www.tibco.com/.well-known/csaf/provider-metadata.json

To my knowledge, there is no disadvantage to using an empty user agent so I would recommend using it for all sites.
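For scripts that do not shell out to curl, the same idea can be sketched in Python. This is a minimal sketch, not Akamai's actual rule set: `CurlBlockingHandler` is a hypothetical local server that rejects any User-Agent starting with `curl/` (an assumption based on the behavior described above), and `fetch_status` is an illustrative helper. It relies on `http.client` not adding a User-Agent header unless one is passed explicitly.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PMD_PATH = "/.well-known/csaf/provider-metadata.json"


class CurlBlockingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a WAF that rejects curl's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("curl/"):
            self.send_response(403)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b'{"placeholder": true}'  # stand-in body, not a real PMD
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_status(port, user_agent=None):
    """GET the PMD path; send a User-Agent header only if one is given."""
    headers = {} if user_agent is None else {"User-Agent": user_agent}
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", PMD_PATH, headers=headers)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


# Port 0 lets the OS pick a free port for the local test server.
server = HTTPServer(("127.0.0.1", 0), CurlBlockingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status_curl = fetch_status(port, "curl/8.0.1")  # curl-style agent
status_no_agent = fetch_status(port)            # no User-Agent header at all

server.shutdown()
server.server_close()
print(status_curl, status_no_agent)
```

Against a real site the outcome depends entirely on that site's WAF configuration, so the local server only illustrates the mechanism.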

tschmidtb51 commented 1 year ago

@tibcodenny Thanks for the hint.

tschmidtb51 commented 1 year ago

Given that information, I think we should make the user agent configurable and default to an empty string.

bernhardreiter commented 1 year ago

You will likely encounter this on a number of sites as Akamai does not allow the default user agent used by curl.

If this is the case, then in my view https://www.tibco.com/.well-known/csaf/provider-metadata.json is not conforming to the CSAF standard as it is not browsable by usual web browsers. The idea of CSAF is to spread the information as widely as possible, and requiring special User-Agent headers to make this possible goes against this. It seems this should be fixed on TIBCO's side by relaxing their overly aggressive web application firewall settings.

I think we should make the user agent configurable and default to an empty string.

For the reference checker and downloader it makes sense to be strict about standards, so I'd recommend not adding this feature at this point.

dennypage commented 1 year ago

Akamai is one of the largest CDNs in the world. I do not have insight into why Akamai chose to specifically disallow curl, but they do and it is a well known thing. Suggesting that we should somehow stand our ground against this, when there is such an obvious and trivial solution, is simply not practical.

bernhardreiter commented 1 year ago

Akamai is one of the largest CDNs in the world. I do not have insight into why Akamai chose to specifically disallow curl, but they do and it is a well known thing. Suggesting that we should somehow stand our ground against this, when there is such an obvious and trivial solution, is simply not practical.

It is not a simple solution to require all CSAF clients to do something special beyond the HTTP standards to cater to a defect of Akamai. You would then need to make it a requirement in the standard. Subsequently, all other defects would be subject to inclusion in the standard on the same grounds, too.

It is my understanding that the goal of CSAF is to make the advisories as widely available as possible, and this includes scripts using libcurl or curl. Adding requirements for user-agent strings runs contrary to the CSAF mission in my view.

dennypage commented 1 year ago

I do not believe you can fairly refer to this as a "defect" on Akamai's part. The user-agent is "used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use." In short, the user-agent is supplemental information outside the standard agreement between the requestor and the server, and there is no behavioral requirement of the server for any user-agent string. Sending a user-agent provides no value whatsoever in retrieving a text file--it's just a waste of bandwidth.

Regardless, this is far outside the context of CSAF. Just remove the user-agent from the request and move on.

tschmidtb51 commented 1 year ago

@dennypage, @bernhardreiter: Thank you for your contributions and the discussion.

I understand both points - to summarize:

@dennypage: Changing the user-agent is easy and the practical solution.
@bernhardreiter: Changing the user-agent is not explicitly required by the standard and therefore should not be done, as this is the reference implementation.

As you can see - I gave it a longer thought. I also discussed it with colleagues, did some testing and read the standard again. I came to the conclusion that the WAF filtering based on the user-agent is not in conformance with the standard for the following reasons:

The WAF accepts (based on the fraction that I tested) only user-agents of browsers. Scripts or tools (e.g. python-requests/2.28.1, csaf_downloader/2.1.0, go-package-http/1.5
The standard does not require to use a specific user-agent. Without that being specified, the default behavior should apply.
The standard refers in section 7.1.6 to tools that have non-empty user-agent by default.
The standard requires in section 7.1.4 that CSAF documents with TLP:WHITE "MUST be freely accessible". IMHO, this is violated if you are required to have a specific user-agent. The same applies if you could choose between different elements of a closed set of user-agents.
The standard was built with the automatic retrieval process as an important part of it. The roles "CSAF (trusted) provider" and "CSAF aggregator" have specific requirements to support automatic retrieval.

As a result, a WAF that prevents automatic retrieval tools from accessing resources, when the standard was designed with exactly such tools in mind, does not seem to be in conformance with the standard to me.

As the csaf_checker is a tool to check conformance, such a change would undermine its use, as it would no longer detect this mistake (and as a result organizations providing CSAF files behind such a WAF couldn't fix the mistake).

tschmidtb51 commented 1 year ago

This brings me to the decision that the csaf_checker MUST NOT change its default user-agent.

dennypage commented 1 year ago

The standard refers in section 7.1.6 to tools that have non-empty user-agent by default.

This section does not refer to the user-agent in any way. It prohibits redirects because some HTTP clients do not support them. It uses curl as an example of a tool that does not support redirects, but it does not discuss the manner in which curl is invoked. In short, the CSAF standard makes no actual statement on the HTTP client or the user-agent.

The standard requires in section 7.1.4 that CSAF documents with TLP:WHITE "MUST be freely accessible". IMHO, this is violated if you are required to have a specific user-agent. The same applies if you could choose between different elements of a closed set of user-agents.

I disagree with this conclusion. The "reasonable person" standard needs to be applied here. I don't believe that Akamai's prohibition on curl is equivalent to a statement that the information hosted there is not freely available. Anyone who attempts to look at the information in a browser will be able to see it. The reasonable person's perception will be that the tool is broken, not that the information is "not freely available."

Overall, I think the idea of requiring curl's default user agent to be accepted is really self-defeating. It is very unlikely that Akamai will change their policy just to make curl work for CSAF. And it is also very unlikely that companies are going to stop using Akamai for hosting. By insisting on the default user-agent of curl, we are needlessly creating a deadlock which will hamper adoption of CSAF.

tschmidtb51 commented 1 year ago

In short, the CSAF standard makes no actual statement on the [...] the user-agent.

I agree 100% with that statement. Therefore, I tried to explain how I came to the conclusion. Instead of following the standard to the letter (as we agreed, there is no statement about the user-agent in the standard), I looked at the spirit of the standard and the intentions behind it (automation and easy retrieval). Those led me to the conclusion stated above.

Overall, I think the idea of requiring curl's default user agent to be accepted is really self-defeating.

According to my test, not only the curl user-agent but also others (like Python's requests or Go-http-client/1.1, which is actually used by the csaf_checker) and random strings get blocked. This led me to the conclusion that only a certain set of user-agents is allowed.
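A probe of this kind can be scripted. The following is a minimal sketch, not the actual test described above: `BrowserOnlyHandler` is a hypothetical local server whose only rule is that agents starting with `Mozilla/` pass, and `probe_user_agents` is an illustrative helper that requests the same path once per user-agent and records the status code.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BrowserOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical WAF stand-in: only agents starting with 'Mozilla/' pass."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        status = 200 if agent.startswith("Mozilla/") else 403
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def probe_user_agents(host, port, path, agents):
    """Request `path` once per user-agent and map each agent to its status."""
    results = {}
    for agent in agents:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        try:
            conn.request("GET", path, headers={"User-Agent": agent})
            response = conn.getresponse()
            response.read()
            results[agent] = response.status
        finally:
            conn.close()
    return results


# Port 0 lets the OS pick a free port for the local test server.
server = HTTPServer(("127.0.0.1", 0), BrowserOnlyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

results = probe_user_agents(
    "127.0.0.1",
    server.server_address[1],
    "/.well-known/csaf/provider-metadata.json",
    [
        "curl/8.0.1",
        "python-requests/2.28.1",
        "Go-http-client/1.1",
        "some-random-string",
        "Mozilla/5.0 (X11; Linux x86_64)",
    ],
)

server.shutdown()
server.server_close()
for agent, status in results.items():
    print(f"{status}  {agent}")
```

A table like this makes it easy to see whether a site blocks one specific agent or accepts only a closed set of them.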

By insisting on the default user-agent of curl, we are needlessly creating a deadlock which will hamper adoption of CSAF.

It is an Open Source Project - so I don't think that this is true. People can implement their own tools and handle it differently.

tschmidtb51 commented 1 year ago

As discussed in today's TC meeting, this needs to be resolved in the standard. Therefore, I opened an issue in the TC's repo: https://github.com/oasis-tcs/csaf/issues/635

tschmidtb51 commented 1 year ago

This implementation won't change its behavior for now.

bernhardreiter commented 1 year ago

Let me add one argument (for better consideration):

HTTP standard on user-agent

RFC 9110 in Section 10.1.5. writes about User-Agent https://datatracker.ietf.org/doc/html/rfc9110#name-user-agent

which is often used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use.

A user agent SHOULD send a User-Agent header field in each request

As CSAF mandates standard behaviour, sending a User-Agent is good practice. RFC 9110 also does not say that the server may filter by user-agent, except to work around user agent limitations. Libcurl and other HTTP libraries do not have a limitation that would justify not sending the CSAF contents to them.

bernhardreiter commented 1 year ago

This implementation won't change its behavior for now.

If that is decided shall the issue be closed?

gocsaf / csaf

`csaf_checker` fails on `tibco.com` #376
