Open tschmidtb51 opened 1 year ago
Trying to access the PMD (e.g. via curl https://www.tibco.com/.well-known/csaf/provider-metadata.json) returns a 403 (Forbidden). If the checker is not allowed to access the provider-metadata.json, it can't evaluate its contents and therefore fails.
Thanks for the hint. I emailed them.
You will likely encounter this on a number of sites as Akamai does not allow the default user agent used by curl. See here.
When using curl to access files that may be hosted by Akamai, it is recommended that you use an empty user agent:
curl -A '' https://www.tibco.com/.well-known/csaf/provider-metadata.json
To my knowledge, there is no disadvantage to using an empty user agent so I would recommend using it for all sites.
@tibcodenny Thanks for the hint.
Given that information, I think we should make the user agent configurable and default to an empty string.
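A minimal sketch of what such a setting could look like, written in Python only for illustration. `FetchConfig`, `build_headers`, and `fetch_status` are hypothetical names and are not part of csaf_checker. The sketch assumes that an empty user agent means "send no User-Agent header at all", which is how `curl -A ''` behaves.

```python
import http.client
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class FetchConfig:
    """Hypothetical client settings; not taken from csaf_checker."""
    user_agent: str = ""   # empty string: omit the User-Agent header entirely
    timeout: float = 10.0


def build_headers(config):
    """Build request headers, adding User-Agent only when one is configured."""
    headers = {"Accept": "application/json"}
    if config.user_agent:
        headers["User-Agent"] = config.user_agent
    return headers


def fetch_status(url, config=None):
    """GET the URL and return the HTTP status code (redirects are not followed)."""
    config = config or FetchConfig()
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=config.timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        # http.client adds no User-Agent of its own, so the header is
        # sent only if build_headers() put it there.
        conn.request("GET", target, headers=build_headers(config))
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling `fetch_status(url)` would then send no User-Agent, while `fetch_status(url, FetchConfig(user_agent="example/1.0"))` would send the given string.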
You will likely encounter this on a number of sites as Akamai does not allow the default user agent used by curl.
If this is the case, then in my view https://www.tibco.com/.well-known/csaf/provider-metadata.json does not conform to the CSAF standard, as it is not retrievable by common HTTP clients. The idea of CSAF is to spread the information as widely as possible, and requiring special User-Agent headers to make this possible goes against this. It seems it should be fixed on TIBCO's side by relaxing their overly aggressive web application firewall settings.
I think we should make the user agent configurable and default to an empty string.
For a reference checker and downloader it makes sense to be strict about standards, so I'd recommend not adding such a feature at this point.
Akamai is one of the largest CDNs in the world. I do not have insight into why Akamai chose to specifically disallow curl, but they do and it is a well known thing. Suggesting that we should somehow stand our ground against this, when there is such an obvious and trivial solution, is simply not practical.
Akamai is one of the largest CDNs in the world. I do not have insight into why Akamai chose to specifically disallow curl, but they do and it is a well known thing. Suggesting that we should somehow stand our ground against this, when there is such an obvious and trivial solution, is simply not practical.
It is not a simple solution to demand that all CSAF clients do something special, beyond the HTTP standards, to cater to a defect of Akamai's. You would then need to make it a requirement in the standard. Subsequently, all other defects would be subject to inclusion in the standard on the same grounds.
It is my understanding that the goal of CSAF is to make the advisories as widely available as possible, and this includes scripts using libcurl or curl. Adding requirements for user-agent strings runs contrary to CSAF's mission in my view.
I do not believe you can fairly refer to this as a "defect" on Akamai's part. The user-agent is "used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use." In short, the user-agent is supplemental information outside the standard agreement between the requestor and the server, and there is no behavioral requirement of the server for any user-agent string. Sending a user-agent provides no value whatsoever in retrieving a text file--it's just a waste of bandwidth.
Regardless, this is far outside the context of CSAF. Just remove the user-agent from the request and move on.
@dennypage, @bernhardreiter: Thank you for your contributions and the discussion.
I understand both points - to summarize:
As you can see, I gave this some longer thought. I also discussed it with colleagues, did some testing, and read the standard again. I came to the conclusion that WAF filtering based on the user-agent is not in conformance with the standard for the following reasons:
python-requests/2.28.1, csaf_downloader/2.1.0, go-package-http/1.5
As a result, a WAF that blocks the automated retrieval tools the standard was designed to support does not seem to me to be in conformance with the standard.
As the csaf_checker is a tool to check conformance, such a change would undermine its use, as it would no longer detect this mistake (and as a result, organizations providing CSAF files behind such a WAF couldn't fix the mistake).
This brings me to the decision that the csaf_checker MUST NOT change its default user-agent.
The standard refers in section 7.1.6 to tools that have non-empty user-agent by default.
This section does not refer to the user-agent in any way. It prohibits redirects because some HTTP clients do not support them. It uses curl as an example of a tool that does not support redirects, but it does not discuss the manner in which curl is invoked. In short, the CSAF standard makes no actual statement on the HTTP client or the user-agent.
The standard requires in section 7.1.4 that CSAF documents with TLP:WHITE "MUST be freely accessible". IMHO, this is violated if you are required to have a specific user-agent. The same applies if you could choose between different elements of a closed set of user-agents.
I disagree with this conclusion. The "reasonable person" standard needs to be applied here. I don't believe that Akamai's prohibition on curl is equivalent to a statement that the information hosted there is not freely available. Anyone who attempts to look at the information in a browser will be able to see it. The reasonable person's perception will be that the tool is broken, not that the information is "not freely available."
Overall, I think the idea of requiring curl's default user agent to be accepted is really self-defeating. It is very unlikely that Akamai will change their policy just to make curl work for CSAF. And it is also very unlikely that companies are going to stop using Akamai for hosting. By insisting on the default user-agent of curl, we are needlessly creating a deadlock which will hamper adoption of CSAF.
In short, the CSAF standard makes no actual statement on [...] the user-agent.
I agree 100% with that statement. Therefore, I tried to explain how I came to the conclusion. Instead of following the standard to the letter (as we agreed, there is no statement about the user-agent in the standard), I looked at the spirit of and intentions behind the standard (automation and ease of retrieval). Those led me to the conclusion stated above.
Overall, I think the idea of requiring curl's default user agent to be accepted is really self-defeating.
According to my test, not only the curl user-agent but also others (like Python's requests or Go-http-client/1.1, which is actually used by the csaf_checker) and random strings get blocked. This led me to the conclusion that only a certain set of user-agents is allowed.
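A test of this kind can be sketched as follows, in Python only for illustration. The toy server below is an assumption that merely mimics the behaviour reported in this thread (403 unless the User-Agent is absent or browser-like); it does not reproduce Akamai's actual rules, and `ALLOWED_PREFIXES`, `FilteringHandler`, `probe`, and `run_demo` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical allow-list standing in for whatever a real WAF accepts.
ALLOWED_PREFIXES = ("Mozilla/",)


class FilteringHandler(BaseHTTPRequestHandler):
    """Toy server: 403 unless the User-Agent is absent or allow-listed."""

    def do_GET(self):
        agent = self.headers.get("User-Agent")
        allowed = agent is None or agent.startswith(ALLOWED_PREFIXES)
        self.send_response(200 if allowed else 403)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def probe(url, agents):
    """Return {agent: status}; an agent of None means 'send no User-Agent'."""
    parts = urlsplit(url)
    results = {}
    for agent in agents:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=5)
        try:
            headers = {} if agent is None else {"User-Agent": agent}
            conn.request("GET", parts.path or "/", headers=headers)
            response = conn.getresponse()
            response.read()
            results[agent] = response.status
        finally:
            conn.close()
    return results


def run_demo():
    """Start the toy server on a free local port, probe it, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), FilteringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        url = f"http://127.0.0.1:{port}/provider-metadata.json"
        agents = ["curl/8.0.0", "Go-http-client/1.1", "Mozilla/5.0", None]
        return probe(url, agents)
    finally:
        server.shutdown()
        server.server_close()
```

Pointing `probe` at a real provider URL would need an HTTPS connection instead of the plain HTTP one used for the local demo.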
By insisting on the default user-agent of curl, we are needlessly creating a deadlock which will hamper adoption of CSAF.
It is an Open Source Project - so I don't think that this is true. People can implement their own tools and handle it differently.
As discussed in today's TC meeting, this needs to be resolved in the standard. Therefore, I opened an issue in the TC's repo: https://github.com/oasis-tcs/csaf/issues/635
This implementation won't change its behavior for now.
Let me add one argument (for better consideration):
RFC 9110, Section 10.1.5, says the following about User-Agent (https://datatracker.ietf.org/doc/html/rfc9110#name-user-agent):
which is often used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use.
A user agent SHOULD send a User-Agent header field in each request
As CSAF mandates standard behaviour, sending a User-Agent is good practice. And RFC 9110 does not say that the server may filter by user-agent, except to work around user agent limitations. Libcurl and other HTTP libraries do not have a limitation that would justify not sending the CSAF contents to them.
This implementation won't change its behavior for now.
If that is decided, shall the issue be closed?
The csaf_checker fails on tibco.com and www.tibco.com, but they provide a PMD at https://www.tibco.com/.well-known/csaf/provider-metadata.json. We need to investigate why.