comunica / comunica-feature-link-traversal

📬 Comunica packages for link traversal-based query execution
Other
8 stars 11 forks source link

Avoid mixed content link traversal when using the query in the browser via https #59

Closed Maximvdw closed 2 years ago

Maximvdw commented 2 years ago

Problem When using the link traversal in the browser via insecure http - no issues exist. When using https you will get mixed content errors when the engine tries to fetch an http link.

Eventually this could lead to the query not functioning as expected with many links not being visited Something that can easily be tested here: https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/default

Proposed solution

  1. Check if we are using a browser (globalThis.window not being undefined) globalThis is used for compatibility with jest and NodeJS as well as the browser
  2. Check if the location URL is present (globalThis.window.location not being undefined)
  3. Check if the location URL uses https: there is a window.isSecureContext but it might not be supported by all browsers
  4. If yes, check if the fileLink.url uses http:
  5. If yes, replace it to https: to avoid mixed content errors

It could be that the remote server does not offer the fileLink.url in https - but it is a better attempt to try and visit it via https than to let the browser throw an error. Example: http://qudt.org/vocab/unit/M VS https://qudt.org/vocab/unit/M The latter will not work - but on HTTPS, the first one won't work either.

Current workarounds Current workarounds would be to use an HTTPS proxy that does not care about mixed content e.g. https://proxy.linkeddatafragments.org/

Changes

The following commit has been tested with NodeJS and in the browser where the issue is present.

Maximvdw commented 2 years ago

The current issue is that in the case of proxies - it will still try to change it to https. Will see what can be done about that (i.e. disable the fix for proxies)

Example: https://proxy.linkeddatafragments.org/http://qudt.org/vocab/unit/M works and will not throw a mixed-content error

https://proxy.linkeddatafragments.org/https://qudt.org/vocab/unit/M does not work

EDIT: fixed in https://github.com/comunica/comunica-feature-link-traversal/pull/59/commits/58df234ed82a1e181ce948d40704f373b371bfb0

rubensworks commented 2 years ago

Thanks for the PR @Maximvdw!

It's unfortunate that a fix like this is needed, but the mixed content errors are out of our control, so I agree that the workaround is valid.

However, we should be aware that strictly speaking a http resource is not the same as a https resource (it may refer to a different page). So I would suggest adding an option to the actor that allows this behaviour to be disabled, just in case people run into problems because of it. E.g. something like liftToHttps, that defaults to true. (can be done like this: https://github.com/comunica/comunica/blob/master/packages/actor-rdf-resolve-quad-pattern-federated/lib/ActorRdfResolveQuadPatternFederated.ts#L50-L55)

Edit: I see that the CI is showing some errors.

Maximvdw commented 2 years ago

I will add the option and check the ci errors later today

Maximvdw commented 2 years ago

Ok, the new commit (https://github.com/comunica/comunica-feature-link-traversal/commit/63ef39a04dd8f3d44c15e4c10a812b6fe982dd57) adds an upgradeInsecureRequests flag (seemed a bit more descriptive than liftToHttps) which will override the default behaviour of using the browser context and proxy mentioned in my original post.

So if someone wants to upgrade insecure requests from an insecure browser context they can do so - if they do not want to upgrade insecure requests from a secure context they can turn the flag to false. If they do not provide the setting it will only upgrade insecure requests when using https in the browser AND when also not using a proxy.

Note that I did not default it to true in the jsdoc. The behaviour of the flag is dependant on the environment where the actor is used. There are use cases where an upgrade to https would be favourable in an insecure context (e.g. when using a proxy that requires secure context perhaps). The current technique (in my opinion) should only require developers to set the flag when absolutely required.

PS: Sorry about the linting issues - they should be fixed now

rubensworks commented 2 years ago

Excellent work @Maximvdw, thanks for this!

And indeed, upgradeInsecureRequests is much clearer than what I suggested 😅

Publishing alpha.4.0 now...