UKGovLD / registry-core

Linked data registry - core application and example UI
https://github.com/UKGovLD/registry-core/wiki
Apache License 2.0
23 stars 9 forks source link

Register table URI encoding #185

Open mike-gormley opened 1 year ago

mike-gormley commented 1 year ago

Hello,

It's taken me a few weeks to come back to this but following on from https://github.com/UKGovLD/registry-core/issues/181 we've encountered a further issue. Seems I only did affirmative testing against the previous issues raised rather than further regression testing.

We've found that the link in the name column of the registers is 404'ing due to the URL encoding of the register/entity path. For example, https://mywebsite.com/_my-register is becoming https://mywebsite.com/%2F_my-register, or https://mywebsite.com/my-register/_sub-register is becoming https://mywebsite.com/%2Fmy-register%2Fsub-register

I'm uncertain if this is something caused by our template customisations or whether this occurs in the standard templates. Could anyone confirm that version 2.3.14 is working (or not working) as expected?

der commented 1 year ago

Hi Mike,

You are right, there we're knock on changes needed which we failed to spot. The linked PR works locally, can you verify?

In passing I see that the changes for #181 involved a typo but inclined to leave that to a future clean up.

mike-gormley commented 1 year ago

Hi @der, thank you for looking into this. I've deployed the changes to a development environment and it resolves the register table links. However, it reverses the original XSS issue in #181.

We've also learned we can add a URL to the Login modal close button return querystring parameter and it is even possible to add a URL e.g. /ui/login?return=https://www.google.co.uk, so not just javascript.

We've also found another location at ./?entity=/Some-text-here.

der commented 1 year ago

Hi @der, thank you for looking into this. I've deployed the changes to a development environment and it resolves the register table links. However, it reverses the original XSS issue in https://github.com/UKGovLD/registry-core/issues/181.

Sorry about that, that was careless. I was sure I had tested that case.

The regression should be reversed by UKGovLD/registry-config-base#37

We've also learned we can add a URL to the Login modal close button return querystring parameter is even possible to add a URL e.g. /ui/login?return=https://www.google.co.uk, so not just javascript.

Switching back to the correct xss clean for the close button case should have addressed this as well.

We've also found another location at ./?entity=/Some-text-here.

That was an interesting case. It's supposed to be able show what it knows about external entities but the underlying check was returning empty information instead of an explicit 404 so the render chosen was incorrect. That should be fixed too.

Please let me know if these changes work for you.

mike-gormley commented 1 year ago

I think we're nearly there. I've deployed another instance today with the changes in https://github.com/UKGovLD/registry-config-base/pull/37 and been testing what I can. On the whole, looks like it's handling many of the scenarios previously mentioned as well as it can though I think the entity param issue resolution is at odds with the intended purpose as you refer to.

?entity=http://www.mybadwebsite.com now shows a 'Not found (404)', which is want we want.

Conversely, the intended use of this query parameter was to also cater for external definition entities such as /?entity=http://www.w3.org/2004/02/skos/core%23Concept now doesn't work as intended. Use of this in the templates (e.g. The type in the metadata pane of _item-render.vm might get around this by directly linking to those known external URIs (as it does elsewhere) but you might know of other ways to address this.

Unfortunately, I have found that the 404 page also suffers from the same problem as what we are trying to resolve. It will display the invalid URL in the page as a clickable link, which can actually be valid in this format!

image

We could fix this in the _not-found.vm template ourselves by removing the parroting of the invalid URL as an interactive link (or preferably not mentioned at all) but would be helpful if we didn't need to maintain a divergence in functionality in our own instances.

The issue is compounded by a separate concern in that 404 pages actually return HTTP 200 OK responses so has potential to be treated as valid pages of our websites and indexed by web bots (that link to web pages outside our control).

marqh commented 1 year ago

The issue is compounded by a separate concern in that 404 pages actually return HTTP 200 OK response

this seems quite crucial to me i'd expect to receive an HTTP 404 from such a page, as well as a rendered not found page

where is the HTTP response code managed in these cases?

would it be possible to include the correct HTTP response code to ensure that the ?entity pattern can be coded against, using HTTP codes to differentiate exists from does-not-exist

many thanks, mark

der commented 1 year ago

The ?entity queries at the UI level were supposed to be answering "what do you know about X" and a 200 response saying "I don't know anything about X" seems entirely the correct response. This is quite different from retrieving a URI within the registry namespace and getting a 404 if it doesn't exist.

der commented 1 year ago

Conversely, the intended use of this query parameter was to also cater for external definition entities such as /?entity=http://www.w3.org/2004/02/skos/core%23Concept now doesn't work as intended.

Well, actually your registry probably doesn't have any registered item for http://www.w3.org/2004/02/skos/core#Concept so it is now returning the "correct" response, it's just not a helpful one!

I think our issue is that the templates which render external entities are using the ?entity queries just in case there is additional information on them within the registry. I'm not sure there's a way we can do that without having this exposure. So the answer may be to stop generating any of these ?entity queries in the first place and simply link directly to the external URL. While leaving the entity queries in place to allowing anyone to check for registered external entities via this route if required.

Not sure how big a job that is but can try to take a look this evening.

Unfortunately, I have found that the 404 page also suffers from the same problem as what we are trying to resolve. It will display the invalid URL in the page as a clickable link, which can actually be valid in this format!

Yes, I wondered about that. While it's sanitised in the sense that it should be blocking e.g. javascript it does render a link. If that's not desirable, and I can see the argument against it, then can remove the href from the 404 template.

der commented 1 year ago

I've updated the PR with a proposed and fairly simple resolution.

This removes the use of ?entity calls internally. Now any external URI reference from a registered entry links directly to that external entity and there is no reliance on the ?entity UI functionality for these.

Secondly it removes the processing of the ?entity query parameter entirely from the UI. So a query to /entity=http://www.mybadwebsite.com is treated as simply a query to /. This is similar to how sites like https://www.google.com handle spurious query parameters.

Apart from the now-removed internal use, the only purpose of the ?entity query could be to search for external registered entries. At the UI level that is more conveniently done by using the check operation under the advanced menu which already handles non-registered entities without rendering a link to them.

This brute force removal of the functionality seems safer than relying on correct detection of unregistered entities, however we report them.

Please let me know if this is an acceptable solution.

der commented 1 year ago

I've also taken a pass through the code to identify other potential issues in the admin pages and added additional guards for those cases.

der commented 1 year ago

Many of the additional issues discovered occur in admin pages. I've also disabled all admin page templates when not authorized and taken a further pass over the partials that are used in such pages since it seems that if an attacker has access to the code they could try to invoke those directly.

This pass through the code is complete enough for review. I'm hopeful that we've now mitigated these vulnerabilities but would be grateful for any further checking you can do.

mike-gormley commented 1 year ago

@der You've been busy the last 24 hours! I really appreciate the broader approach to locating other potential loopholes in the templates.

Just to place an update here from my perspective at the end of progress today. I'd deployed the main chunk of last night's/this morning's commits to a local dev environment earlier today and you've seen I've added one or two comments on the more recent proposed changes already. The complete removal of the ?entity= is a welcome proposal, as are the additional validations in place, and the removal of the link on the Not Found page. Previous vulnerabilities appear to be addressed.

I'm still regression testing as much of the registry capabilities as I can find in slower time. The only remaining issue I've seen is the Close button on the login modal and the create-redirect modal forms both have stopped working as the URIs are encoded. I'm less fussed about this in the grand scheme of things and could be split out into a separate bugfix. The redirects for actually logging in/registering still function as intended and still cater for misbehaviour so not all bad here.

I'll do another deploy of all commits tomorrow to test the newest commits, and hopefully we can promote these changes. Thanks again.

sgrellet commented 1 year ago

"Please let me know if this is an acceptable solution."

Is the UseCase where you locally extend an external entity handled in that proposal ?

For example, here, we extended INSPIRE 'status' register (https://data.geoscience.fr/ncl/status). Would that solution prevent from accessing our 'local' extensions ?

der commented 1 year ago

I'm still regression testing as much of the registry capabilities as I can find in slower time. The only remaining issue I've seen is the Close button on the login modal and the create-redirect modal forms both have stopped working as the URIs are encoded. I'm less fussed about this in the grand scheme of things and could be split out into a separate bugfix. The redirects for actually logging in/registering still function as intended and still cater for misbehaviour so not all bad here.

Thanks, I was afraid that might be a risk though I thought I'd tested that. As you say, something we can live with for now. We might have to a different style of cleaning for these return links (e.g. a regex to limit to a safe relative URI).

der commented 1 year ago

HI @sgrellet

Glad you noticed this thread, I was about to contact you to make sure you were aware.

Is the UseCase where you locally extend an external entity handled in that proposal ?

For example, here, we extended INSPIRE 'status' register (https://data.geoscience.fr/ncl/status). Would that solution prevent from accessing our 'local' extensions ?

Depends on how you are using them. Registering external entities and displaying them in a register like that will still work.

However, if there are links from an individual item to those INSIPRE status values then, with the proposed change, those links will go directly to the INSPIRE original URI and not reflect back to the register.

For example, if I click through to Submitted then the Concept link at the bottom of that page currently is one of these ?entity= references in the registry, instead this would go directly to the skos:Concept link.

If that's a problem then I'm not sure we can meet all the competing requirements, at least not without more development effort than is possible in spare time. Any deeper solution to allow some links to reflect back the registry without allowing an xss injection would probably mean checking all outgoing links at the time the item pages are rendered and rewriting to either the internal reference or the external URI at that point. This should be possible given enough time and effort but complex and would have significant performance impact so would need to be feature gated.

mike-gormley commented 1 year ago

I think I've finished my testing around this now so happy with the proposed changes, provided @sgrellet is also happy this does not adversely impact them.

mo-marqh commented 1 year ago

i think i understand the pattern that @sgrellet has implemented

there are RegisterItems where the entity is a 3rd party identifier, e.g. https://data.geoscience.fr/ncl/status/_2453fa17-494d-4a8f-9666-b71332b90fee is defined by the entity http://inspire.ec.europa.eu/registry/status/invalid

i don't think the proposed change alters the behaviour of this publication, the RegisterItems within https://data.geoscience.fr/ncl/status remain accessible in their current form

i think the limitation that @der is describing is that Where another entity defines a status, e.g. https://data.geoscience.fr/ncl/GeolStrcSed/198 adms:status http://inspire.ec.europa.eu/registry/status/invalid the hyperlink on the page is currently https://data.geoscience.fr/ncl/?entity=http://inspire.ec.europa.eu/registry/status/invalid which then routes the browser to the page within that Registry defining the registration

the changes proposed would change this behaviour, such that following the link from https://data.geoscience.fr/ncl/GeolStrcSed/198 status invalid would route the browser directly to https://inspire.ec.europa.eu/registry/status/invalid external to the registry, and bypassing the https://data.geoscience.fr/ncl/status/_2453fa17-494d-4a8f-9666-b71332b90fee page

So, local extensions would be accessible, the access is not prevented. But, what is currently an internal to Registry link for the human browser using the registry would become an external resource link. this is a behaviour change

What level of concern would this cause for your usage @sgrellet ? (in the context that this change proposal provides some key protections for other use cases)

many thanks mark

sgrellet commented 1 year ago

bonjour @mo-marqh, Thanks for investing the issue I raised (in a quite short way) and exploring our vocab.

I had my colleagues writing an explanation but yours was faster and exactly on the spot It's important to be able to extend locally some external vocabs. This aspect of websem practices is really useful to map to all the local semantics they have in their various systems so that we can progressively bring things back on the right tracks

Maybe we did it the wrong way. But would there be a way to test first whether there is a local extension. If no, go directly to the external source. If there is, target the local extension ?

One other example on the same use case : one of your local example was on skos:Concept. We actually discussed with colleagues having the French 'translation' donc on skos Core available as well (see here http://www.sparna.fr/skos/SKOS-traduction-francais.html#overview).

der commented 1 year ago

Hi @sgrellet,

Your use of feature seems entirely reasonable and useful. The problem is only with the XSS vulnerability created by the existing implementation of it.

But would there be a way to test first whether there is a local extension. If no, go directly to the external source. If there is, target the local extension ?

Short answer is: yes but it would require significant development work which would be difficult to accomplish under the current unsupported arrangements, and would in any case take time to develop.

One option would be the approach suggested earlier. At the time the referring page is rendered we do the checks for a local registration of that URI. Then we can either render a direct link to the registered entry or a link to the external entry - either way there be no injectable ?entity. This has the cost that all external but potentially locally-described links within the page would need to be looked up, and there's no direct API query to do that in bulk. This should be possible within the templates using a raw SPARQL query but ensuring we do that in bulk, rather than as each link is rendered, would take significant thought and development time. There would also still be some performance cost, even if done in bulk, so this might need to be feature-gated.

Another option is to render the current ?entity links but change the way they are handled so that if the target URI is local it renders as currently but if it is external it redirects to that external URI without ever rendering that link in a 200 page. The problem with this approach is firstly that it can't be done in the templates, the redirects would have to be done in the endpoint controller - so that would mean development work on registry-core. More seriously, it would might not be acceptable from a security view point. It would mean that https://myregistry/?entity=http://bad/address would return a 30X rather than a 404 - so is potentially still exploitable for SEO by a bad actor.

On balance I think the first approach, conditional link generation, is the way to go.

One route forward would be to merge the current PR, treating it as a stepping stone to the full solution. Then users who want to address the XSS vulnerability in the short term, and can live without the external link redirection for now, could adopt it. Then we can separately discuss scheduling work to develop the conditional link generation, and perhaps address other weaknesses such as the handling of $return on login pages noted by @mike-gormley.

Either way we have applied the corresponding changes to all of our deployed instances already.

mo-marqh commented 1 year ago

One route forward would be to merge the current PR, treating it as a stepping stone to the full solution. Then users who want to address the XSS vulnerability in the short term, and can live without the external link redirection for now, could adopt it.

this would be a really good outcome for us

we will be interested in contributing to

Then we can separately discuss scheduling work to develop the conditional link generation, and perhaps address other weaknesses

der commented 1 year ago

After some failed attempts to structure the changes I propose to go ahead as suggested above, merging the current PR.

I've tagged the current state as registry-config-base-2.3.15 to match the registry-core release.

der commented 1 year ago

Done. Sorry about the rather messy git history but attempts to clean up while preserving just the details we need were proving tricky and run out of time budget.