Open mike-gormley opened 1 year ago
Hi Mike,
You are right, there we're knock on changes needed which we failed to spot. The linked PR works locally, can you verify?
In passing I see that the changes for #181 involved a typo but inclined to leave that to a future clean up.
Hi @der, thank you for looking into this. I've deployed the changes to a development environment and it resolves the register table links. However, it reverses the original XSS issue in #181.
We've also learned we can add a URL to the Login modal close button return
querystring parameter and it is even possible to add a URL e.g. /ui/login?return=https://www.google.co.uk
, so not just javascript.
We've also found another location at ./?entity=/Some-text-here
.
Hi @der, thank you for looking into this. I've deployed the changes to a development environment and it resolves the register table links. However, it reverses the original XSS issue in https://github.com/UKGovLD/registry-core/issues/181.
Sorry about that, that was careless. I was sure I had tested that case.
The regression should be reversed by UKGovLD/registry-config-base#37
We've also learned we can add a URL to the Login modal close button return querystring parameter is even possible to add a URL e.g. /ui/login?return=https://www.google.co.uk, so not just javascript.
Switching back to the correct xss clean for the close button case should have addressed this as well.
We've also found another location at ./?entity=/Some-text-here.
That was an interesting case. It's supposed to be able show what it knows about external entities but the underlying check was returning empty information instead of an explicit 404 so the render chosen was incorrect. That should be fixed too.
Please let me know if these changes work for you.
I think we're nearly there. I've deployed another instance today with the changes in https://github.com/UKGovLD/registry-config-base/pull/37 and been testing what I can. On the whole, looks like it's handling many of the scenarios previously mentioned as well as it can though I think the entity
param issue resolution is at odds with the intended purpose as you refer to.
?entity=http://www.mybadwebsite.com
now shows a 'Not found (404)', which is want we want.
Conversely, the intended use of this query parameter was to also cater for external definition entities such as /?entity=http://www.w3.org/2004/02/skos/core%23Concept
now doesn't work as intended. Use of this in the templates (e.g. The type
in the metadata pane of _item-render.vm
might get around this by directly linking to those known external URIs (as it does elsewhere) but you might know of other ways to address this.
Unfortunately, I have found that the 404 page also suffers from the same problem as what we are trying to resolve. It will display the invalid URL in the page as a clickable link, which can actually be valid in this format!
We could fix this in the _not-found.vm
template ourselves by removing the parroting of the invalid URL as an interactive link (or preferably not mentioned at all) but would be helpful if we didn't need to maintain a divergence in functionality in our own instances.
The issue is compounded by a separate concern in that 404 pages actually return HTTP 200 OK responses so has potential to be treated as valid pages of our websites and indexed by web bots (that link to web pages outside our control).
The issue is compounded by a separate concern in that 404 pages actually return HTTP 200 OK response
this seems quite crucial to me
i'd expect to receive an HTTP 404 from such a page, as well as a rendered not found
page
where is the HTTP response code managed in these cases?
would it be possible to include the correct HTTP response code to ensure that the ?entity
pattern can be coded against, using HTTP codes to differentiate exists
from does-not-exist
many thanks, mark
The ?entity
queries at the UI level were supposed to be answering "what do you know about X" and a 200 response saying "I don't know anything about X" seems entirely the correct response. This is quite different from retrieving a URI within the registry namespace and getting a 404 if it doesn't exist.
Conversely, the intended use of this query parameter was to also cater for external definition entities such as /?entity=http://www.w3.org/2004/02/skos/core%23Concept now doesn't work as intended.
Well, actually your registry probably doesn't have any registered item for http://www.w3.org/2004/02/skos/core#Concept so it is now returning the "correct" response, it's just not a helpful one!
I think our issue is that the templates which render external entities are using the ?entity
queries just in case there is additional information on them within the registry. I'm not sure there's a way we can do that without having this exposure. So the answer may be to stop generating any of these ?entity
queries in the first place and simply link directly to the external URL. While leaving the entity queries in place to allowing anyone to check for registered external entities via this route if required.
Not sure how big a job that is but can try to take a look this evening.
Unfortunately, I have found that the 404 page also suffers from the same problem as what we are trying to resolve. It will display the invalid URL in the page as a clickable link, which can actually be valid in this format!
Yes, I wondered about that. While it's sanitised in the sense that it should be blocking e.g. javascript it does render a link. If that's not desirable, and I can see the argument against it, then can remove the href from the 404 template.
I've updated the PR with a proposed and fairly simple resolution.
This removes the use of ?entity
calls internally. Now any external URI reference from a registered entry links directly to that external entity and there is no reliance on the ?entity
UI functionality for these.
Secondly it removes the processing of the ?entity
query parameter entirely from the UI. So a query to /entity=http://www.mybadwebsite.com
is treated as simply a query to /
. This is similar to how sites like https://www.google.com
handle spurious query parameters.
Apart from the now-removed internal use, the only purpose of the ?entity
query could be to search for external registered entries. At the UI level that is more conveniently done by using the check
operation under the advanced menu which already handles non-registered entities without rendering a link to them.
This brute force removal of the functionality seems safer than relying on correct detection of unregistered entities, however we report them.
Please let me know if this is an acceptable solution.
I've also taken a pass through the code to identify other potential issues in the admin pages and added additional guards for those cases.
Many of the additional issues discovered occur in admin pages. I've also disabled all admin page templates when not authorized and taken a further pass over the partials that are used in such pages since it seems that if an attacker has access to the code they could try to invoke those directly.
This pass through the code is complete enough for review. I'm hopeful that we've now mitigated these vulnerabilities but would be grateful for any further checking you can do.
@der You've been busy the last 24 hours! I really appreciate the broader approach to locating other potential loopholes in the templates.
Just to place an update here from my perspective at the end of progress today. I'd deployed the main chunk of last night's/this morning's commits to a local dev environment earlier today and you've seen I've added one or two comments on the more recent proposed changes already. The complete removal of the ?entity=
is a welcome proposal, as are the additional validations in place, and the removal of the link on the Not Found page. Previous vulnerabilities appear to be addressed.
I'm still regression testing as much of the registry capabilities as I can find in slower time. The only remaining issue I've seen is the Close button on the login modal and the create-redirect modal forms both have stopped working as the URIs are encoded. I'm less fussed about this in the grand scheme of things and could be split out into a separate bugfix. The redirects for actually logging in/registering still function as intended and still cater for misbehaviour so not all bad here.
I'll do another deploy of all commits tomorrow to test the newest commits, and hopefully we can promote these changes. Thanks again.
"Please let me know if this is an acceptable solution."
Is the UseCase where you locally extend an external entity handled in that proposal ?
For example, here, we extended INSPIRE 'status' register (https://data.geoscience.fr/ncl/status). Would that solution prevent from accessing our 'local' extensions ?
I'm still regression testing as much of the registry capabilities as I can find in slower time. The only remaining issue I've seen is the Close button on the login modal and the create-redirect modal forms both have stopped working as the URIs are encoded. I'm less fussed about this in the grand scheme of things and could be split out into a separate bugfix. The redirects for actually logging in/registering still function as intended and still cater for misbehaviour so not all bad here.
Thanks, I was afraid that might be a risk though I thought I'd tested that. As you say, something we can live with for now. We might have to a different style of cleaning for these return links (e.g. a regex to limit to a safe relative URI).
HI @sgrellet
Glad you noticed this thread, I was about to contact you to make sure you were aware.
Is the UseCase where you locally extend an external entity handled in that proposal ?
For example, here, we extended INSPIRE 'status' register (https://data.geoscience.fr/ncl/status). Would that solution prevent from accessing our 'local' extensions ?
Depends on how you are using them. Registering external entities and displaying them in a register like that will still work.
However, if there are links from an individual item to those INSIPRE status values then, with the proposed change, those links will go directly to the INSPIRE original URI and not reflect back to the register.
For example, if I click through to Submitted then the Concept
link at the bottom of that page currently is one of these ?entity=
references in the registry, instead this would go directly to the skos:Concept
link.
If that's a problem then I'm not sure we can meet all the competing requirements, at least not without more development effort than is possible in spare time. Any deeper solution to allow some links to reflect back the registry without allowing an xss injection would probably mean checking all outgoing links at the time the item pages are rendered and rewriting to either the internal reference or the external URI at that point. This should be possible given enough time and effort but complex and would have significant performance impact so would need to be feature gated.
I think I've finished my testing around this now so happy with the proposed changes, provided @sgrellet is also happy this does not adversely impact them.
i think i understand the pattern that @sgrellet has implemented
there are RegisterItems where the entity is a 3rd party identifier, e.g. https://data.geoscience.fr/ncl/status/_2453fa17-494d-4a8f-9666-b71332b90fee is defined by the entity http://inspire.ec.europa.eu/registry/status/invalid
i don't think the proposed change alters the behaviour of this publication, the RegisterItems within https://data.geoscience.fr/ncl/status remain accessible in their current form
i think the limitation that @der is describing is that Where another entity defines a status, e.g. https://data.geoscience.fr/ncl/GeolStrcSed/198 adms:status http://inspire.ec.europa.eu/registry/status/invalid the hyperlink on the page is currently https://data.geoscience.fr/ncl/?entity=http://inspire.ec.europa.eu/registry/status/invalid which then routes the browser to the page within that Registry defining the registration
the changes proposed would change this behaviour, such that following the link from https://data.geoscience.fr/ncl/GeolStrcSed/198 status invalid would route the browser directly to https://inspire.ec.europa.eu/registry/status/invalid external to the registry, and bypassing the https://data.geoscience.fr/ncl/status/_2453fa17-494d-4a8f-9666-b71332b90fee page
So, local extensions would be accessible, the access is not prevented. But, what is currently an internal to Registry link for the human browser using the registry would become an external resource link. this is a behaviour change
What level of concern would this cause for your usage @sgrellet ? (in the context that this change proposal provides some key protections for other use cases)
many thanks mark
bonjour @mo-marqh, Thanks for investing the issue I raised (in a quite short way) and exploring our vocab.
I had my colleagues writing an explanation but yours was faster and exactly on the spot It's important to be able to extend locally some external vocabs. This aspect of websem practices is really useful to map to all the local semantics they have in their various systems so that we can progressively bring things back on the right tracks
Maybe we did it the wrong way. But would there be a way to test first whether there is a local extension. If no, go directly to the external source. If there is, target the local extension ?
One other example on the same use case : one of your local example was on skos:Concept. We actually discussed with colleagues having the French 'translation' donc on skos Core available as well (see here http://www.sparna.fr/skos/SKOS-traduction-francais.html#overview).
Hi @sgrellet,
Your use of feature seems entirely reasonable and useful. The problem is only with the XSS vulnerability created by the existing implementation of it.
But would there be a way to test first whether there is a local extension. If no, go directly to the external source. If there is, target the local extension ?
Short answer is: yes but it would require significant development work which would be difficult to accomplish under the current unsupported arrangements, and would in any case take time to develop.
One option would be the approach suggested earlier. At the time the referring page is rendered we do the checks for a local registration of that URI. Then we can either render a direct link to the registered entry or a link to the external entry - either way there be no injectable ?entity
. This has the cost that all external but potentially locally-described links within the page would need to be looked up, and there's no direct API query to do that in bulk. This should be possible within the templates using a raw SPARQL query but ensuring we do that in bulk, rather than as each link is rendered, would take significant thought and development time. There would also still be some performance cost, even if done in bulk, so this might need to be feature-gated.
Another option is to render the current ?entity
links but change the way they are handled so that if the target URI is local it renders as currently but if it is external it redirects to that external URI without ever rendering that link in a 200 page. The problem with this approach is firstly that it can't be done in the templates, the redirects would have to be done in the endpoint controller - so that would mean development work on registry-core. More seriously, it would might not be acceptable from a security view point. It would mean that https://myregistry/?entity=http://bad/address
would return a 30X rather than a 404 - so is potentially still exploitable for SEO by a bad actor.
On balance I think the first approach, conditional link generation, is the way to go.
One route forward would be to merge the current PR, treating it as a stepping stone to the full solution. Then users who want to address the XSS vulnerability in the short term, and can live without the external link redirection for now, could adopt it. Then we can separately discuss scheduling work to develop the conditional link generation, and perhaps address other weaknesses such as the handling of $return
on login pages noted by @mike-gormley.
Either way we have applied the corresponding changes to all of our deployed instances already.
One route forward would be to merge the current PR, treating it as a stepping stone to the full solution. Then users who want to address the XSS vulnerability in the short term, and can live without the external link redirection for now, could adopt it.
this would be a really good outcome for us
we will be interested in contributing to
Then we can separately discuss scheduling work to develop the conditional link generation, and perhaps address other weaknesses
After some failed attempts to structure the changes I propose to go ahead as suggested above, merging the current PR.
I've tagged the current state as registry-config-base-2.3.15
to match the registry-core release.
Done. Sorry about the rather messy git history but attempts to clean up while preserving just the details we need were proving tricky and run out of time budget.
Hello,
It's taken me a few weeks to come back to this but following on from https://github.com/UKGovLD/registry-core/issues/181 we've encountered a further issue. Seems I only did affirmative testing against the previous issues raised rather than further regression testing.
We've found that the link in the name column of the registers is 404'ing due to the URL encoding of the register/entity path. For example,
https://mywebsite.com/_my-register
is becominghttps://mywebsite.com/%2F_my-register
, orhttps://mywebsite.com/my-register/_sub-register
is becominghttps://mywebsite.com/%2Fmy-register%2Fsub-register
I'm uncertain if this is something caused by our template customisations or whether this occurs in the standard templates. Could anyone confirm that version 2.3.14 is working (or not working) as expected?