Weird hybrid pages which show content from mixed locations

miohtama commented 11 years ago

Not sure what to do with this

I want to report an issue with Plone that has been bugging me for a very long time. There are a lot of issue trackers around, though, and I don't know which one is appropriate. I am sure there has been some talk about this already, but I can't find it anywhere. The problem occurs when you visit a Plone site and type a URL that combines more than one valid path for that site. It's hard to explain, but it's pretty simple with an example:

https://plone.org/documentation (GOOD)
https://plone.org/support (GOOD)
https://plone.org/documentation/support (BAD: weird hybrid page that shows mixed content from different locations)

The expected result for https://plone.org/documentation/support would be a 404 page.

miohtama commented 11 years ago

Not that I know of. It happens on any Plone website (plone.org is just an example). It might happen when there are relative links on the site that appear on more than one page. My problem is that some of those pages have been showing up in Google searches. I don't think there are links within the site, but maybe somebody mistyped or mixed up URLs in a link on some other site and it ended up on Google. robots.txt is the easiest way to eliminate them from Google.

miohtama commented 11 years ago

Tuning robots.txt needs to be assigned to someone with Plone god privileges. I can take it if we cannot find anyone else.
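As a stopgap, a rule like the one below could hide one known hybrid URL from crawlers. This is only a sketch: the Disallow path is the example URL from this thread, not the actual plone.org configuration, and Python's standard urllib.robotparser module is used purely to check how a rule-following crawler would read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is the example from this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /documentation/support
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The valid page stays crawlable, the hybrid page does not.
print(parser.can_fetch("*", "https://plone.org/documentation"))          # True
print(parser.can_fetch("*", "https://plone.org/documentation/support"))  # False
```

Every hybrid URL (or prefix) has to be listed explicitly, so this approach only covers the combinations someone has already noticed.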

davidjonas commented 11 years ago

I still think that robots.txt will only hide part of the problem, since the wrong link would still be on the internet somewhere. The real problem is that Plone allows this type of traversal through the URL. Any possible combination of two or more valid paths in the URL ends up on a 200 OK page with unpredictable, broken content. This happens on any Plone website out there.

I think the problem is somewhere in either acquisition or traversal, one of which allows this behavior. I think it might actually be a Zope bug rather than a Plone bug. Unfortunately, I don't know how to dig deeper into this.

It can result in really weird URLs being valid such as:

https://plone.org/news/plone-framework-team-accepts-new-members/news/plone-tune-up-scheduled-for-friday-november-16th

These end up as almost normal-looking pages with random slight differences that drive developers insane. For example, the page above looks exactly like the valid page https://plone.org/news/plone-tune-up-scheduled-for-friday-november-16th, but if you are logged in, you will not see the published state of the page. That would be very hard to debug if you didn't notice that the URL was actually wrong.
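To illustrate why such mixed paths resolve at all, here is a toy model in plain Python. It does not use the real Zope Acquisition package; the Item class and the acquire and traverse helpers are hypothetical names made up for this sketch. The idea is only that a name missing on the current object is looked up on the objects already visited along the URL.

```python
class Item:
    """Toy stand-in for a content object (not the real Zope API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # containment parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def physical_path(self):
        """Path by containment, independent of the URL used to get here."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def acquire(chain, name):
    """Find `name` on the nearest object in the traversal chain that has it."""
    for obj in reversed(chain):
        if name in obj.children:
            return obj.children[name]
    raise KeyError(name)


def traverse(root, path):
    """Resolve a URL path segment by segment, falling back to acquisition."""
    chain = [root]
    for name in path.strip("/").split("/"):
        chain.append(acquire(chain, name))
    return chain[-1]


root = Item("")
Item("documentation", root)
support = Item("support", root)

hybrid = traverse(root, "/documentation/support")
print(hybrid is support)        # True: the mixed URL resolves instead of failing
print(hybrid.physical_path())   # /support
```

In this model the object reached through the hybrid URL is the real /support object, only seen through a different chain of parents, which would be consistent with pages that look almost right but differ in small ways.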

davisagli commented 11 years ago

Yes, this is because Zope's DefaultPublishTraverse class uses acquisition: it first tries traversing using __bobo_traverse__, then tries an attribute lookup on the aq_base of the object (i.e. without acquisition), then tries a view lookup, then tries an attribute lookup with acquisition.

We could try experimenting with registering a replacement IBrowserPublisher adapter that doesn't try acquisition, but I suspect that we've got things that depend on it (traversing to items in CMF skin layers, for example, though I haven't confirmed that).
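The trade-off can be sketched with a small stand-alone model. This is not Zope code: Node, publish_traverse, resolve and the allow_acquisition flag are hypothetical names for illustration, the __bobo_traverse__ and view steps are left out, and logo.png stands in for a skin-layer style item that is only reachable from the site root.

```python
class NotFound(Exception):
    """Stand-in for the publisher's 404 error."""


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own = {}             # direct sub-objects only (what aq_base would expose)
        if parent is not None:
            parent.own[name] = self

    def acquired(self, name):
        """Walk up the parents looking for `name` (toy acquisition)."""
        node = self.parent
        while node is not None:
            if name in node.own:
                return node.own[name]
            node = node.parent
        return None


def publish_traverse(obj, name, allow_acquisition=True):
    # The __bobo_traverse__ hook and the view lookup are omitted in this model.
    if name in obj.own:           # attribute lookup without acquisition
        return obj.own[name]
    if allow_acquisition:         # final fallback: attribute lookup with acquisition
        found = obj.acquired(name)
        if found is not None:
            return found
    raise NotFound(name)


def resolve(root, path, allow_acquisition=True):
    obj = root
    for name in path.strip("/").split("/"):
        obj = publish_traverse(obj, name, allow_acquisition)
    return obj


root = Node("")
news = Node("news", root)
item_a = Node("item-a", news)
item_b = Node("item-b", news)
logo = Node("logo.png", root)     # hypothetical item that pages expect to acquire

print(resolve(root, "/news/item-a/news/item-b") is item_b)   # True
print(resolve(root, "/news/logo.png") is logo)               # True

for path in ("/news/item-a/news/item-b", "/news/logo.png"):
    try:
        resolve(root, path, allow_acquisition=False)
    except NotFound as exc:
        print("404 for", path, "at segment", exc)
```

With the fallback switched off, the hybrid URL gives a 404 as desired, but so does the acquired root-level item, which is the kind of dependency that would need checking first.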

djay commented 11 years ago

On 18/11/2012, at 7:37 AM, David Glick notifications@github.com wrote:

> Yes, this is because Zope's DefaultPublishTraverse class uses acquisition: it first tries traversing using __bobo_traverse__, then tries an attribute lookup on the aq_base of the object (i.e. without acquisition), then tries a view lookup, then tries an attribute lookup with acquisition.
>
> We could try experimenting with registering a replacement IBrowserPublisher adapter that doesn't try acquisition, but I suspect that we've got things that depend on it (traversing to items in CMF skin layers, for example, though I haven't confirmed that).

There are some pretty weird bugs caused by it, so it would be worth seeing what actually depends on acquisition. For example, you get no 404 pages or redirections for anything named the same as something elsewhere in the acquisition path, such as the id of another Plone site.


davisagli commented 11 years ago

I tried it, and as I suspected skin layer items can't be found without getting acquired. We can revisit this once the PLIP to remove skin layers is complete (at which point an option could be added to Zope to turn off acquisition during traversal).

k-j-kleist commented 11 years ago

see http://dev.plone.org/ticket/13354

collective / collective.developermanual

Weird hybrid pages which show content from mixed locations #136