Islandora / documentation

Contains islandora's documentation and main issue queue.

Islandora link headers grow unbounded #1764

Open bseeger opened 3 years ago

bseeger commented 3 years ago

Every time we add an entity reference to a node, we get another link header (Subjects, Resource Type, Genre, Contributor -- anything referencing a taxonomy term). This overall makes sense, but means the link headers can grow unbounded and potentially run into external limits. Once we had our metadata application profile in place and started testing nodes, we ran into header buffer size limits in NGINX and started getting HTTP 502 Bad Gateway errors. The limits are easily upped, but the headers can still grow.

Link: <http://purl.org/coar/resource_type/c_c513>; rel="tag"; title="Image"
Link: <http://future.islandora.ca/taxonomy/term/5>; rel="tag"; title="Image"
Link: <http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"
Link: <http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"
Link: <http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"
Link: <http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"

Islandora itself doesn't really concern itself with how it's deployed, so this ticket is about whether we should have all these link headers. Do they add value? Can we de-dup them? How much control over them do we have in the first place?
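To make the de-dup question concrete, here is a minimal Python sketch that removes exact duplicate Link values and estimates how many header bytes that saves. It is an illustration only, not Islandora code (the module itself is PHP); the helper names `dedupe_link_headers` and `header_bytes` are hypothetical, and the size figure is a rough per-line estimate rather than what NGINX actually counts.

```python
def dedupe_link_headers(values):
    """Return Link header values with exact duplicates removed, order kept."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys(v.strip() for v in values))


def header_bytes(values, name="Link"):
    """Rough size estimate: one 'Name: value' line plus CRLF per header."""
    return sum(len(f"{name}: {v}\r\n".encode("utf-8")) for v in values)


# The six Link values from the example above.
links = [
    '<http://purl.org/coar/resource_type/c_c513>; rel="tag"; title="Image"',
    '<http://future.islandora.ca/taxonomy/term/5>; rel="tag"; title="Image"',
    '<http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"',
    '<http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"',
    '<http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"',
    '<http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"',
]

unique = dedupe_link_headers(links)
print(len(links), "headers before,", len(unique), "after")
print(header_bytes(links), "bytes before,", header_bytes(unique), "after")
```

De-duplication only trims repeats. A node with many distinct references would still produce one header per reference, so the growth itself remains unbounded.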

The system functions overall, but nodes with enough entity references to hit this limit return 502s. It is a nasty little bug to run into when you suddenly start getting 502s for one node and not another. I'm not sure if there is an easy fix here, so this is more of a conversation starter and a way to make folks aware of these headers.

dannylamb commented 3 years ago

Hey @bseeger,

I think @kayakr has stumbled into this before as well.

So we do that to try and be as "RESTful" as possible while still conforming to web standards, but I dunno how many people (er... client software) are making use of it. We use some link headers in the backend to get things into Fedora, but certainly not all. The ones generated by your standard entity references (member of, media of, tags, etc...) don't come into play at all as far as Islandora is concerned.

I'm happy to either

  1. Eliminate them entirely (you know how much I love :fire:)
  2. Keep them, but push them into the message body using json:api and https://www.drupal.org/project/jsonapi_hypermedia

I'm open to other suggestions, too. At the very least, or maybe just as a stop-gap measure, we can document the issue and suggest workarounds for nginx/apache.

birkland commented 3 years ago

Right now, it's unclear to me:

Right now, it seems the information these link headers convey is limited, i.e. they all use the tag relation (unrelated to, say, what the RDF predicate would be when relating the object to the entity). The nature of the linked resource is not apparent to the consumer (i.e. it could be any other entity or taxonomy term: name, subject, copyright, type, etc.). The title can be confusing, particularly (as is the case with Image in Bethany's example) when the same title is used for different resources.

It would seem ideal if they were configurable somehow. Maybe we want to simply un-check a checkbox to turn them off entirely. Maybe others might want to specify which taxonomies to include, or use a different rel (related, maybe). I'm not sure.

dannylamb commented 3 years ago

@bseeger I'm curious as to how they're getting duplicated. Are you tagging twice, or do we just have a bug there?

@birkland It's provided by the main islandora module as an attempt to provide links to relevant items in message headers. The idea is that you'd be able to navigate the repository using just HEAD requests until you find what you want. I don't know who's actually using it, though. And our own backend, which I'd consider the main client / user of this feature, doesn't really use it much at all.

I think in terms of concrete steps forward, we can definitely

  1. Investigate the duplicates and de-dupe them. It's pretty wasteful / silly to keep them.
  2. Figure out how to limit / restrict them
    1. With little effort, we can emit the headers only for REST API requests and not when a user views the page in the browser
    2. With a bit more effort, we can toggle the feature with config
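The two restrictions in step 2 can be sketched as a single decision function. This is a language-neutral illustration written in Python, not the module's actual PHP code: `LinkHeaderConfig`, `should_emit_link_headers`, and the `REST_FORMATS` set are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the serialization formats that would count as "REST API requests".
REST_FORMATS = {"json", "jsonld", "hal_json", "xml"}


@dataclass
class LinkHeaderConfig:
    enabled: bool = True    # the config toggle (2.2)
    rest_only: bool = True  # emit for REST requests only (2.1)


def should_emit_link_headers(config: LinkHeaderConfig,
                             request_format: Optional[str]) -> bool:
    """Decide whether a response should carry the entity-reference Link headers.

    request_format is the requested serialization format, or None for an
    ordinary browser page view.
    """
    if not config.enabled:
        return False
    if config.rest_only:
        return request_format in REST_FORMATS
    return True


print(should_emit_link_headers(LinkHeaderConfig(), "jsonld"))
print(should_emit_link_headers(LinkHeaderConfig(), None))
print(should_emit_link_headers(LinkHeaderConfig(enabled=False), "jsonld"))
```

Checking the toggle first means a site that switches the feature off pays no further cost, which matches the point that an always-off toggle is a signal the code could be removed.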

But if no one really is using this at all and it's more of a nuisance than anything else, we can totally deprecate and remove it. If we make it toggleable, and no one uses that toggle and everyone just sets it to off and walks away.... then we don't actually need to maintain that code at all.

kayakr commented 3 years ago

I can see why link headers are potentially useful, but it was a tricky issue to diagnose when we encountered it for the first time, and nginx has quite a low allowance by default (64 I think). See previously #1519, "Islandora generates Link headers for non-repository content".

bseeger commented 3 years ago

@dannylamb - I totally double tagged things (on purpose just to have a number of links in there). Here's the record: http://future.islandora.ca/node/40

[Image: Screen Shot 2021-02-17 at 2 21 03 PM]

mjordan commented 3 years ago

@dannylamb points out that

"The idea is that you'd be able to navigate the repository using just HEAD requests until you find what you want."

If that's the case, couldn't a REST client use the JSON-LD for a node to do the same thing, using GET requests? If so, would we need all those link headers at all?
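As a rough sketch of what that could look like, the Python below walks a JSON-LD document and collects every resource a node points to, along with the predicate used. The `related_uris` helper and the sample document are made up for illustration; the real JSON-LD that Islandora returns may be shaped differently, and a client would first have to fetch it with a GET request.

```python
def related_uris(jsonld, node_uri):
    """Collect (predicate, target URI) pairs for one node in a JSON-LD @graph."""
    found = []
    for resource in jsonld.get("@graph", []):
        if resource.get("@id") != node_uri:
            continue
        for predicate, objects in resource.items():
            if predicate.startswith("@"):
                continue  # skip JSON-LD keywords such as @id and @type
            if not isinstance(objects, list):
                objects = [objects]
            for obj in objects:
                # Only objects with an @id are links; @value entries are literals.
                if isinstance(obj, dict) and "@id" in obj:
                    found.append((predicate, obj["@id"]))
    return found


# Hypothetical sample document, loosely modelled on the node in this thread.
sample = {
    "@graph": [
        {
            "@id": "http://future.islandora.ca/node/40",
            "@type": ["http://pcdm.org/models#Object"],
            "http://purl.org/dc/terms/title": [{"@value": "Example"}],
            "http://purl.org/dc/terms/subject": [
                {"@id": "http://future.islandora.ca/taxonomy/term/27"},
                {"@id": "http://future.islandora.ca/taxonomy/term/28"},
            ],
        }
    ]
}

for predicate, uri in related_uris(sample, "http://future.islandora.ca/node/40"):
    print(predicate, "->", uri)
```

Unlike the Link headers, which all carry rel="tag", this route keeps the predicate for each relationship.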

dannylamb commented 3 years ago

@bseeger Good to know it's not a bug, just a use case that was never considered. Didn't plan on folks tagging twice.

@mjordan Good point. Everything we're exposing is in the jsonld already. The advantage would be that you don't have to pull down the whole record and can get by with just HEAD requests, which would be faster. But considering this has inconvenienced more people than those who have taken advantage of the feature, frankly I don't think it's even worth it at this point.

mjordan commented 3 years ago

I completely agree. I'd gladly sacrifice the link headers for more reliability, especially when we have similar functionality that doesn't have nasty side effects. Would be happy to hear alternative points of view though.

antbrown commented 2 years ago

I have also recently come up against this nginx header limit being exceeded by Link headers added by an entity_reference field.

I think in the short term I'm going to ask for the http_max_hdr limit to be increased on the server. Long term I suggest removing Link headers for non-Islandora objects or allowing site administrators to configure which entity types/fields are used to generate Link headers.

mjordan commented 2 years ago

The workaround implemented in Islandora Workbench is to raise the maximum number of headers that Requests will accept to 10,000 (from the default of 100 headers).
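For anyone hitting the same limit in their own Python client, a minimal sketch of this kind of workaround follows. It assumes the 100-header default comes from the standard library's `http.client` module, which Requests relies on through urllib3. `_MAXHEADERS` is a private CPython attribute, so this may stop working in a future Python version, and it may not match exactly how Workbench does it.

```python
import http.client

# Assumption: raise the standard library's cap on response headers
# (default 100) before making any requests. This is a private attribute.
http.client._MAXHEADERS = 10000

print(http.client._MAXHEADERS)
```

The assignment has to run before the first response is parsed, so it belongs near the top of the client script.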