DSpace / dspace-angular

DSpace User Interface built on Angular.io
https://wiki.lyrasis.org/display/DSDOC8x/
BSD 3-Clause "New" or "Revised" License

(Discussion) Static Site Generation and DSpace #3183

Open kshepherd opened 3 months ago

kshepherd commented 3 months ago

Description

Generating and serving static HTML and JavaScript ('Static Site Generation') can make repository content easier to run in low-resource environments and more portable (pages are on disk), and it checks off many other best practices.

DSpace Angular can't do everything as static content, of course, but there are many pages which don't change very much (e.g. unauthenticated context + item page) and are the most commonly visited content, by both human and robot users.

This ticket is for discussion about how and where we can use SSG (static site generation) with DSpace repositories, various techniques and how they work, and any experience or progress experimenting with this topic.

Approaches

There seem to be a few approaches to SSG in Angular. So far we have identified:

  1. Build-time generation of configured or discovered routes (user-specified). This could hammer the REST API quite hard at startup. It could be more useful for repositories that have large numbers of item pages that do not change for years.
    • As noted in the July 18 meeting, it can also severely increase build time. If one must prerender with every build, this becomes a blocker to fixing bugs, upgrading, etc.
  2. On-demand / 'regeneration' as requests come in?
    • As noted in the July 18 meeting, dynamic SSG comes with all the same performance problems as SSR (and maybe more), so it could end up negatively impacting DSpace performance.
  3. External tools like Scully (see Refs) that crawl the Angular site and create static content. These could be configured to generate pages incrementally, on TTLs, or whenever admins see fit.
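To make the "incrementally, on TTLs" idea in option 3 concrete, here is a minimal sketch of how an external generator might decide which static pages are due for regeneration. The `StaticPage` shape, the `staleRoutes` function, and the example routes are hypothetical names made up for illustration; they are not part of DSpace, Angular, or Scully.

```typescript
// Hypothetical record of a statically generated page (not a DSpace type).
interface StaticPage {
  route: string;       // path of the page, e.g. "/home"
  generatedAt: number; // epoch milliseconds when the static copy was built
  ttlMs: number;       // how long the static copy is considered fresh
}

// Return the routes whose static copy has reached or passed its TTL at time `now`.
function staleRoutes(pages: StaticPage[], now: number): string[] {
  return pages
    .filter((page) => now - page.generatedAt >= page.ttlMs)
    .map((page) => page.route);
}

const HOUR_MS = 60 * 60 * 1000;

// Example data: a frequently changing home page and a rarely changing item page.
const examplePages: StaticPage[] = [
  { route: "/home", generatedAt: 0, ttlMs: 1 * HOUR_MS },
  { route: "/items/example-item", generatedAt: 0, ttlMs: 24 * HOUR_MS },
];

// Two hours after generation, only the home page is due.
console.log(staleRoutes(examplePages, 2 * HOUR_MS)); // [ '/home' ]
```

A scheduler could run a check like this periodically and regenerate only the returned routes, which would keep load on the REST API proportional to how often content actually changes rather than to the size of the repository.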

Note re: performance / SSR

This discussion is not intended to be a band-aid "fix" or competing proposal for solving high resource usage in SSR (server-side rendering). It is a discussion for those curious about supporting static sites in DSpace repositories, as a goal in and of itself.

References

abollini commented 3 months ago

please please please :) only look for option 3. IMHO we need to simplify the Angular code as much as possible to reach our primary goal, which should be to have a fast application in a basic scenario without the need for all these extra layers (cache, site pre-generation) that are of course needed for high-performance, large, and heavily accessed sites. The other options could be attractive in the short term but would make performance worse and the installation process more complex and/or slow.

kshepherd commented 3 months ago

> please please please :) only look for option 3

Good points made today, I've added notes about the downsides to options 1 and 2, and tried to make it clearer that this issue was not supposed to be related to SSR (or even Angular, necessarily) or compete for attention with SSR performance issues.

jameswsullivan commented 2 months ago

I don't know much about the inner workings of DSpace but I really wish that there could be something like the "Simply Static" WordPress plugin for it.

kshepherd commented 2 months ago

@jameswsullivan would a function like "generate website from publicly-accessible resources" fit your use case? (i.e. any items, bitstreams, collections that are not readable by unauthenticated users would be excluded)
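As a rough sketch of what such a function might do first, the snippet below filters a list of resources down to those readable without authentication and maps them to routes a static generator could visit. Everything here is an assumption for illustration: the `Resource` shape, the `anonymousRead` flag (a stand-in for whatever access check the repository really performs), and the URL layout in `routeFor`.

```typescript
// Hypothetical, simplified resource shape; not a DSpace REST API type.
interface Resource {
  kind: "item" | "collection" | "bitstream";
  id: string;
  // Assumed flag: true when an unauthenticated user may read the resource.
  anonymousRead: boolean;
}

// Assumed URL layout, used only for this illustration.
function routeFor(resource: Resource): string {
  switch (resource.kind) {
    case "item":
      return `/items/${resource.id}`;
    case "collection":
      return `/collections/${resource.id}`;
    case "bitstream":
      return `/bitstreams/${resource.id}/download`;
  }
}

// Keep only anonymously readable resources and map them to routes.
function publicRoutes(resources: Resource[]): string[] {
  return resources.filter((r) => r.anonymousRead).map(routeFor);
}

const sampleResources: Resource[] = [
  { kind: "item", id: "a1", anonymousRead: true },
  { kind: "item", id: "b2", anonymousRead: false }, // restricted: excluded
  { kind: "collection", id: "c3", anonymousRead: true },
  { kind: "bitstream", id: "d4", anonymousRead: false }, // restricted: excluded
];

// One route per line, a format a crawler or prerendering step could consume.
console.log(publicRoutes(sampleResources).join("\n"));
```

Filtering before generation, rather than hiding pages afterwards, means restricted content never reaches the static output directory at all.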

jameswsullivan commented 1 month ago

@kshepherd I think so, yes. One of the performance issues we have is that the DSpace site is getting hit by large numbers of crawling or content-scraping bots that enumerate the links/resources, causing heavy load on the hosting. And I think under DSpace's current structure, each of these sessions would trigger calls/responses between the DSpace Angular UI and the DSpace API, and the SSR stuff? (I have limited knowledge about how exactly DSpace works but these are the topics I've come across during hosting and orchestration.)

So I'd think if publicly-accessible resources could be generated and then hosted/served in a static way (especially for the bots and anonymous hits), that would alleviate a lot of this problem? I'm not sure how it would work, though: would the publicly-accessible content be served as static pages first, until a user logs in? My analogy to WordPress' Simply Static is probably not a good fit in this scenario, because there I simply spin up a local WordPress instance, make edits, generate the static HTML files, and upload them to the hosting server.
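One possible arrangement for the "static first, until a user logs in" question could look like the sketch below, written as a pure decision function. This is only an illustration under stated assumptions, not how DSpace works today: `PageRequest`, `chooseHandler`, and the idea that something in front of the application already knows whether a request is authenticated are all hypothetical.

```typescript
// Hypothetical view of an incoming request; how "authenticated" is
// determined (cookie, header, etc.) is deliberately left out.
interface PageRequest {
  path: string;
  authenticated: boolean;
}

type Handler = "static" | "dynamic";

// Serve the pre-generated copy only to anonymous requests for paths that
// have one; everything else goes to the live application.
function chooseHandler(request: PageRequest, staticPaths: Set<string>): Handler {
  if (!request.authenticated && staticPaths.has(request.path)) {
    return "static";
  }
  return "dynamic";
}

// Example: one item page has a static copy on disk.
const generatedPaths = new Set<string>(["/items/a1"]);

console.log(chooseHandler({ path: "/items/a1", authenticated: false }, generatedPaths)); // static
console.log(chooseHandler({ path: "/items/a1", authenticated: true }, generatedPaths));  // dynamic
console.log(chooseHandler({ path: "/items/zz", authenticated: false }, generatedPaths)); // dynamic
```

Under this arrangement, bots and anonymous visitors would mostly hit files on disk, while logged-in users would always get the live application.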