PataphysicalSociety / soupault

Static website generator based on HTML element tree rewriting
https://soupault.app
MIT License

Aggregating "backlinks" #53

Closed: ispringle closed this issue 1 month ago

ispringle commented 1 year ago

Trying to figure out how I can take all generated pages, search each "referrer page" for any internal link (<a>), and then include a reference to that "referrer page" on the pages it links to. For example:

foo.html:

...
<title>An Example of "Backlinking"</title>
...
<h2 id="heading-1">Heading 1</h2>
See <a href="/bar#baz" id="heading-1-citation-bar-baz">my post on bar</a> for more details

Would then generate on bar.html:

<ul><li><a href="/foo#heading-1-citation-bar-baz">An Example of "Backlinking"</a></li></ul>

I'm struggling to get this working because plugins/widgets operate on a single file at a time, but I need to aggregate all internal links from all pages and then iterate through all pages again to inject the backlinks, and I can't figure out how to make that data persist. The Lua data structure I imagine would work is a table like this:

link_lookup = {
  bar = {
    { title = "An Example of \"Backlinking\"", href = "/foo#heading-1-citation-bar-baz" },
    ...
  },
  ...
}

I tried writing this data to a temp file, but it seems dirty and wrong, and it really slows things down since that's 3 I/O ops per file (the initial read, then a write to aggregate links, then a final read to inject links).
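
A minimal sketch of the per-page link-gathering half, assuming soupault's documented HTML and Regex plugin modules; the page-key derivation and the back-reference href are illustrative only, and the persistence problem described above remains:

-- Sketch: gather internal links from the current page into the
-- link_lookup shape described above. This table still does not
-- persist across pages on its own.
link_lookup = {}

title_element = HTML.select_one(page, "title")
if title_element then
  page_title = HTML.strip_tags(title_element)
else
  page_title = page_file
end

links = HTML.select(page, "a")
i = 1
while links[i] do
  href = HTML.get_attribute(links[i], "href")
  anchor_id = HTML.get_attribute(links[i], "id")
  if href and Regex.match(href, "^/") then
    -- "/bar#baz" -> "bar" (illustrative key derivation)
    target = Regex.replace(href, "^/", "")
    target = Regex.replace(target, "#.*$", "")
    -- Back-reference: current page's URL plus the citation anchor, if any
    backlink_href = "/" .. Regex.replace(Regex.replace(page_file, "^.*/", ""), "\\..*$", "")
    if anchor_id then
      backlink_href = backlink_href .. "#" .. anchor_id
    end
    if not link_lookup[target] then
      link_lookup[target] = {}
    end
    entries = link_lookup[target]
    entries[size(entries) + 1] = { title = page_title, href = backlink_href }
  end
  i = i + 1
end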

dmbaturin commented 1 year ago

Interesting question!

Regarding simply sharing data between runs of the same plugin on different pages, there's this: https://soupault.app/reference-manual/#plugin-persistent-data

However, that will only help with the link-gathering step. To inject that data into all pages, you will probably still need to save it to an external file and use that file in the next soupault run. There are ways to reduce the build time impact of that, though.

That per-plugin persistent data may also not be enough for your use case, since the plugin has no way to know whether the page it's currently running on is the last one, so it can't decide when it's time to dump the data to disk.
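
A short sketch of what that looks like, assuming link entries are collected as in the earlier sketch:

-- Sketch: accumulate link data across pages within one run using the
-- plugin-local persistent_data table.
if not persistent_data["backlinks"] then
  persistent_data["backlinks"] = {}
end
backlinks = persistent_data["backlinks"]

-- ... add entries for the current page's outgoing links to backlinks ...

-- The limitation described above: the plugin cannot tell whether this was
-- the last page, so it does not know when it is safe to dump the table,
-- e.g. with
--   Sys.write_file("backlinks.json", JSON.to_string(backlinks))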

I was thinking of adding a table for global persistent data. I wasn't sure if it was a good idea, but your use case seems like a good reason to add it. (One reason I'm hesitant is that it will need a global lock when I add multi-core support, but the multi-core support design is an interesting question in general.) The data accumulated from plugin runs could then be saved to disk by a post-build hook.

I think it should be easy enough to add. Do you have a setup for building soupault from source to test that, or will you need binaries? If you need binaries, which OS do you use?

dmbaturin commented 1 year ago

Also, if you are willing to exclude links that only appear on index pages, you can use https://soupault.app/reference-manual/#making-index-data-available-to-every-page and avoid having to run soupault twice. However, this still needs a global storage mechanism and/or a way for Lua code to check whether it's the first (index extraction only) pass or the second, real build pass.

I'm going to add both for testing.

ispringle commented 1 year ago

Thanks for the quick response! I'm running soupault on macOS.

dmbaturin commented 1 year ago

@ispringle Ok, so I've added two things:

Soupault pass flag

There's a new variable named soupault_pass in the plugin environment. It can have the following values:

0: index.index_first is not enabled, so there is only a single pass
1: the first (index extraction only) pass
2: the second, real build pass

So a plugin can branch on the pass like this:

if soupault_pass < 2 then
  -- Things that should only be done to collect the data
else
  -- Things that should be done to render the collected data 
end

Global data

A new variable named global_data is accessible to every plugin, and can be used to exchange data between different plugins. It's not available to hooks yet.
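
A rough sketch of the intended usage; the "backlinks" key and the split between a collecting plugin and a rendering plugin are just an example:

-- Plugin A (collection): store findings in the shared table.
if not global_data["backlinks"] then
  global_data["backlinks"] = {}
end
-- ... append entries for the current page's outgoing links here ...

-- Plugin B (rendering), configured to run after plugin A:
backlinks = global_data["backlinks"]
if backlinks then
  -- ... look up entries for the current page and inject them into page ...
end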

So, if you want to use the same plugin for collecting and rendering backlinks, you can use the old plugin-local persistent_data. If you want to use different plugins, there's now global_data for that.

You can enable index.index_first = true and add two code paths to your plugin(s): backlink collection when soupault_pass < 2 and backlink rendering when soupault_pass >= 2.
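
Putting the pieces together, a combined collect-and-render plugin for that setup might look roughly like the sketch below. It assumes index.index_first = true in soupault.toml and uses documented plugin API calls (HTML.select, HTML.get_attribute, HTML.select_one, HTML.parse, HTML.append_child, Regex.match, Regex.replace, size); the page-key convention and the injection point are illustrative, not prescriptive:

-- Sketch of a single collect-and-render backlinks plugin for the
-- two-pass (index.index_first = true) workflow.

if not global_data["backlinks"] then
  global_data["backlinks"] = {}
end
backlinks = global_data["backlinks"]

-- Illustrative page key: source file name without directory or extension,
-- e.g. "site/bar.md" -> "bar".
page_key = Regex.replace(page_file, "^.*/", "")
page_key = Regex.replace(page_key, "\\..*$", "")

if soupault_pass < 2 then
  -- First pass: collect this page's outgoing internal links.
  links = HTML.select(page, "a")
  i = 1
  while links[i] do
    href = HTML.get_attribute(links[i], "href")
    anchor_id = HTML.get_attribute(links[i], "id")
    if href and Regex.match(href, "^/") then
      target = Regex.replace(href, "^/", "")
      target = Regex.replace(target, "#.*$", "")
      backlink_href = "/" .. page_key
      if anchor_id then
        backlink_href = backlink_href .. "#" .. anchor_id
      end
      if not backlinks[target] then
        backlinks[target] = {}
      end
      entries = backlinks[target]
      entries[size(entries) + 1] = { title = page_key, href = backlink_href }
    end
    i = i + 1
  end
else
  -- Second pass: render the backlinks collected for this page, if any.
  entries = backlinks[page_key]
  if entries then
    list_html = "<ul>"
    i = 1
    while entries[i] do
      list_html = list_html .. "<li><a href='" .. entries[i]["href"] .. "'>"
      list_html = list_html .. entries[i]["title"] .. "</a></li>"
      i = i + 1
    end
    list_html = list_html .. "</ul>"
    container = HTML.select_one(page, "body")
    if container then
      HTML.append_child(container, HTML.parse(list_html))
    end
  end
end

In a real plugin the title would come from the page's <title> element and the list would be appended to a dedicated container rather than <body>, but those details depend on the site.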

Here's a macOS build with those two commits: soupault-3e95fe5.zip

Let me know how it goes.

dmbaturin commented 1 year ago

@ispringle Have you had a chance to try it out?

ispringle commented 1 year ago

Sorry, your comment has been sitting in my inbox waiting for me to respond. I haven't had a chance to look at soupault again since about the time I opened this issue; I've been in the last weeks of a project at work. I should have time either at the end of this week, or by next Monday at the latest, to get back to personal projects.

ispringle commented 1 year ago

I was able to get a minimal proof of concept going. I'll share it once I've gotten it set up to actually insert backlinks. Thanks for your efforts!

dmbaturin commented 1 month ago

I've been using this feature for similar purposes for a while and it works, so I suppose the issue is resolved. Feel free to create more specific issues if needed.