Closed ispringle closed 1 month ago
Interesting question!
However, that will only help with the link gathering step. To inject that data into all pages, you will probably still need to save it to an external file and use that file in the next soupault run. That said, there are ways to reduce the build time impact of that.
Regarding simple sharing data from different pages in the same plugin code, there's this: https://soupault.app/reference-manual/#plugin-persistent-data
However, that per-plugin persistent data may not be enough for your use case: the plugin has no way to know whether the page it's currently running on is the last page, so it can't decide when it's time to dump the data to disk.
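As a sketch of that mechanism, a plugin could accumulate every page's outgoing links in `persistent_data` (the `seen_links` key is a hypothetical name; `HTML.select`, `HTML.get_attribute`, `size`, and `next`-based iteration are assumed from soupault's plugin API):

```lua
-- Accumulate outgoing links across pages: persistent_data survives
-- between runs of this plugin within a single soupault build.
if not persistent_data.seen_links then
  persistent_data.seen_links = {}
end

links = HTML.select(page, "a")
index, link = next(links)
while index do
  href = HTML.get_attribute(link, "href")
  if href then
    seen = persistent_data.seen_links
    seen[size(seen) + 1] = href
  end
  index, link = next(links, index)
end
-- The catch described above: the plugin cannot tell when it has seen
-- the last page, so it never knows when to write this table to disk.
```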
I was thinking of adding a table for global persistent data. I wasn't sure if it was a good idea, but your use case seems like a good reason to add it. (One reason I'm hesitant is that it will need a global lock when I add multi-core support, but the multi-core support design is an interesting question in general.) The data accumulated from plugin runs could then be saved to disk by a post-build hook.
I think it should be easy enough to add. Do you have a setup for building soupault from source to test that, or will you need binaries? If you need binaries, which OS do you use?
Also, if you are willing to exclude links that only appear on index pages, you can use https://soupault.app/reference-manual/#making-index-data-available-to-every-page and avoid having to run soupault twice. However, this still needs a global storage mechanism and/or a way for Lua code to check whether it's the first (index extraction only) pass or the second, real build pass.
I'm going to add both for testing.
Thanks for the quick response! I'm running soupault on macOS.
@ispringle Ok, so I've added two things:
There's a new variable named `soupault_pass` in the plugin environment. It can have the following values:

- `0` if `index.index_first` is false.
- `1` if `index.index_first` is true and it's the first (index extraction) pass.
- `2` if `index.index_first` is true and it's the second (full rendering) pass.

So a plugin can branch on it:

```lua
if soupault_pass < 2 then
  -- Things that should only be done to collect the data
else
  -- Things that should be done to render the collected data
end
```
A new variable named `global_data` is accessible to every plugin and can be used to exchange data between different plugins. It's not available to hooks yet.

So, if you want to use the same plugin for collecting and rendering backlinks, you can use the old plugin-local `persistent_data`. If you want to use different plugins, there's now `global_data` for that.
You can enable `index.index_first = true` and add two paths to the plugins: backlink collection on `soupault_pass < 2` and backlink rendering on `soupault_pass >= 2`.
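Putting those pieces together, here's a hedged sketch of a single plugin branching on `soupault_pass` (the `div#backlinks` container, the root-relative "/" heuristic for internal links, and matching hrefs against `page_url` are all assumptions about the site, not part of soupault itself; `HTML.select_one`, `HTML.parse`, and `HTML.append_child` are assumed from the plugin API):

```lua
if soupault_pass < 2 then
  -- Collection pass: record who links to what, keyed by link target.
  if not global_data.backlinks then
    global_data.backlinks = {}
  end
  links = HTML.select(page, "a")
  index, link = next(links)
  while index do
    href = HTML.get_attribute(link, "href")
    -- Assumption: internal links are root-relative ("/...").
    if href and Regex.match(href, "^/") then
      if not global_data.backlinks[href] then
        global_data.backlinks[href] = {}
      end
      sources = global_data.backlinks[href]
      sources[size(sources) + 1] = page_url
    end
    index, link = next(links, index)
  end
else
  -- Rendering pass: inject collected backlinks into a container
  -- (the "div#backlinks" id is a template assumption).
  container = HTML.select_one(page, "div#backlinks")
  sources = global_data.backlinks and global_data.backlinks[page_url]
  if container and sources then
    index, source = next(sources)
    while index do
      item = HTML.parse("<p>Linked from: <a href=\"" .. source .. "\">" .. source .. "</a></p>")
      HTML.append_child(container, item)
      index, source = next(sources, index)
    end
  end
end
```

The same logic could instead be split into two plugins that share `global_data`, one guarded by each condition.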
Here's a macOS build with those two commits: soupault-3e95fe5.zip
Let me know how it goes.
@ispringle Have you had a chance to try it out?
Sorry, your comment has been sitting in my inbox waiting for a response. I haven't had a chance to look at soupault again since around the time I opened this issue; I've been in the last weeks of a project at work. I should have time either at the end of this week or by next Monday to get back to personal projects.
I was able to get a minimal proof of concept going. I'll share it once I've set it up to actually insert backlinks. Thanks for your efforts!
I've been using this feature for similar purposes for a while and it works, so I suppose the issue is resolved. Feel free to create more specific issues if needed.
Trying to figure out how I can take all generated pages, search each "referrer page" for any internal link (`<a>`), and then include a reference to that "referrer page" on the pages it links to. For example, a link from foo.html to bar.html should produce a backlink to foo.html on bar.html.

I'm struggling to get this working because plugins/widgets operate on a single file at a time, but I need to aggregate all internal links from all pages, then take that data and iterate through all pages to inject the backlinks, and I can't figure out how to make my data persist between runs. The Lua data structure I imagine would work is some table keyed by target page.
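A hypothetical sketch of such a table (the paths are invented for illustration), mapping each target page to the list of pages that link to it:

```lua
-- Hypothetical backlinks table: target page -> pages that link to it.
backlinks = {
  ["/bar.html"] = { "/foo.html", "/baz.html" },
  ["/foo.html"] = { "/bar.html" },
}

-- Injecting backlinks into a page is then a single lookup:
referrers = backlinks["/bar.html"]
```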
I tried writing this data to a temp file, but it seems dirty and wrong, and it really slows things down, since that's three I/O operations per file: the initial read, a write to aggregate links, and a final read to inject links.