fedeya / remix-sitemap

Sitemap generator for Remix applications
https://npmjs.com/remix-sitemap
MIT License
95 stars 5 forks source link

Proposal: Expose global flags for when running in remix-sitemap cli #64

Closed AjaniBilby closed 4 months ago

AjaniBilby commented 9 months ago

Right now there is no way to know if any given route is being imported because it's being used to generated a sitemap, or because it's actually being used for a server. Because of this it means if a server relies on workers, or opens a connection for logging, it will also do so when attempting to just generate a sitemap then shut down.

This causes the npx remix-sitemap command to hand infinitely, because of connectioned opened or workers started for functionality not used in generating a sitemap.

In this pr I've implemented a basic functioning flag system to allow apps using remix-sitemap to probe if it's been started purely for the purpose of sitemap generation:

import { getFlags } from "remix-sitemap";
const remixSitemapFlags = getFlags();

const bindings = remixSitemapFlags.standalone
    ? {} as any as ReturnType<typeof InitializeWorker>
    : InitializeWorker();

I do not think the current function names or flag names should stay as is, but it's a MVP for something that would be helpful feature to be added

fedeya commented 9 months ago

Sorry, i can't understand this. Can you give an example of a use case for this feature?

AjaniBilby commented 9 months ago

Right now if I use npx remix-sitemap on one of my production servers the command never exits, because the sitemap generation touches a route which generates pdfs, and that route actually imports a file which starts up a worker_thread ready to generate the pdfs without blocking the main thread.

This worker is then sitting idle waiting for jobs it will never recieve because the server never actually started, and the only reason it was loaded into node was because of it's importation in a route which only calls the function on a post request.

Why not just move away from the cli The sitemap generation puts quite a bit of load on the available db connections to the server, and the sitemap happens to be accessed most during periods of high traffic, which are times you don't want the sitemap to happen to be created, so instead I'm using the npx remix-sitemap command to trigger sitemap re-generation only during times of low activity.

Why are you using a worker thread, why not start a seperate node instance The pdf generation is done ia worker_thread to save on resources because a whole new node instance isn't needed, and instead it's just a new V8 context which saves on resources. If we moved pdf generation over to a seperate node instance or server I then add latency of encoding and decoding the pdf over TCP and http transport which wastes CPU cycles, adds latency, and complexity of then ensuring the services are both running and update properly, and communicate correctly.

fedeya commented 9 months ago

Can't this be fixed by just adding some feature to exclude the pdf route at config level to avoid loading that module?

AjaniBilby commented 9 months ago

Some urls which should be in the sitemap which import the pdf generator. Plus I also have another logger.server.ts which intereacts with some web hooks for performance alerts, this file also binds a setInterval which is then also triggered during sitemap generation because some routes can trigger those hooks directly.

I could implement a work around where my server.entry actually sets the global value rather than the library, I made this PR because that wouldn't be a very clean work around. Plus it's an issue potential other users of the library could run into, and the feature itself adds almost no weight to the library itself.