Closed LenVavro closed 1 month ago
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
web-dev-tools | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Nov 14, 2024 7:09am |
The changes introduce a feature for generating XML sitemaps, including the addition of a new environment variable in the .env.example file, an API endpoint for sitemap generation, a React component for user interaction, and utility functions for URL validation and limit retrieval. A comprehensive test suite for the sitemap generation function has been established, and a new entry for a sitemap generator tool has been added to the tools JSON file.
File Path | Change Summary |
---|---|
.env.example | Added environment variable NEXT_PUBLIC_GENERATOR_SITEMAP_XML_LIMIT. |
__tests__/lib/generator/sitemapXml.test.js | Introduced a test suite for generateSitemapXML with mock fetch, including three main tests. |
src/app/api/generator/sitemap-xml/route.js | Added API endpoint for generating XML sitemaps with error handling for URL validation. |
src/app/generator/sitemap-xml/page.jsx | Introduced a React component for generating sitemaps, including state management and error handling. |
src/db/tools.json | Added new tool entry for "Sitemap XML Generator" with id 28 and link "/generator/sitemap-xml". |
src/lib/generator/sitemapXml.js | Added functions for generating XML sitemaps, including generateSitemapXML and utility functions. |
src/lib/utils.js | Added functions isUrlValid(url) and getSitemapXmlGeneratorLimit() for URL validation and limit retrieval. |
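For readers who have not opened the diff, the two utilities listed for src/lib/utils.js could look roughly like the sketch below. This is an assumption about their shape, not the code from the PR: the http(s)-only rule, the `DEFAULT_LIMIT` constant, and the optional `env` parameter (added here only so the function is easy to exercise) are all hypothetical. The default of 100 is the value mentioned in the discussion.

```javascript
// Hypothetical sketch; the PR's actual src/lib/utils.js may differ.
const DEFAULT_LIMIT = 100; // default value named in the PR discussion

// Accept only absolute http(s) URLs (assumed rule, not confirmed by the PR).
function isUrlValid(url) {
  try {
    const parsed = new URL(url);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    // new URL() throws a TypeError for anything it cannot parse.
    return false;
  }
}

// Read the page limit from the environment, falling back to the default
// when the variable is empty, missing, or not a positive integer.
// The `env` parameter is an assumption made for testability.
function getSitemapXmlGeneratorLimit(env = process.env) {
  const raw = env.NEXT_PUBLIC_GENERATOR_SITEMAP_XML_LIMIT ?? "";
  const parsed = Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_LIMIT;
}
```

Called with no arguments, `getSitemapXmlGeneratorLimit()` reads `process.env`, which matches the signature listed in the table.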
sequenceDiagram
participant User
participant ReactComponent
participant API
participant SitemapGenerator
User->>ReactComponent: Input URL
ReactComponent->>API: Fetch sitemap for URL
API->>SitemapGenerator: Generate sitemap
SitemapGenerator-->>API: Return sitemap XML
API-->>ReactComponent: Return sitemap XML
ReactComponent-->>User: Display sitemap
🐰 "In the garden where sitemaps bloom,
A new tool has come to dispel the gloom.
With URLs valid and limits in sight,
We generate sitemaps, oh what a delight!
So hop along, let’s fetch and create,
In the world of XML, we celebrate!" 🌼
[!WARNING] There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.
🔧 eslint
> If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.
warning eslint@8.57.1: This version is no longer supported. Please see https://eslint.org/version-support for other options.
warning eslint > @humanwhocodes/config-array@0.13.0: Use @eslint/config-array instead
warning eslint > @humanwhocodes/config-array > @humanwhocodes/object-schema@2.0.3: Use @eslint/object-schema instead
warning eslint > file-entry-cache > flat-cache > rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported
warning eslint > file-entry-cache > flat-cache > rimraf > glob@7.2.3: Glob versions prior to v9 are no longer supported
warning eslint > file-entry-cache > flat-cache > rimraf > glob > inflight@1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
Hi @Bashamega, can you please have a look at my PR. Thanks.
it gives an error
Hello @annuk123, this was an issue with a tool, and I forgot to fix it. It should be solved now.
okay
@Bashamega
The current sitemap API isn’t generating a full sitemap for my large site—it only captures part of it.
That's probably because of the limit (100), meaning that at most 100 pages will be processed (HTML fetched and links parsed). xml-sitemaps.com also has a limit of 500 on its free tier. But please share the page so I can verify.
I’m already on Vercel’s free plan, so I’d prefer a solution that doesn’t require upgrading. Are there any prebuilt sitemap APIs we could use to handle this?
I appreciate your enthusiasm for developing the API, but I’m concerned about the potential ongoing costs.
I understand. However, I don't think there is a need to upgrade the hosting or worry about cost. You can change the limit (which affects RAM usage and execution time) based on usage. As for users who need more: this is an open-source project, so they have access to the source code and can run it themselves, self-host, copy, modify, etc.
it gives an error
I've just added a better error message.
Thank you for the prompt reply @LenVavro
This is the website that I tried it on: https://adambashaahmednaji.com/
I fear that the API will exceed the limit, but we can push it to prod and see what happens.
I've checked it, @Bashamega, and found no issue. The sitemap is clipped because of the default limit (100). You can easily increase it using the env variable NEXT_PUBLIC_GENERATOR_SITEMAP_XML_LIMIT.
From my side, everything's ready for the merge.
Can we use another API, so it can scrape the whole website? I don't want the generated sitemaps to be incomplete. I don't have a problem with using a free third-party API.
@Bashamega you can set the limit to whatever you want using NEXT_PUBLIC_GENERATOR_SITEMAP_XML_LIMIT, e.g. 10 million, and it will certainly generate the full sitemap. But once again, even xml-sitemaps.com has a limit of 500 on its free tier, as you can read on their homepage.
I didn't find any free API for this purpose, and I don't think anyone will provide their server resources for free or without limits. Vercel is already providing free hosting and I've implemented it, so in a way, this is the only available free and unlimited API to generate sitemaps 😄
Description
Issue #108: a sitemap XML generator that recursively iterates through a website's pages, processes their HTML, and parses the links to create a sitemap.xml file.
Key points
Lightweight
To keep it lightweight, I decided not to use Playwright, Puppeteer, or a similar package, but a simple fetch and regex. Furthermore, to minimize RAM usage, I process the HTML response as a stream, so only one chunk of the HTML is held in RAM at a time and it is processed immediately. The generated sitemap is also streamed to the frontend, so the user can see the progress in real time.
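The chunk-by-chunk regex approach can be sketched as follows. This is an illustrative assumption, not the code in src/lib/generator/sitemapXml.js: `createLinkExtractor`, `HREF_PATTERN`, and the same-origin filter are hypothetical names and choices. The part worth noting is the carry-over buffer, since a tag can be split across two chunks and would otherwise be missed.

```javascript
// Hypothetical helper; the PR's real regex and filtering rules may differ.
// Matches the href value of an <a ...> tag with a quoted attribute.
const HREF_PATTERN = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;

function createLinkExtractor(baseUrl) {
  const origin = new URL(baseUrl).origin;
  let carry = ""; // unfinished tag left over from the previous chunk

  // Feed one decoded HTML chunk; returns the same-origin links found in it.
  return function push(chunk) {
    const text = carry + chunk;
    // If the last "<" has no ">" after it, that tag is still incomplete:
    // hold it back and prepend it to the next chunk.
    const lastOpen = text.lastIndexOf("<");
    const tagIncomplete = lastOpen !== -1 && text.indexOf(">", lastOpen) === -1;
    const complete = tagIncomplete ? text.slice(0, lastOpen) : text;
    carry = tagIncomplete ? text.slice(lastOpen) : "";

    const links = [];
    for (const match of complete.matchAll(HREF_PATTERN)) {
      try {
        const url = new URL(match[1], baseUrl); // resolves relative links
        url.hash = ""; // fragments point to the same page
        if (url.origin === origin) links.push(url.href);
      } catch {
        // Ignore hrefs that cannot be parsed as URLs.
      }
    }
    return links;
  };
}
```

In a real crawl each chunk would come from reading `response.body` of a `fetch` call through a `TextDecoder`. Only the current chunk plus a short carry-over string is held in memory, which is the point of the streaming design described above.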
Limitations
Getting HTML content from a different site is almost impossible without a backend because of CORS policies in browsers, so I had to fetch the website's content on the server. However, I can see that the site is hosted on Vercel, which has a timeout for server/edge functions. I therefore set the runtime to edge, which should allow streaming the response beyond the 25s limit (source).
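A minimal sketch of what a streamed sitemap response could look like, assuming the handler builds a standard `Response` around a `ReadableStream`. The names `streamSitemapResponse` and `escapeXml` are hypothetical, and the actual route in src/app/api/generator/sitemap-xml/route.js may be structured differently. In a Next.js App Router route file, a function like this would sit next to `export const runtime = "edge"`.

```javascript
// Hypothetical sketch of a streamed sitemap response (not the PR's code).
const SITEMAP_HEADER =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n';

// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// urlSource is assumed to be an async iterable of page URLs (e.g. a crawler).
function streamSitemapResponse(urlSource) {
  const encoder = new TextEncoder();
  const iterator = urlSource[Symbol.asyncIterator]();

  const stream = new ReadableStream({
    start(controller) {
      controller.enqueue(encoder.encode(SITEMAP_HEADER));
    },
    // pull() runs only when the consumer wants more data, so the crawl
    // advances at the pace the client reads the response.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.enqueue(encoder.encode("</urlset>\n"));
        controller.close();
      } else {
        controller.enqueue(
          encoder.encode(`  <url><loc>${escapeXml(value)}</loc></url>\n`)
        );
      }
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "application/xml; charset=utf-8" },
  });
}
```

Because each `<url>` entry is sent as soon as it is known, the frontend can read the body incrementally and show progress instead of waiting for the whole crawl to finish.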
Limit visited pages
A visited page is a URL whose content was fetched and processed. To cap the number of visited pages I added a limit, which can be set in the new env property NEXT_PUBLIC_GENERATOR_SITEMAP_XML_LIMIT; if you leave it empty, the default limit is 100, meaning at most 100 pages will be in the final sitemap.xml. This limit is important to save hosting resources.
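The effect of the limit can be illustrated with a small breadth-first crawl loop. This is a sketch under assumptions, not the PR's generateSitemapXML: the `crawl` generator and the `getLinks` callback (which stands in for "fetch the page and parse its links") are hypothetical names.

```javascript
// Hypothetical sketch of a crawl capped at `limit` visited pages.
// getLinks(url) is an assumed callback that fetches a page and resolves
// to the list of links found in it.
async function* crawl(startUrl, limit, getLinks) {
  const visited = new Set();
  const queue = [startUrl];

  while (queue.length > 0 && visited.size < limit) {
    const url = queue.shift();
    if (visited.has(url)) continue; // the same link can be queued twice

    visited.add(url);
    yield url; // each visited page becomes one sitemap entry

    for (const link of await getLinks(url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
}
```

With a limit of 100 the loop stops after the hundredth page has been fetched, even if more links are still queued, which explains the clipped sitemap on larger sites discussed in the thread.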