fedeya / remix-sitemap

Sitemap generator for Remix applications
https://npmjs.com/remix-sitemap
MIT License
95 stars 5 forks source link

XML Special characters aren't being escaped properly #57

Closed AjaniBilby closed 10 months ago

AjaniBilby commented 11 months ago

Describe the bug The library is not correctly sanitising inputs before putting them in an XML body

To Reproduce

import type { SitemapFunction } from 'remix-sitemap';
import { json } from "@remix-run/node";

export const sitemap: SitemapFunction = async () => {
    return [{
        lastmod: new Date().toISOString().split('T')[0],
        loc: `/news/0`,
        news: [{
            title: "Something with an & symbol",
            publication: {
                name: 'Website',
                language: 'en'
            },
            date: new Date().toISOString().split('T')[0]
        }]
    }]
};

export async function loader() { return json({}); }

Invalid XML output

<url>
    <loc>https://example.com/news/0</loc>
    <lastmod>2023-10-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
    <news:news>
        <news:publication>
            <news:name>Signplanet</news:name>
            <news:language>en</news:language>
        </news:publication>
        <news:publication_date>2023-10-15</news:publication_date>
        <news:title>Something with an & symbol</news:title>
    </news:news>
</url>

Expected behavior

<url>
    <loc>https://example.com/news/0</loc>
    <lastmod>2023-10-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
    <news:news>
        <news:publication>
            <news:name>Signplanet</news:name>
            <news:language>en</news:language>
        </news:publication>
        <news:publication_date>2023-10-15</news:publication_date>
        <news:title>Something with an &amp; symbol</news:title>
    </news:news>
</url>

Environment (please complete the following information):

AjaniBilby commented 11 months ago

This could be a potetial patch fix, though it's not perfect:

"Some text & stuff <things> here".replaceAll(/[^A-z0-9 ]/g, (c)=>`&#${c.charCodeAt(0).toString()};`)
'Some text &#38; stuff &#60;things&#62; here'
fedeya commented 11 months ago

I guess this can be resolved by simply changing this to true, but performance may be worse

https://github.com/fedeya/remix-sitemap/blob/7e529e3640ac696101bc74af5af60933aa4332b0/src/builders/sitemap.ts#L115

AjaniBilby commented 11 months ago

It would indeed slow down performance, but I doubt that it will impact the entire tool meaningfully for it's purpose based on the request per seconds claim.

At least for me the biggest issue is that this library for some reason creates a new database connection for every single route it's generating a sitemap for, which is a much larger bottle neck than any xml processing would be

AjaniBilby commented 11 months ago

Ironically it might be faster to use the regex method I provided earlier, because then you can apply it only the fields which might have special characters such as names and titles, and can ignore dates and urls

fedeya commented 11 months ago

Yes, but your other comment makes sense, it's not expected to have a bunch of requests in a sitemap. The processEntities can be true

At least for me the biggest issue is that this library for some reason creates a new database connection for every single route it's generating a sitemap for, which is a much larger bottle neck than any xml processing would be

For this, if you are creating the connection to db in each sitemap function, i assume this is normal. In a node environment, this should be resolved by connecting outside of the function, sharing the instance. i guess. In serverless/edge idk maybe solving #6 can help some. I need to test how it works.

Please create another issue to investigate more about this