docsifyjs / docsify

🃏 A magical documentation site generator.
https://docsify.js.org
MIT License

generate sitemap #656

Open SidVal opened 5 years ago

SidVal commented 5 years ago

Hi.

Is it possible to create a sitemap for the docsify site?

QingWei-Li commented 5 years ago

It can't be generated automatically. You can create one manually, but I am not sure whether search engines treat hash-router URLs as valid.

SidVal commented 5 years ago

This is interesting

JavaScript Crawling and Indexing – Final Results Let’s start with basic configurations for all the frameworks used for this experiment.

Results Source: Can Google Properly Crawl and Index JavaScript Frameworks? A JavaScript SEO Experiment

Repo's source: https://github.com/kamilgrymuza/jsseo


Crawling

Crawling Source: JavaScript vs. Crawl Budget: Ready Player One


Final thoughts

So generating a sitemap would be useless. In SEO terms, our website would not do well with search engines. :(

trusktr commented 4 years ago

I want to re-open this. I think it'd be valuable to generate a sitemap, regardless of hash mode. Some people will use the non-hash mode, in which case it is useful. Also, we have SSR (being fixed in a current PR) and upcoming plans for static site generation, both of which would benefit from a sitemap.

According to this article from 2014, Google can index hash-based routes as separate pages if they use a "hash bang" syntax: instead of pages having the form example.com/#/some/page, they should have the form example.com/#!/some/page, and then Google will consider the hash as part of the URL. According to Google, hash-bang has not been required since 2018.

What's the latest on hash-based routing and SEO?

cc @jhildenbiddle @anikethsaha

EDIT:

According to official words from Google (see links in that article), people are straight up confused (look at the comments). It isn't clear if hash routing works with Google SEO. If you follow and read all the related tweets, you will be confused. In particular, see these two seemingly contradictory tweets:

EDIT: According to https://searchengineland.com/google-can-crawl-ajax-just-fine-322254, hashes should be SEO friendly now, and the Google crawler understands hash-based routing (follow hash changes) and indexes content on dynamic page changes (hash changes).

trusktr commented 4 years ago

Based on that last article, I think we should just make sitemaps regardless. If it works with hashes, it works. If it doesn't, it doesn't. But at least for the other cases we'll be covered (especially SSR and static sites).

For static generation, we will need to programmatically assemble a list of pages (e.g. based on _sidebar.md, _navbar.md, links in pages, etc.). This information tells us which static pages we need to output. We can also use it for sitemap output. Static site generation, sitemap generation, or both would re-use the same code mechanism.
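As a rough illustration of collecting a page list from _sidebar.md, here is a minimal Python sketch. It is not Docsify code: the function name `pages_from_sidebar`, the sample sidebar, and the link-matching regex are all assumptions made for this example, and it only handles plain inline Markdown links.

```python
import re

# Matches Markdown inline links of the form [label](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")

def pages_from_sidebar(sidebar_text):
    """Return the unique internal page routes linked from a sidebar, in order."""
    pages = []
    for target in LINK_RE.findall(sidebar_text):
        # Skip external links (anything with a URL scheme) and in-page anchors
        if re.match(r"^[a-z][a-z0-9+.-]*:", target, re.I) or target.startswith(("#", "//")):
            continue
        # Drop "?id=..." queries and fragments so each page is listed once
        route = target.split("?", 1)[0].split("#", 1)[0]
        if route.endswith(".md"):
            route = route[:-3]
        route = "/" + route.lstrip("/")
        if route not in pages:
            pages.append(route)
    return pages

# Hypothetical sidebar content, for illustration only
SAMPLE_SIDEBAR = """\
- [Home](/)
- [Quick start](quickstart.md)
- [Configuration](configuration.md?id=el)
- [GitHub](https://github.com/docsifyjs/docsify)
"""

print(pages_from_sidebar(SAMPLE_SIDEBAR))
```

The same list could feed both a static page generator and a sitemap writer, which is the code re-use described above.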

trusktr commented 4 years ago

Ah! This is interesting. I tried to run the Docsify site through Google Search Console's Rich-results test and mobile-friendly test. Here are the results:

As you can see in either test, it has issues reading URLs in anchor tags, for example. It has no idea that we will convert them into hash URLs. I think for v5 we should re-consider how we output the anchor tags, so that Google can understand them.

These two tests are basically a window into how the Google Crawler sees and understands web sites (and has no issues loading a page from a hash route).

trusktr commented 4 years ago

By the way, I found these tools while watching the http://web.dev/live conference Day 1 video that was released a few days ago: https://youtu.be/H89hKw06iWs?t=9201 (at 2 hours 33 minutes it goes into the Google Search stuff). The video shows you how to debug SEO problems with it on SPAs and similar. Neat!!

After that, the same speaker talks about structured data. The main cool feature is that we can place the structured data on the page dynamically any time we change pages, and Googlebot reads the information any time we generate it, so it knows when and what to index on an SPA. That's a bit off topic from sitemaps though.

I think the bottom line is we can make a sitemap for hash-based SPAs (like Docsify's default mode). It'll be useful regardless, for other modes.

trusktr commented 4 years ago

@waruqi I thought you commented about your xmake sitemap generator (I saw the email). That's neat!

waruqi commented 4 years ago

> @waruqi I thought you commented about your xmake sitemap generator (I saw the email). That's neat!

The result I generated was wrong, so I deleted that comment. I now need to generate some static HTML files and add their URLs to sitemap.xml. See https://github.com/xmake-io/xmake-docs/blob/master/sitemap.xml

trusktr commented 4 years ago

Ah ok. Well if you happen to get the output right, it could be a good solution until we have the one from static site generation.

waruqi commented 4 years ago

> Ah ok. Well if you happen to get the output right, it could be a good solution until we have the one from static site generation.

Yes, you can search for site:xmake.io on Google to see the current results. It works now.

trusktr commented 4 years ago

Neat! Interested in making a pull request to add this in a non-breaking way? I think it can serve well for the meantime. It may be a little while before we get to static site generation (and thus site maps).

@jhildenbiddle @anikethsaha thoughts?

anikethsaha commented 4 years ago

Is there any library to do so?

waruqi commented 4 years ago

> Is there any library to do so?

You can use markdown-to-html or showdown to generate static HTML files from Markdown.

And use github-markdown-css to style the Markdown pages.

I wrote a Lua script to generate my docsify HTML pages: https://github.com/xmake-io/xmake-docs/blob/master/build.lua

$ cd xmake-docs
$ xmake l build.lua

And the generated page results: https://xmake.io/mirror/package/remote_package.html

jhildenbiddle commented 4 years ago

There's a lot of overlap here with #1235. May be worth consolidating.

Also, if I'm reading correctly above it seems like we could change our internal URL system from rendering links like this:

<a href="#/?id=features">...</a>

To this:

<a href="https://docsify.js.org/#/?id=features">...</a>

And Google may "just work", no? We'd have to capture clicks on these links and navigate via JS, but we're doing that anyway. If it did work, this would allow us to auto-generate sitemaps using online tools or our own build-time crawler.
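To make the proposed rewrite concrete, here is a small Python sketch that turns `href="#/..."` attributes into absolute URLs. It only illustrates the transformation on an HTML string; in Docsify the change would live in the JavaScript renderer, and the function name `absolutize_hash_links` is made up for this example.

```python
import re

def absolutize_hash_links(html, base_url):
    """Rewrite href="#/..." attributes to absolute URLs under base_url."""
    base = base_url.rstrip("/")
    # Only touch hrefs that start with "#/", i.e. hash-router links;
    # external links and other attributes are left unchanged.
    return re.sub(
        r'href="(#/[^"]*)"',
        lambda m: f'href="{base}/{m.group(1)}"',
        html,
    )

html = '<a href="#/?id=features">Features</a> <a href="https://example.org/">ext</a>'
print(absolutize_hash_links(html, "https://docsify.js.org"))
```

Whether crawlers then index these URLs as separate pages is still the open question in this thread; the sketch only shows the link rewrite itself.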

waruqi commented 4 years ago

I have fixed all the links in my generated mirror HTML pages. See https://xmake.io/mirror/manual/project_target.html

And it works. I can jump to all links normally in the static page I generated.

<a href="/manual/builtin_modules?id=osmv">os.mv</a>

to

<a href="/mirror/manual/builtin_modules.html#osmv">os.mv</a>

-- fix links (runs under xmake's Lua runtime, which provides string:startswith and string:split)
function _fixlinks(htmldata)

    -- <a href="/manual/builtin_modules?id=osmv">os.mv</a>
    -- => <a href="/mirror/manual/builtin_modules.html#osmv">os.mv</a>
    htmldata = htmldata:gsub("(href=\"(.-)\")", function(_, href)
        if href:startswith("/") and not href:startswith("/#/") then
            local splitinfo = href:split('?', {plain = true})
            local url = splitinfo[1]
            href = "/mirror" .. url .. ".html"
            if splitinfo[2] then
                local anchor = splitinfo[2]:gsub("id=", "")
                href = href .. "#" .. anchor
            end
            print(" -> fix %s", href)
        end
        return "href=\"" .. href .. "\""
    end)

    -- <h4 id="os-rm">os.rm</h4>
    -- => <h4 id="osrm">os.rm</h4>
    htmldata = htmldata:gsub("(id=\"(.-)\")", function(_, id)
        id = id:gsub("%-", "")
        return "id=\"" .. id .. "\""
    end)
    return htmldata
end

TomMeulendijks commented 4 years ago

I created this function to create a sitemap. Works for me. It will write a file called sitemap.xml in the docs folder. Hope that helps some of you.

const fs = require('fs');
const path = require('path');
const xmlbuilder = require('xmlbuilder');

const url = "https://example.com";
const docsDirectory ="/docs";

//Walker function to go through directory and subdirectories
var walk = function(dir, done) {
  var results = [];
  fs.readdir(dir, function(err, list) {
    if (err) return done(err);
    var pending = list.length;
    if (!pending) return done(null, results);
    list.forEach(function(file) {
      file = path.resolve(dir, file);

      fs.stat(file, function(err, stat) {

        if (stat && stat.isDirectory()) {
          walk(file, function(err, res) {
            results = results.concat(res);
            if (!--pending) done(null, results);
          });
        } else {
            if(path.extname(path.basename(file)) === ".md" && !path.basename(file).startsWith('_')){

                let cleanDir = path.dirname(file.replace(__dirname+docsDirectory, ''));

                if(cleanDir == '/'){
                    cleanDir = "";
                }

                console.log(cleanDir);

                let urlPath = url+cleanDir+"/"+path.basename(file).replace('.md',"");

                results.push({

                    // format the file to a valid URL
                    url: urlPath,

                    // Last modified time for google sitemap
                    lastModified: stat.mtime // mtime is the content modification time; ctime also changes on metadata updates
                  });
            }

          if (!--pending) done(null, results);
        }
      });
    });
  });
};

walk('./docs', function(err, results){
    // Stop early if the directory walk failed
    if (err) throw err;

    let feedObj = {
        urlset: {
            '@xmlns:xsi': "http://www.w3.org/2001/XMLSchema-instance",
            "@xmlns:image":"http://www.google.com/schemas/sitemap-image/1.1",
            "@xsi:schemaLocation":"http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd",
            "@xmlns":"http://www.sitemaps.org/schemas/sitemap/0.9",
            url:[]
        }
    }

    results.forEach((data, i)=>{
            feedObj.urlset.url.push({
                loc: data.url,
                lastmod: data.lastModified.toISOString()
            })
    })

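    // Serialize the builder to an XML string before writing it to disk
    let sitemap = xmlbuilder.create(feedObj, { encoding: 'utf-8' }).end({ pretty: true });

    fs.writeFile("docs/sitemap.xml", sitemap, function(err){
        // Only log when the write actually failed
        if (err) console.log(err);
    });

})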

package.json

{
  "name": "docsify-sitemap-generator",
  "version": "1.0.0",
  "description": "",
  "main": "sitemapGenerator.js",
  "directories": {
    "doc": "docs"
  },
  "dependencies": {
    "xmlbuilder": "^15.1.1"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "repository": {
    "type": "git",
    "url": ""
  },
  "author": "",
  "license": "ISC"
}

sy-records commented 3 years ago

You can use GitHub Actions to generate a sitemap automatically. The idea is to use git to list the files in the docs directory and splice each path onto the site URL.

see https://github.com/lufei/notes/blob/master/.github/workflows/sitemap.yml and https://github.com/lufei/notes/blob/master/docs/sitemap.sh
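The linked sitemap.sh is not reproduced here. As a hedged sketch of the same idea in Python, the code below lists tracked Markdown files with `git ls-files` and splices each path onto a base URL. The function names, the `docs` directory, and the `https://example.com/#/` base URL are assumptions for illustration. Mapping README.md to its folder's index route follows Docsify's routing.

```python
import subprocess
from xml.sax.saxutils import escape

def tracked_markdown(docs_dir="docs"):
    """List Markdown files tracked by git under docs_dir (paths relative to it)."""
    out = subprocess.run(
        ["git", "ls-files", "-z", "--", "*.md"],  # -z avoids quoting of unusual names
        cwd=docs_dir, check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p]

def build_sitemap(paths, base_url):
    """Splice each page path onto base_url and wrap the result in a urlset."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in sorted(paths):
        name = path.rsplit("/", 1)[-1]
        if name.startswith("_"):  # skip _sidebar.md, _navbar.md, ...
            continue
        route = path[:-3] if path.endswith(".md") else path
        if route == "README" or route.endswith("/README"):
            route = route[: -len("README")]  # README.md is the index of its folder
        lines.append(f"  <url><loc>{escape(base_url + route)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"

# Example with a hand-written file list, so it runs without a git checkout
print(build_sitemap(["README.md", "_sidebar.md", "guide/intro.md"], "https://example.com/#/"))
```

In a CI job, one would pass `tracked_markdown("docs")` as the first argument and write the returned string to docs/sitemap.xml.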

waruqi commented 3 years ago

> You can use GitHub Actions to generate a sitemap automatically. The idea is to use git to list the files in the docs directory and splice each path onto the site URL.
>
> see https://github.com/lufei/notes/blob/master/.github/workflows/sitemap.yml and https://github.com/lufei/notes/blob/master/docs/sitemap.sh

But first you need to be able to generate static pages and fix the links. Otherwise, simply generating a sitemap that indexes the links of dynamic pages does not seem to be of any practical help for SEO.

sy-records commented 3 years ago

I know. It worked when we fixed SSR.

shawaj commented 3 years ago

Is there a way to generate these at all now?

abadfox233 commented 3 years ago

I use Java to generate sitemap.xml

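// The original post omitted the method signature; generateSitemap is a placeholder name.
// Assumes JDOM 2 (org.jdom2.*) plus java.io.*, java.net.URLEncoder,
// java.text.SimpleDateFormat, and java.util.* are imported.
static void generateSitemap() throws IOException {
String bookPath =  "/var/books";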

Element root=new Element("urlset");
Document doc=new Document();
doc.addContent(root);
Namespace namespace = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
root.setNamespace(namespace);

SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss+08:00");
String rootPath = bookPath.endsWith("/")?bookPath: bookPath + "/";

Stack<File> fileStack =new Stack<>();
HashMap<String, String> urlMap = new HashMap<>();
List<Element> elements = new ArrayList<>();

String host = "http://book.ironblog.cn/#/";
File file = new File(rootPath);
fileStack.push(file);

while (!fileStack.isEmpty()){

    File topFile = fileStack.pop();
    if(topFile.isDirectory()){
        for(File element: Objects.requireNonNull(topFile.listFiles())){
            fileStack.push(element);
        }

    }else {

        String fileName = topFile.getName();
        String filePath = topFile.getAbsolutePath();
        filePath = filePath.replace("\\", "/");

        if(fileName.endsWith(".md") && !filePath.contains("resources")
                && !fileName.equals("_sidebar.md") ){
            String url = URLEncoder
                    .encode(filePath.replace(rootPath, ""), "UTF-8")
                    .replace("%2F", "/")
                    .replace(".md", "");
            long l = topFile.lastModified();
            Date date = new Date(l);
            String dateStr = dateFormat.format(date);
            urlMap.put(host + url, dateStr);
        }

    }

}

for(String url:urlMap.keySet()){
    Element element=new Element("url", root.getNamespace());
    Element loc = new Element("loc", root.getNamespace());
    loc.addContent(url);

    Element lastmod = new Element("lastmod", root.getNamespace());
    lastmod.addContent(urlMap.get(url));

    element.addContent(loc).addContent(lastmod);
    elements.add(element);
    root.addContent(element);

}

XMLOutputter outter=new XMLOutputter();
outter.setFormat(Format.getPrettyFormat());

FileWriter fileWriter = new FileWriter(new File(rootPath + "sitemap.xml"));
outter.output(doc,fileWriter);
fileWriter.close();
}

ymc9 commented 1 year ago

Simple node.js script I'm using:

import { globbySync } from 'globby';
import { SitemapStream, streamToPromise } from 'sitemap';
import { Readable } from 'stream';
import fs from 'fs';

const links = [
    { url: '/', changefreq: 'daily' },
    ...globbySync(['./**/[!_]?*.md', '!node_modules', '!README.md']).map(
        (path) => ({
            url: `/${path.replace('.md', '')}`,
            changefreq: 'daily',
        })
    ),
];

console.log('Sitemap entries:');
console.log(links);

const stream = new SitemapStream({ hostname: process.env.SITE_HOSTNAME });
const content = (
    await streamToPromise(Readable.from(links).pipe(stream))
).toString('utf-8');

fs.writeFileSync('./sitemap.xml', content);

studeyang commented 1 year ago

Here is a Python version, see generate_sitemap.py:

import datetime
import os

url = 'https://studeyang.tech/technotes/#'
file_path = "./sitemap.xml"
exclude_files = [
    'coverpage', 'navbar', 'README', 'sidebar',
    'A/README', 'A/Python/README', 'A/Python/sidebar'
]

def create_sitemap():
    xml = '<?xml version="1.0" encoding="UTF-8"?>\n'
    xml += '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    for path, dirs, files in os.walk("./"):
        for file in files:
            if not file.endswith('.md'):
                continue
            try:
                if not path.endswith('/'):
                    path += '/'
                new_path = (path.replace('\\', '/') + file)[2:-3]
                if new_path in exclude_files:
                    continue
                print(new_path)
                xml += '  <url>\n'
                xml += f'    <loc>{url}/{new_path}</loc>\n'
                lastmod = datetime.datetime.utcfromtimestamp(os.path.getmtime(path + file)).strftime('%Y-%m-%d')
                xml += f'    <lastmod>{lastmod}</lastmod>\n'
                xml += '    <changefreq>monthly</changefreq>\n'
                xml += '    <priority>0.5</priority>\n'
                xml += '  </url>\n'
            except Exception as e:
                print(path, file, e)
                continue  # skip the failing file but keep processing the rest of the directory
    xml += '</urlset>\n'

    with open(file_path, 'w', encoding='utf-8') as sitemap:
        sitemap.write(xml)

if __name__ == '__main__':
    create_sitemap()