axitkhurana / buster

Brute force static site generator for Ghost
MIT License

"index.html" is being appended to all directory links #16

Closed (shaunlebron closed 10 years ago)

shaunlebron commented 10 years ago

This is a cosmetic URL problem, a side effect of #5 (the problem is discussed on Super User).

We can either:

  1. build a custom version of wget that chops off "index.html" from the converted URL (here)
  2. modify buster.py to correct the links in HTML files after generating.
  3. inject extra javascript into the "assets/js/index.js" to correct the links on client side.

axitkhurana commented 10 years ago

Hey Shaun,

Thanks for reporting this. I feel the best way would be to modify buster.py to correct links in HTML files after generation. This would be a part of the generate command.
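
As a rough illustration of what such a post-generation step could look like, here is a minimal sketch. The name `rewrite_html_files` and the `fix` callback are hypothetical and not part of buster; the sketch only assumes that the generated site is a directory tree of UTF-8 `.html` files and that `fix` takes HTML text and returns the corrected text.

```python
import os


def rewrite_html_files(root_dir, fix):
    """Apply fix(text) -> text to every .html file under root_dir.

    Hypothetical helper, not buster's actual code. Returns the paths of
    the files whose content changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # Only touch generated HTML pages; leave assets alone.
            if not name.endswith('.html'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8') as f:
                original = f.read()
            fixed = fix(original)
            # Rewrite the file only when the fix actually changed something.
            if fixed != original:
                with open(path, 'w', encoding='utf-8') as f:
                    f.write(fixed)
                changed.append(path)
    return changed
```

Passing the link-fixing function in as a callback keeps the directory walk independent of how the links are actually corrected, so a parser-based fix could be swapped in without touching this loop.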

Maintaining a fork of wget will not be an easy task and would add more complexity to the buster installation. I won't be able to find time to work on this until next week. Feel free to send a pull request if you would like to.

shaunlebron commented 10 years ago

A simple text-based replacement could modify more than intended (e.g. code examples in blog content). So, we should probably use PyQuery to safely parse/search/modify an HTML tree.

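import re

from pyquery import PyQuery

# Matches absolute URLs ("http://...") and protocol-relative ones ("//...").
abs_url_regex = re.compile(r'^(?:[a-z]+:)?//', flags=re.IGNORECASE)

def fixLinks(text):
    d = PyQuery(text, parser='html')
    for element in d('a'):
        e = PyQuery(element)
        href = e.attr('href')
        # Skip anchors without an href, and leave absolute URLs untouched.
        if href is None or abs_url_regex.search(href):
            continue
        # Drop a trailing "index.html" from local directory links.
        e.attr('href', re.sub(r'/index\.html$', '/', href))
    # Serialize the tree back to UTF-8 bytes (Python 2 style __unicode__ call).
    return d.__unicode__().encode('utf8')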
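
The concern about plain text replacement can be seen in a small, stdlib-only sketch. The sample `page` and the `naive_strip` helper are made up for illustration; they only show how a global replacement also rewrites a URL that appears inside a code example, which is the case a parsed tree avoids.

```python
# Hypothetical sample page: one real link plus a code example that
# happens to mention an "index.html" URL.
page = (
    '<a href="/about/index.html">About</a>\n'
    '<pre><code>wget http://example.com/index.html</code></pre>\n'
)


def naive_strip(text):
    """Naive approach: drop "index.html" after every slash, anywhere."""
    return text.replace('/index.html', '/')


result = naive_strip(page)
# The link is fixed as intended ...
print('<a href="/about/">About</a>' in result)
# ... but the code example was altered too: it now reads
# "wget http://example.com/" instead of the URL the author wrote.
print('wget http://example.com/index.html' in result)
```

Restricting the change to the `href` attribute of `a` elements, as the PyQuery version does, leaves the text inside the `code` element as written.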

Alternative

Adding the following code to assets/js/index.js would do the same thing:

/* Buster: correct local directory links to not include "index.html" */
(function($){
    var absUrlPattern = new RegExp('^(?:[a-z]+:)?//', 'i');
    $(document).ready(function(){
        $('a').each(function(){
            var e = $(this);
            var href = e.attr('href');
            // Skip anchors without an href; calling .replace() on undefined would throw.
            if (href && !absUrlPattern.test(href)) {
                var new_href = href.replace(/\/index\.html$/, '/');
                e.attr('href', new_href);
            }
        });
    });
}(jQuery));

martgnz commented 10 years ago

Any news on this? How can I apply the solutions?

shaunlebron commented 10 years ago

Solution found in pull request on #20

fluke commented 10 years ago

@axitkhurana Is this going to be implemented in buster? I had a look at your blog and the links are fine. So is it just that your latest changes haven't been pushed yet?