local url transforms

wichert commented 12 years ago

When generating an Atom (or RSS) feed it is useful to be able to put tracking markers in URLs. For example, if you use Google Analytics on your site you want tracking URLs in a feed so you can detect people coming in via the feed. This requires the tracking bits in two places: the URLs for pages in the feed, and the URLs inside content that is inserted into the feed. The latter requires running a transform on inserted blocks to modify local URLs (the local part is important here!).
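The local-versus-external distinction can be sketched with Python's `urllib.parse`. This is a minimal illustration under stated assumptions, not StrangeCase code: the names `is_local` and `track_url` are hypothetical, and "local" is assumed to mean a relative URL, or an absolute http(s) URL whose host equals the site's own host (passed in as `site_host`).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def is_local(url, site_host=None):
    """True for relative URLs and for http(s) URLs pointing at site_host."""
    parts = urlsplit(url)
    if parts.scheme not in ("", "http", "https"):
        return False                      # mailto:, ftp:, javascript:, ...
    if not parts.netloc:
        return not url.startswith("#")    # skip same-page anchors like "#top"
    # Assumption: the site's own host is known and compared case-insensitively.
    return site_host is not None and parts.netloc.lower() == site_host.lower()


def track_url(url, params, site_host=None):
    """Append tracking parameters to url if (and only if) it is local."""
    if not is_local(url, site_host):
        return url
    parts = urlsplit(url)
    # Merge with any existing query string instead of blindly adding "?".
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because the parameters are merged into any existing query string, a URL that already has `?q=...` gets `&utm_...` appended rather than a second `?`, and external links pass through untouched.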

colinta commented 12 years ago

What would these URLs look like? Just tack a ?query=string onto the end? Since we're dealing with static files, there aren't many options beyond that... in which case, is there a need for anything more than the low-tech {{ page.url }}?from=atom?

wichert commented 12 years ago

Just a query string at the end indeed, which is picked up by JavaScript.

The low-tech solution is not enough, because it cannot be applied as a filter. Consider a feed like this:

  <entry>
    <link>{{ page.url }}?utm_campaign=planet-xyz&amp;utm_medium=feed</link>
    <content>
        {{ article.body|add_tracking(utm_campaign='planet-xyz', utm_medium='feed') }}
    </content>
  </entry>

For the <link> element the low-tech solution works, but it does not work for the content part: that reuses content that must not contain tracking URLs when rendered on its own page, hence the need for a transform.

The syntax above is actually pretty nice: it is easy to implement as a Jinja filter.
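A minimal sketch of such an `add_tracking` filter, written in plain Python so it can be tried without Jinja installed. It assumes the body is an HTML string, that only `href` attributes should change, and that "local" means a URL with no scheme and no host; `HREF_RE` and `_track` are hypothetical helper names. If autoescaping is enabled, the result would also need wrapping in `markupsafe.Markup` so the rewritten HTML is not escaped a second time.

```python
import re
from html import escape, unescape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Matches href="..." or href='...'; group 1 is the quote, group 2 the URL.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def _track(url, params):
    """Return url with params appended, or None if url is not local."""
    parts = urlsplit(url)
    # Assumption: "local" = no scheme and no host; same-page anchors are skipped.
    if parts.scheme or parts.netloc or url.startswith("#"):
        return None
    query = parse_qsl(parts.query, keep_blank_values=True) + list(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def add_tracking(body, **params):
    """Jinja filter: add tracking parameters to every local link in body."""
    def repl(match):
        quote, raw_url = match.groups()
        # Attribute values are HTML-escaped (&amp;), so unescape before parsing.
        tracked = _track(unescape(raw_url), params)
        if tracked is None:
            return match.group(0)  # leave external links exactly as they were
        return f"href={quote}{escape(tracked)}{quote}"
    return HREF_RE.sub(repl, body)


# Registration on a jinja2.Environment (the variable name env is illustrative):
# env.filters["add_tracking"] = add_tracking
```

Jinja passes the filtered value as the first argument and the keyword arguments through unchanged, so `{{ article.body|add_tracking(utm_campaign='planet-xyz', utm_medium='feed') }}` calls `add_tracking(article.body, utm_campaign='planet-xyz', utm_medium='feed')`. The page's own copy of the content is never modified.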

colinta commented 12 years ago

This opens up a whole other ball of wax, which is that there is no automatic way of getting article.body out of the template. I've been using summary entries in the front matter up till now, which works well for that, but a feed should have access to the "main" content, which Jinja doesn't really have a concept of. By convention, the "main" content is called {% block content %}, but that is not actually any more special than a breadcrumb block, or any other block.

A page processor might be able to ask Jinja for the {% block content %} (or any other block; the default would be "content") from the template, without using the layout and ignoring the other blocks. Alternatively, a nasty regex could grab the block contents:

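import re

main_block = "content"
# str.format() treats single braces as replacement fields, so the literal
# "{" and "}" of the Jinja tags are doubled; re.escape() protects the name.
# Note: this only matches a named {% endblock content %}, not a bare {% endblock %}.
block_re = re.compile(r'''
\{{%\s*                 # open tag
  block\s+{block}       # block name
\s*%\}}                 # close tag
(?P<contents>.*?)       # inside the block (non-greedy)
\{{%\s*
  endblock\s+{block}    # include the block name here, too.
\s*%\}}'''.format(block=re.escape(main_block)), re.DOTALL | re.VERBOSE)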

Other site generators get around this by using .md files, or by automatically inserting the pages into a layout. This is not the approach I want to take with StrangeCase. It's not supposed to be a "blog engine", but a generic site generator. I'll have to ponder this one!

wichert commented 12 years ago

This isn't especially high on my priority list - consider it more of a nice-to-have :).

A regexp to grab {% block content %} feels dangerous: you would need to run the whole thing through Jinja again, with the same local configuration, to get the same output. It is probably easier to require the user to provide a CSS selector or XPath expression that selects the right content. This does assume the page you are referencing has already been rendered, so you need to be careful to detect cycles of pages trying to include each other.
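If one page's rendered output is pulled into another, pages must be rendered in dependency order, and an include cycle must be reported instead of recursing forever. A minimal sketch of that check as a depth-first topological sort; the `includes` mapping (page name to the pages it embeds) and the function name `render_order` are hypothetical, not something StrangeCase provides.

```python
def render_order(includes):
    """Return pages so that each comes after the pages it includes.

    Raises ValueError if pages include each other in a cycle.
    """
    order = []
    state = {}  # page -> "visiting" (on the current path) or "done"

    def visit(page, path):
        if state.get(page) == "done":
            return
        if state.get(page) == "visiting":
            # page is already on the current path: report the loop.
            cycle = path[path.index(page):] + [page]
            raise ValueError("include cycle: " + " -> ".join(cycle))
        state[page] = "visiting"
        for dep in includes.get(page, ()):
            visit(dep, path + [page])
        state[page] = "done"
        order.append(page)  # all of its includes are already in order

    for page in includes:
        visit(page, [])
    return order
```

A feed that embeds two posts comes out after both posts, while two pages that embed each other raise an error naming the loop.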

colinta commented 12 years ago

That's a great idea; then we can grab that HTML node via Beautiful Soup!

I'll see what I can whip up (not sure when).
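Beautiful Soup's `select_one()` would be the natural tool for a CSS selector. To keep the idea testable with only the standard library, this sketch swaps in `html.parser` and supports just the simplest selector, an element `id` (as in `#content`); `ElementById` and `extract_by_id` are hypothetical names. It returns the element's inner HTML, or `None` if no element has that id, assumes well-formed markup where every non-void element is closed explicitly, and drops HTML comments.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ElementById(HTMLParser):
    """Collect the inner HTML of the first element whose id matches."""

    def __init__(self, target_id):
        # Keep entity/char references raw so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target; 0 = outside it
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID:
                self.depth += 1
        elif tag not in VOID and dict(attrs).get("id") == self.target_id:
            self.depth = 1  # found the target; collect everything inside it

    def handle_startendtag(self, tag, attrs):
        if self.depth:  # self-closing tag such as <br/> inside the target
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # closing tag of the target itself
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_by_id(html, target_id):
    """Return the inner HTML of the element with the given id, or None."""
    parser = ElementById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Selecting from the rendered page this way sidesteps re-running Jinja, which is exactly why it depends on the referenced page having been rendered first.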

colinta / StrangeCase

local url transforms #17