18F / jekyll_pages_api

A Jekyll plugin that generates a JSON file with data for all the Pages in your Site

How to avoid problematic characters such as ’ in converted JSON? #16

Closed: tomjoht closed this issue 9 years ago

tomjoht commented 9 years ago

Do you know how to avoid characters that get munged in the conversion? For example, the curly apostrophe in "user’s" gets converted into garbled characters.

afeld commented 9 years ago

Some encoding issue, and who the hell knows how to fix that. :football: :runner:
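One likely explanation for the munged apostrophe is classic mojibake, although the thread never confirms it. The UTF-8 bytes of the curly apostrophe (U+2019) get decoded somewhere as Windows-1252, which turns one character into three junk characters. The Python sketch below reproduces that failure and reverses it. `garble` and `repair` are hypothetical helper names.

```python
# Sketch: reproduce and undo UTF-8 -> Windows-1252 mojibake.
# Assumption: the text was valid UTF-8 but was decoded as cp1252 somewhere.

CURLY = "user\u2019s"  # "user’s" written with RIGHT SINGLE QUOTATION MARK


def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the mistake: re-encode as cp1252, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    broken = garble(CURLY)
    print(broken)          # userâ€™s
    print(repair(broken))  # user’s
```

`repair` only undoes exactly one round of this specific mis-decoding. Some byte values, such as 0x81, are undefined in Windows-1252. For characters whose UTF-8 form contains one of those bytes, `garble` raises `UnicodeDecodeError`.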

tomjoht commented 9 years ago

I can find all the curly quotes and replace them with straight quotes. Honestly, I'm not even sure where the curly ones came from.
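The find-and-replace approach can be sketched with Python's `str.translate`. The quote mapping and the `straighten_quotes` name are assumptions for illustration only. The thread does not say where the curly quotes come from. One possible source is the Markdown converter: kramdown, Jekyll's default, typically turns straight quotes into typographic ones.

```python
# Sketch: normalize typographic ("curly") quotes to ASCII straight quotes.
# The mapping is an assumption; extend it for dashes, ellipses, etc. if needed.

CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (also the typographic apostrophe)
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with their straight ASCII equivalents."""
    return text.translate(CURLY_TO_STRAIGHT)


if __name__ == "__main__":
    print(straighten_quotes("the user\u2019s \u201cprofile\u201d"))  # the user's "profile"
```

This removes the problem characters entirely, but the typographic quotes are lost in the process.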

One thing I'm unsure about is how to grab links. If the page has a link like this:

<a href="http://google.com">google.com</a>

When the pages are converted to JSON, only the link text (google.com) carries over, unless I use character entities (&lt; and &gt;) for the angle brackets. How can I pass all of the HTML tags through to the JSON?
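JSON itself has no trouble carrying markup. Angle brackets are legal inside JSON strings, and only the double quotes around the attribute value need escaping. If only `google.com` survives, the tags are probably being stripped before serialization rather than lost by JSON. Here is a minimal Python check, where the `body` key is a hypothetical field name:

```python
# Sketch: JSON strings can hold raw HTML; '<' and '>' need no escaping.
import json

link = '<a href="http://google.com">google.com</a>'

# Serialize a hypothetical page record that keeps the markup intact.
encoded = json.dumps({"body": link})
print(encoded)  # {"body": "<a href=\"http://google.com\">google.com</a>"}

# Decoding gives back exactly the same HTML string.
assert json.loads(encoded)["body"] == link
```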

afeld commented 9 years ago

The original use case for this plugin was to support search, so I very intentionally stripped out the HTML. If you need the markup, I'd suggest following each page's URL and requesting the pages themselves.
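The stripping step can be illustrated with Python's standard `html.parser`. This is not the plugin's own code, which is Ruby; it is only a sketch of the same idea. `TextExtractor` and `strip_html` are hypothetical names. The sketch also explains why the entity-escaped brackets survived in the experiment described earlier in the thread. `&lt;` and `&gt;` are text rather than tags, so they come through as literal `<` and `>`.

```python
# Sketch: reduce HTML to plain text, e.g. for a search index.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content only; tags and their attributes are discarded."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &lt; becomes "<"
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_html('<a href="http://google.com">google.com</a>'))  # google.com
```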

afeld commented 9 years ago

@tomjohnson1492 I don't think there's an encoding issue in this gem... I just added a test in #18. I looked into your particular issue a bit, and it seems that your HTML might be invalid:


[Screenshot, 2015-02-10 11:50:15 PM]


http://idratherbewriting.com/wp-content/apidemos/docasapi/about/

Was this intentional, to test it? I'm not sure where those characters are getting converted into invalid(?) ones in this gem, and I'm not sure what the ideal result would be, either. I understand very little about encoding anyway, so I'd love advice from anyone who stumbles upon this thread.
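One common way to keep non-ASCII characters safe in transit is to emit ASCII-only JSON. In that form, a character such as U+2019 is written as the escape sequence `\u2019`, which a charset mix-up cannot garble. Python's `json.dumps` does this by default (`ensure_ascii=True`), and the sketch below compares it with `ensure_ascii=False`. The `page` record is hypothetical. The thread does not establish which form the gem emits.

```python
# Sketch: ASCII-only JSON sidesteps charset mix-ups downstream.
import json

page = {"title": "user\u2019s guide"}  # hypothetical page record

ascii_json = json.dumps(page)                     # ensure_ascii=True is the default
utf8_json = json.dumps(page, ensure_ascii=False)  # keeps the raw character

print(ascii_json)  # {"title": "user\u2019s guide"}
print(utf8_json)   # {"title": "user’s guide"}

# Both parse back to the same data; only the serialized form differs.
assert json.loads(ascii_json) == json.loads(utf8_json) == page
```

The ASCII form is the same in every ASCII-compatible encoding. The `ensure_ascii=False` form is only safe if consumers actually read it as UTF-8.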