davidmerfield / Typeset

An HTML pre-processor for web typography
https://typeset.lllllllllllllllll.com/
Creative Commons Zero v1.0 Universal
2.65k stars 54 forks

Cheerio Dependency #4

Closed jakiestfu closed 9 years ago

jakiestfu commented 9 years ago

How integral is cheerio to the problem you're solving? What is its biggest benefit?

davidmerfield commented 9 years ago

Cheerio provides a neat, fast, browser-free interface to parse an HTML string and modify its text nodes. When I adapt the library to run on the client, I'll use jQuery, or a minimal equivalent.

Do you have a suggestion for an alternate method?

jakiestfu commented 9 years ago

Your library is very useful, but not everyone needs the text nodes parsed; they just want content in -> content out. If I had built this, I would have omitted the parsing and left it up to users to inject the formatted content into their HTML.

I guess I feel like it's an unnecessary addition, but if it's something you definitely wanted in the library, that's OK!

davidmerfield commented 9 years ago

I'm struggling to work out how to implement some of the library's features without parsing the HTML. For instance, a naive find-and-replace on an HTML string would screw up pre-formatted content, tag attributes, script tags, inline CSS, etc. The punctuation replacement and hanging punctuation need to be limited to specific text nodes.

Can you think of a way of accomplishing this without some sort of HTML parser?
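
To illustrate the failure mode described above, here is a minimal sketch in plain JavaScript. It is not Typeset's implementation (which uses cheerio); the helper names `curlQuotes`, `naiveTypeset`, and `tagAwareTypeset` are hypothetical, and the tag splitter is a crude stand-in for a real HTML parser.

```javascript
// Hypothetical helper: curl paired straight double quotes in plain text.
function curlQuotes(text) {
  return text.replace(/"([^"]*)"/g, '\u201C$1\u201D');
}

// Naive approach: run the replacement over the whole HTML string.
// This also rewrites the quotes around attribute values.
function naiveTypeset(html) {
  return curlQuotes(html);
}

// Elements whose content should be left verbatim.
const SKIP = new Set(['pre', 'code', 'script', 'style']);

// Tag-aware sketch: split the string into tags and text, leave tags alone,
// and skip any text that sits inside one of the SKIP elements.
function tagAwareTypeset(html) {
  let skipDepth = 0;
  return html
    .split(/(<[^>]*>)/)
    .map((chunk) => {
      const tag = /^<(\/?)([a-zA-Z][\w-]*)/.exec(chunk);
      if (tag) {
        if (SKIP.has(tag[2].toLowerCase())) {
          skipDepth = Math.max(0, skipDepth + (tag[1] ? -1 : 1));
        }
        return chunk;
      }
      if (chunk.startsWith('<')) return chunk; // comments, doctype
      return skipDepth > 0 ? chunk : curlQuotes(chunk);
    })
    .join('');
}

const html = '<a href="/fox">"Hello," said the fox.</a><pre>"raw"</pre>';
console.log(naiveTypeset(html));    // attribute quotes and <pre> content are altered
console.log(tagAwareTypeset(html)); // only the link text is altered
```

Even this version is fragile: it does not handle a `>` inside an attribute value, and it misses quotes that span inline tags, which is part of the case for using a proper parser.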

jakiestfu commented 9 years ago

IMO, the real solution is to not attempt to parse HTML to begin with. Content !== HTML. With the example below, the user could just convert the text and inject it into the DOM however they like. Alternatively, if this is used server-side, they could send the formatted content down from the server.

var typeset = require('typeset');
var text = '"Hello," said the fox.';
var output = typeset(text);

// Client
document.getElementById('content').innerText = output;

// OR Server (assuming Express with a Jade view engine)
res.render('my-template', {
  content: output
});

This means it could work both server-side and client-side: all it does is take raw text and format it as you'd expect. It should still be up to users to inject that content into their setup however they see fit.

jakiestfu commented 9 years ago

This will work with your current implementation of Typeset, but it will also parse the input as HTML. I think it's feature creep in the sense that it either imposes the use of HTML or uses cheerio unnecessarily.

If you simplify the concept of this library to just be straight text replacement for those characters, it'd be useful in many more places.

jakiestfu commented 9 years ago

Again, only if you think that concept is relevant. I could see my company using this but we use Ember.js for example, and we don't want to inject HTML, just raw text content and we handle data binding ourselves. Again, the current form of this lib allows us to do that, but that just means Cheerio is unnecessary at that point.

davidmerfield commented 9 years ago

Agreed – this would be great. However, what if someone wants to mix some code snippets into their content? Or embed a data visualization? Or a video? That's going to involve mixing text and HTML. Perhaps the solution is to pass an option if you only have text, say typeset(text, {text: true}), that bypasses cheerio and gives you the resulting performance benefit?

The library was designed to be another stage in the asset pipeline for my blogging platform, so being able to parse an entire HTML document was a requirement.
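
The `{text: true}` idea could look roughly like the sketch below. This is an illustration only, not the library's API: `typesetText` and `typesetHtml` are hypothetical names, the two replacements are arbitrary examples, and the HTML path uses a simple tag splitter in place of cheerio so the snippet runs without dependencies.

```javascript
// Hypothetical plain-text transform: curl paired double quotes, fix ellipses.
function typesetText(text) {
  return text
    .replace(/"([^"]*)"/g, '\u201C$1\u201D')
    .replace(/\.\.\./g, '\u2026');
}

// Stand-in for the HTML path. The real library walks text nodes with cheerio;
// here a crude split keeps tags intact and transforms only the text between them.
function typesetHtml(html) {
  return html
    .split(/(<[^>]*>)/)
    .map((chunk) => (chunk.startsWith('<') ? chunk : typesetText(chunk)))
    .join('');
}

// Single entry point: the `text` option bypasses HTML handling entirely.
function typeset(input, options = {}) {
  return options.text ? typesetText(input) : typesetHtml(input);
}

console.log(typeset('"Hello," said the fox...', { text: true }));
console.log(typeset('<p title="x">"Hi"</p>'));
```

With this shape, callers who only have plain text never pay for HTML handling, while the default keeps the whole-document behaviour the blogging pipeline needs.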

jakiestfu commented 9 years ago

I would say that person is "doing it wrong" then, but you obviously have to account for as many people as possible if you want better exposure.

I guess that idea where you have an API that supports both would work. It'd at least make it clear that you're not using a virtual DOM parser if you don't want that.

davidmerfield commented 9 years ago

The other thing to be aware of is that the features which enable hanging punctuation and optical-margin-alignment must return HTML, since the technique involves the insertion of new nodes.

One possible alternative for people in your situation might be to use this library on the client side. This is not a feature I have added yet, but it is planned.
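
As a rough illustration of why hanging punctuation has to return HTML, the sketch below wraps a leading opening quote in a span that a stylesheet could then shift into the margin. The function names and the `hang-quote` class are hypothetical and are not taken from the library.

```javascript
// Escape text so it can be embedded safely in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap a leading opening double quote in a span so CSS can pull it into the
// margin. The class name "hang-quote" is hypothetical.
function hangOpeningQuote(text) {
  return escapeHtml(text).replace(
    /^\u201C/,
    '<span class="hang-quote">\u201C</span>'
  );
}

console.log(hangOpeningQuote('\u201CHello,\u201D said the fox.'));
```

The input is plain text, but the output now contains markup, so it can no longer be assigned as raw text content without the span showing up literally.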

jakiestfu commented 9 years ago

If I were to use this in a personal application, I'd probably prefer to send the data formatted from the server, but with this solution, everybody wins.

You're right about the hanging punctuation, so maybe some formatting can only be done in HTML.

I just wanted to share the idea is all! Great library, very useful. Keep up the good work!

davidmerfield commented 9 years ago

Cool – I do appreciate your questions, and thanks for the kind words.

sirakoff commented 9 years ago

I recommend Sprint-js for the client-side implementation.