cognifloyd / Cognifire.BuilderFoundation

This is the old repository. Cognifire.BuilderFoundation is now Cognifire.Filefish
https://github.com/cognifloyd/Cognifire.Filefish

Investigate using Fizzle for selection in HTML Blobs #18

Open cognifloyd opened 11 years ago

cognifloyd commented 11 years ago

In HTML Blobs (typically subBlobs of Fluid Blobs when using the TemplateBuilder), I'm going to need a way to select elements in the syntax tree.

Fizzle is the standard for FlowQuery (which is what I'm using), but it lacks some of the convenience syntax that is so common in CSS. The most important of which are:

I guess Fizzle properties would map onto tags or elements, but I don't know how that works. I also need to be able to select Blobs based on their package and file path.

Here are some specific things to look into:

I think I'll have to use XPath, but if someone wants to write their own CSS Selector, I'll support that using symfony\CssSelector. For anything a builder writes to a file, unless it is user provided, it will be XPath. Hopefully XPath isn't hard to generate in JavaScript because this selector will be something that will go back and forth between the UI and the backend.
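To give a feel for how hard generating XPath might be, here is a minimal sketch that builds an XPath expression from a tag name, an id, and a list of classes. It is written in Python purely for illustration; the function name `to_xpath` and its parameters are hypothetical and not part of any library mentioned in this thread. The class test uses the common `contains(concat(...))` idiom for matching one token in a space-separated attribute.

```python
def to_xpath(tag=None, element_id=None, classes=()):
    """Build an XPath expression for a simple tag#id.class selector.

    Hypothetical helper: assumes names contain no quote characters.
    """
    for value in [tag, element_id, *classes]:
        if value is not None and ("'" in value or '"' in value):
            raise ValueError("quotes are not supported in this sketch: %r" % value)

    # '*' matches any element when no tag name is given.
    parts = ["//" + (tag or "*")]
    if element_id:
        parts.append("[@id='%s']" % element_id)
    for name in classes:
        # Pad with spaces so 'foo' does not match 'foobar'.
        parts.append(
            "[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]" % name
        )
    return "".join(parts)


by_id = to_xpath("div", "main")
by_class = to_xpath(None, None, ["foo"])
```

Since the result is a plain string, the same logic should be straightforward to port to JavaScript for the UI side.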

cognifloyd commented 11 years ago

OK. So FlowQuery uses Fizzle in the Filter operation, as I expected.

It passes the argument string into fizzle, and gets an array (a syntax tree) back.

Fizzle already has support for the #ObjectIdentifier notation, but an ObjectIdentifier is defined as: [0-9a-zA-Z_-]+. That could work in place of [id=blah]. I'll have to make a new filter that extends \TYPO3\Flow\Eel\FlowQuery\Operations\Object\FilterOperation and redefine matchesIdentifierFilter(), which currently expects the identifier to be a UUID and uses it to select the object in the collection that is identified by that UUID. The new version will need to search the HTML syntax tree (probably DOMElements) and find the element that has the given identifier.
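The idea of matching `#identifier` against an HTML id rather than a UUID can be sketched as follows. This is a Python illustration only, not the FlowQuery API: the function names are hypothetical stand-ins for the overridden matchesIdentifierFilter(), and the standard library's ElementTree stands in for the DOMElement tree.

```python
import re
import xml.etree.ElementTree as ET

# The ObjectIdentifier pattern as quoted in the comment above.
OBJECT_IDENTIFIER = re.compile(r"[0-9a-zA-Z_-]+")


def matches_identifier_filter(element, identifier):
    """Return True if the element's id attribute equals the identifier."""
    return element.get("id") == identifier


def filter_by_identifier(root, selector):
    """Resolve a '#identifier' selector against an element tree."""
    if not selector.startswith("#"):
        raise ValueError("expected a selector of the form #identifier")
    identifier = selector[1:]
    if OBJECT_IDENTIFIER.fullmatch(identifier) is None:
        raise ValueError("not a valid ObjectIdentifier: " + identifier)
    # Walk the whole tree, including the root element itself.
    return [el for el in root.iter() if matches_identifier_filter(el, identifier)]


doc = ET.fromstring('<div><p id="intro">Hi</p><p id="body-1">Text</p></div>')
matches = filter_by_identifier(doc, "#body-1")
```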

See TYPO3.Neos:TypoScript\FlowQueryOperations\FilterOperation for inspiration.

cognifloyd commented 11 years ago

I'll probably use matchesPropertyNameFilter() to filter the tag name, I think. The only issue is, I might want to get a namespaced tag (to select a fluid tag like <f:render/>, for example), and that means I'd need a colon.

So, I might have to extend ObjectIdentifier to be [0-9a-zA-Z_-]+(':'[0-9a-zA-Z_-]+)?

Speaking of the colon, in BlobQuery, I'm going to want to filter based on PackageName:path/path/path which means I also need a /
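To check what such extensions would accept, the patterns can be tried out as ordinary regular expressions. The sketch below is Python, not the PEG grammar Fizzle actually uses. It assumes the optional part of the namespaced pattern is meant to allow one or more characters after the colon, and the `BLOB_PATH` pattern (including allowing periods in the package name) and the example path are assumptions made for illustration.

```python
import re

# ObjectIdentifier as currently defined in Fizzle.
ORIGINAL = re.compile(r"[0-9a-zA-Z_-]+")

# Extended with an optional ':name' suffix for namespaced tags.
NAMESPACED = re.compile(r"[0-9a-zA-Z_-]+(?::[0-9a-zA-Z_-]+)?")

# Hypothetical pattern for PackageName:path/path/path.
BLOB_PATH = re.compile(r"(?P<package>[0-9a-zA-Z_.-]+):(?P<path>[0-9a-zA-Z_./-]+)")


def split_blob_identifier(text):
    """Split 'PackageName:path/to/file' into (package, path)."""
    match = BLOB_PATH.fullmatch(text)
    if match is None:
        raise ValueError("not a PackageName:path identifier: " + text)
    return match.group("package"), match.group("path")


plain_accepts_namespaced = ORIGINAL.fullmatch("f:render") is not None
extended_accepts_namespaced = NAMESPACED.fullmatch("f:render") is not None
package, path = split_blob_identifier(
    "Cognifire.Filefish:Resources/Private/Templates/Index.html"
)
```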

cognifloyd commented 11 years ago

In the attribute Filter [ foo = 'example' ], foo can be a property like foo.bar.baz which does not make sense in HTML. Attributes should all be at the same level. So, I'll want to override getPropertyPath() to just return the whole property...

Then again, could I use the propertyNameFilter to get the class definition? So div.foo would be seen as a property, but I would provide the semantic meaning that div is a tagname and foo is a class. Would that work?
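Giving `div.foo` the meaning "tag div with class foo" amounts to splitting a compound selector into its parts and checking each one against an element. Here is a minimal Python sketch of that interpretation; `parse_compound` and `matches` are hypothetical names, ElementTree stands in for the HTML syntax tree, and nothing here reflects how Fizzle itself parses property names.

```python
import re
import xml.etree.ElementTree as ET

# Optional tag name followed by any number of #id or .class parts.
COMPOUND = re.compile(r"(?P<tag>[a-zA-Z_][\w:-]*)?(?P<rest>(?:[#.][\w-]+)*)")


def parse_compound(selector):
    """Split 'tag#id.class1.class2' into (tag, id, [classes])."""
    match = COMPOUND.fullmatch(selector)
    if not selector or match is None:
        raise ValueError("unsupported selector: %r" % selector)
    element_id = None
    classes = []
    for kind, name in re.findall(r"([#.])([\w-]+)", match.group("rest")):
        if kind == "#":
            element_id = name
        else:
            classes.append(name)
    return match.group("tag"), element_id, classes


def matches(element, selector):
    """Check one element against a compound selector."""
    tag, element_id, classes = parse_compound(selector)
    if tag is not None and element.tag != tag:
        return False
    if element_id is not None and element.get("id") != element_id:
        return False
    # The class attribute is a space-separated list of names.
    present = element.get("class", "").split()
    return all(name in present for name in classes)


doc = ET.fromstring(
    '<body><div id="main" class="foo bar"/><div class="foo"/><p class="foo"/></body>'
)
div_foo = [el for el in doc.iter() if matches(el, "div.foo")]
main_foo = [el for el in doc.iter() if matches(el, "div#main.foo")]
```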

It is also possible to use symfony/CssSelector instead of Fizzle, but I'd really like to avoid adding another dependency. Hopefully Fizzle will work.

cognifloyd commented 11 years ago

So the propertyNameFilter expects an identifier which is defined as: [a-zA-Z_] [a-zA-Z0-9_]*

The only place it looks for a property path is in the attribute filter, but the Identifier does not match a period (.), so I don't see how it could ever match a path, unless some string magic somewhere converts periods into underscores and back again before the property path is checked.
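The mismatch is easy to confirm by treating the quoted Identifier definition as a regular expression. This is a Python sketch of the pattern as written in this comment, not a run of the actual PEG parser.

```python
import re

# Identifier as quoted above: [a-zA-Z_] [a-zA-Z0-9_]*
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_identifier(text):
    """Return True only if the whole string is a valid Identifier."""
    return IDENTIFIER.fullmatch(text) is not None


results = {
    name: is_identifier(name)
    for name in ("foo", "foo_bar", "foo.bar", "foo.bar.baz")
}
```

Under this pattern, only the names without a period are accepted, so a dotted property path can never get through as a single Identifier.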

cognifloyd commented 11 years ago

The docs[1] say that 'foo.bar.baz' would be a valid property name, but the parser grammar doesn't accept periods in property names[2].

Plus, there are no unit tests that include a period in the property name[3], even though there are some method stubs in filter() that seem to expect that a propertyName can include periods.

[1] http://docs.typo3.org/neos/TYPO3NeosDocumentation/IntegratorGuide/EelFlowQuery.html#property-name-filters

[2] The grammar expects an Identifier (https://git.typo3.org/Packages/TYPO3.Eel.git/blob/HEAD:/Resources/Private/Grammar/Fizzle.peg.inc#l52), and an Identifier is defined on line 41 of https://git.typo3.org/Packages/TYPO3.Eel.git/blob/HEAD:/Resources/Private/Grammar/AbstractParser.peg.inc#l41 as "/ [a-zA-Z] [a-zA-Z0-9]* /". There is no period in this regex.

[3] https://git.typo3.org/Packages/TYPO3.Eel.git/blob/HEAD:/Tests/Unit/FlowQuery/FizzleParserTest.php#l68 Note that propertyNameFilterIsMatched() on line 68 asserts that two things don't match, \TYPO3\Foo and TYPO3.Foo:Bar, but does not verify that anything does match. I think that means we need tests that match 'foo', 'foo.bar', and 'foo.bar.baz', as the documentation suggests. It would also be very nice (for me anyway) to match the two examples that it says don't match, because I need to filter based on packageName:path in one instance and on tag#identity.class in another. That suggests I need to add ':.#' to the characters matched by propertyNameFilter.
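As a rough check of what adding ':.#' would buy, here is a Python sketch of a widened pattern. It assumes the Identifier form quoted earlier in the thread ([a-zA-Z_] [a-zA-Z0-9_]*) with the three extra characters allowed after the first character; the name `WIDENED` and the candidate list are illustrative only.

```python
import re

# Assumed starting point plus ':', '.', and '#' in the trailing characters.
WIDENED = re.compile(r"[a-zA-Z_][a-zA-Z0-9_:.#]*")

candidates = [
    "foo",
    "foo.bar",
    "foo.bar.baz",
    "TYPO3.Foo:Bar",
    "tag#identity.class",
    "\\TYPO3\\Foo",
    "PackageName:path/path",
]

accepted = [text for text in candidates if WIDENED.fullmatch(text)]
rejected = [text for text in candidates if not WIDENED.fullmatch(text)]
```

In this sketch the backslash-separated class name and the path containing a slash are still rejected, so those would need further additions to the character set.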

cognifloyd commented 11 years ago

So, I would have to do some major voodoo with Fizzle to get it to understand tag#id.class

Maybe I should just bite the bullet and include symfony\CssSelector and then implement a new filter() operation that is only used for HTML, but takes the css selector and passes it on to DomCrawler to get the right spot in the file.

cognifloyd commented 11 years ago

symfony\DomCrawler isn't really the best tool for generating HTML. It's designed to retrieve and navigate it, so that you can submit forms, but I would have to build a bunch of stuff around it to make it work the way I need it to (read and write html files, as well as the html in fluid files).

Other options include:

My requirements include:

cognifloyd commented 11 years ago

TemplaVoila suffers from NIH syndrome. There are elements of CSS selectors (like #id and .class), but it uses a custom [number] annotation that is unique to TemplaVoila, as well as the keywords INNER and OUTER to indicate whether or not to include the matched tag. I really don't want to go down the same path as TemplaVoila, and contorting Fizzle to select HTML elements would do exactly that. No, I will use an external library, and I will use standard XPath and/or CSS Selectors.

The question is, is there an equivalent to INNER/OUTER in CSS or in XPath?
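To make the INNER/OUTER distinction concrete, here is a small Python sketch using the standard library's ElementTree: OUTER is the matched element including its own tag, INNER is only its contents. The function names are hypothetical and this is just one way to picture the distinction, not TemplaVoila's implementation. In XPath terms, selecting the element itself corresponds to OUTER, while selecting its child nodes with `node()` corresponds to INNER.

```python
import xml.etree.ElementTree as ET


def outer_markup(element):
    """Serialize the element including its own start and end tags."""
    return ET.tostring(element, encoding="unicode")


def inner_markup(element):
    """Serialize only the element's contents: leading text plus children."""
    # tostring() on a child includes the text that follows it (its tail).
    children = "".join(ET.tostring(child, encoding="unicode") for child in element)
    return (element.text or "") + children


doc = ET.fromstring('<body><div id="box">Hello <b>world</b>!</div></body>')
box = doc.find(".//div[@id='box']")
outer = outer_markup(box)
inner = inner_markup(box)
```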

cognifloyd commented 11 years ago

CSS Selectors can select elements but not the contents of those elements. The closest we get to selecting the contents of an element is E::first-line, E::first-letter, E::before, and E::after.

I will want something like before and after, but I think I need even more power.

So, to map TV concepts onto XPath:

See the specs

cognifloyd commented 11 years ago

Just to follow up. I investigated the various parsers mentioned earlier and QueryPath is the best for what I need.

It's faster than SimpleHTMLDOM[1], is designed for editing, unlike DomCrawler[2], and is more actively maintained than PhpQuery. In 2010 it was also faster than PhpQuery at write operations[3].

Also, support for HTML5 (and especially HTML5 fragments) is underway in QueryPath 3.x, so it really is the best choice.

That means that, for the most part, CSS Selectors are the way to go for selecting elements in the docs. I can use XPath if needed, and maybe someone will add an XQuery operation at some point, but for now, CSS Selectors through QueryPath are what I'm going to use.

[1] https://groups.google.com/forum/#!topic/support-querypath/DEQIsoZW_pU

[2] http://symfony.com/doc/current/components/dom_crawler.html (see the first note at the top of the doc)

[3] http://web.archive.org/web/20100815061227/http://www.tagbytag.org/articles/phpquery-vs-querypath