egonSchiele / HandsomeSoup

Easy HTML parsing for Haskell
http://egonschiele.github.com/HandsomeSoup
BSD 3-Clause "New" or "Revised" License

Dashes and underscores in id and class attributes #4

Closed akamaus closed 12 years ago

akamaus commented 12 years ago

Hi, thanks for the package!

I've found that it refuses to parse many real-world tags because your lexer is too strict. Take a look at the CSS 2.1 grammar: http://www.w3.org/TR/CSS21/grammar.html

I made a quick fix for supporting dashes and underscores in id and class attributes.
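For reference, the identifier rule in that grammar is roughly `ident = -?{nmstart}{nmchar}*`. Below is a minimal Parsec sketch of it, not the actual patch. The names `nmstart`, `ident`, and `isValidIdent` are illustrative, and the grammar's `{escape}` branch is left out. Parsec's `letter` and `alphaNum` also accept non-ASCII letters and digits, which only approximates `{nonascii}`.

```haskell
import Text.Parsec
  (Parsec, parse, string, char, letter, alphaNum, oneOf, many, option, eof, (<|>))

-- First character of an identifier: a letter or an underscore.
-- Assumption: escapes from the CSS grammar are not handled here.
nmstart :: Parsec String () Char
nmstart = letter <|> char '_'

-- Any later character: letter, digit, underscore, or dash.
nmchar :: Parsec String () Char
nmchar = alphaNum <|> oneOf "_-"

-- Simplified CSS 2.1 identifier: optional leading dash, one nmstart,
-- then zero or more nmchar.
ident :: Parsec String () String
ident = do
  dash  <- option "" (string "-")
  first <- nmstart
  rest  <- many nmchar
  return (dash ++ (first : rest))

-- Hypothetical helper: True when the whole string is a valid identifier.
isValidIdent :: String -> Bool
isValidIdent s = either (const False) (const True) (parse (ident <* eof) "" s)
```

With this sketch, `isValidIdent "nav-bar"` and `isValidIdent "main_content"` both succeed, while a name starting with a digit such as `"1abc"` is rejected.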

mkrauskopf commented 12 years ago

Similarly for tag names, which I've fixed with:

diff --git a/Text/CSS/Parser.hs b/Text/CSS/Parser.hs
index 5c6c775..2abd0c7 100644
--- a/Text/CSS/Parser.hs
+++ b/Text/CSS/Parser.hs
@@ -42,7 +42,7 @@ nmchar  = alphaNum <|> oneOf "_-"

 -- | selects a tag name, like @ h1 @
 typeSelector :: ParsecT [Char] u I.Identity [Char]
-typeSelector = many1 alphaNum
+typeSelector = many1 (alphaNum <|> oneOf "_-")

 -- | universal selector, selects @ * @
 universalSelector :: ParsecT [Char] u I.Identity String

Forking just for this seems like overkill to me, so I'm attaching the patch to this related issue. Hopefully that's OK.
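To illustrate the effect of the change, here is a small standalone sketch that puts the two versions of the parser side by side. The names `typeSelectorOld`, `typeSelectorNew`, and `accepts` are made up for this example, and the type is simplified to `Parsec String ()` rather than the `ParsecT` signature used in `Text/CSS/Parser.hs`.

```haskell
import Text.Parsec (Parsec, parse, many1, alphaNum, oneOf, eof, (<|>))

-- Behaviour before the patch: letters and digits only.
typeSelectorOld :: Parsec String () String
typeSelectorOld = many1 alphaNum

-- Behaviour after the patch: dashes and underscores are also allowed.
typeSelectorNew :: Parsec String () String
typeSelectorNew = many1 (alphaNum <|> oneOf "_-")

-- Hypothetical helper: True when the parser consumes the entire input.
accepts :: Parsec String () String -> String -> Bool
accepts p s = either (const False) (const True) (parse (p <* eof) "" s)
```

Here `accepts typeSelectorOld "my-element"` fails at the dash, while `accepts typeSelectorNew "my-element"` succeeds. Note that the patched parser is somewhat looser than the CSS grammar: it also accepts names that start with a digit or a dash, such as `"-"` on its own.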

egonSchiele commented 12 years ago

Thanks!