Open Lyokovic opened 6 years ago
Yes, it should be fairly straightforward. One would have to:
Extend the grammar of selectors with one more level: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L489. `simple_selector` is stuff like `.class-foo`, `[attribute-bar]`; combinators are `>`, `+`, etc. So, this grammar is capable of representing things like `.class-foo > [attribute-bar]`. It needs one more level of `list` to be able to represent comma-separated lists of these.
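As a rough illustration of that extra level, here is a minimal sketch. All type and constructor names below are hypothetical, simplified stand-ins and are not copied from `soup.ml`; the only point is that the new top-level type is a list of what used to be the whole selector.

```ocaml
(* Hypothetical, simplified stand-ins for the selector types in soup.ml. *)
type simple_selector =
  | Class of string          (* .class-foo *)
  | Has_attribute of string  (* [attribute-bar] *)

type combinator =
  | Descendant        (* whitespace *)
  | Child             (* > *)
  | Adjacent_sibling  (* + *)

(* One complex selector: a chain of (combinator, simple selectors) steps. *)
type complex_selector = (combinator * simple_selector list) list

(* The extra level: a comma-separated list of complex selectors. *)
type selector_list = complex_selector list

(* .class-foo > [attribute-bar] *)
let child_example : complex_selector =
  [ (Descendant, [Class "class-foo"]);
    (Child, [Has_attribute "attribute-bar"]) ]

(* .bg1, .bg3 *)
let bg1_or_bg3 : selector_list =
  [ [ (Descendant, [Class "bg1"]) ];
    [ (Descendant, [Class "bg3"]) ] ]
```

With this shape, a selector without commas is simply a one-element `selector_list`, so existing selectors keep working unchanged.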
This is the parser top-level function: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L896-L913. It needs to be modified to become not the top-level function, but a parser for a single item delimited by `,`. Then a new top-level function needs to wrap it: it reads the commas and calls the current parser to read everything in between.
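The wrapping idea can be sketched as follows. Note that this is not how the parser in `soup.ml` is structured: it is a simpler stand-in that splits the selector string up front instead of reading commas from the parser's input stream. The names `split_top_level_commas`, `parse_selector_list` and `parse_single` are hypothetical, and `parse_single` stands for the current top-level parser.

```ocaml
(* Split [s] on commas that are outside brackets and outside quotes.
   Simplification: backslash escapes inside quoted strings are not handled. *)
let split_top_level_commas (s : string) : string list =
  let buf = Buffer.create (String.length s) in
  let items = ref [] in
  let depth = ref 0 in     (* nesting level of [ ] and ( ) *)
  let quote = ref None in  (* Some q while inside a string quoted with q *)
  let flush () =
    items := String.trim (Buffer.contents buf) :: !items;
    Buffer.clear buf
  in
  String.iter
    (fun c ->
      match !quote, c with
      | Some q, _ ->
        if c = q then quote := None;
        Buffer.add_char buf c
      | None, ('"' | '\'') ->
        quote := Some c;
        Buffer.add_char buf c
      | None, ('[' | '(') ->
        incr depth;
        Buffer.add_char buf c
      | None, (']' | ')') ->
        decr depth;
        Buffer.add_char buf c
      | None, ',' when !depth = 0 -> flush ()
      | None, _ -> Buffer.add_char buf c)
    s;
  flush ();
  List.rev !items

(* New top-level entry point: run the existing single-selector parser
   (passed in as [parse_single]) on each comma-separated item. *)
let parse_selector_list ~(parse_single : string -> 'a) (s : string) : 'a list =
  List.map parse_single (split_top_level_commas s)
```

An empty item, as in `a,,b`, comes out as an empty string here; a real implementation would presumably want to reject it as a parse error.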
This is the select code: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L611-L647. Its logic needs to be wrapped in a new top-level loop that tries additional selectors from the new top-level `list` if the preceding ones didn't yield a match.
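A minimal sketch of that loop, using a toy node type instead of Lambda Soup's real nodes. Everything here is hypothetical (`node`, `matches_class`, `matches_any`, `select_list`); `matches_class` stands in for the existing single-selector matching logic. Since nodes are visited once, in document order, the result keeps the original order.

```ocaml
(* Toy stand-in for an element node. *)
type node = { tag : string; classes : string list }

(* Stand-in for the existing matching logic: here a "selector" is just a
   class name. *)
let matches_class (cls : string) (n : node) : bool = List.mem cls n.classes

(* The new top-level loop: try each selector in turn and stop at the first
   one that matches ([List.exists] short-circuits). *)
let matches_any matches selectors node =
  List.exists (fun selector -> matches selector node) selectors

(* Keep the nodes matched by at least one selector, in document order. *)
let select_list matches selectors nodes =
  List.filter (matches_any matches selectors) nodes

let divs = [
  { tag = "div"; classes = ["bg2"] };
  { tag = "div"; classes = ["bg1"] };
  { tag = "div"; classes = ["bgbc"] };
  { tag = "div"; classes = ["bg3"] };
]

(* Analogue of selecting ".bg1, .bg3". *)
let kept = select_list matches_class ["bg1"; "bg3"] divs
```

Because each node is tested against the whole list before moving on, a node that matches several selectors is still returned only once.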
Thanks, I'll take a look ASAP.
Hi,
I started using Lambda Soup and found that it does not seem to support selector lists, like `".bg1, .bg3"`. I need to parse an HTML document with various `<div>` with `bg2`, `bg1`, `bgbc`, `bg3` classes and want to keep only the `bg1` and `bg3` ones while keeping the order. I am wondering if it would be easy to implement this feature?