cantino / selectorgadget

Go go CSS / DOM inspection.
http://www.selectorgadget.com
MIT License

Selector h4+ p is incorrectly translated to xpath #28

Open jstray opened 5 years ago

jstray commented 5 years ago

On the page https://www.kpu.ca/calendar/2018-19/courses/jrnl/index.html, I'm trying to select the paragraph of course description that follows each course title. For example, "Students will explore how journalism fits in a media landscape..."

I can successfully highlight the appropriate elements in SG by clicking on this paragraph, then clicking on one of the "Prerequisites" elements to exclude them. This results in the correct CSS selector: h4+ p

However, when I translate this to an XPath, I get //h4+//p, which is not correct. I would expect this to translate to something like //h4/following::p[1], which gives the correct result.
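
For reference, here is a minimal sketch of how the adjacent-sibling combinator could be translated. It is not SelectorGadget's code: the function name css_adjacent_to_xpath is hypothetical, and it only handles bare tag names joined by "+". It emits following-sibling::*[1][self::p], which matches a p only when it is the very next sibling of the h4. That is slightly stricter than //h4/following::p[1], which picks the first p anywhere after each h4 in document order.

```python
import re


def css_adjacent_to_xpath(selector: str) -> str:
    """Translate bare tag names joined by '+' (e.g. 'h4 + p') into XPath.

    Hypothetical helper for illustration only; classes, ids, attributes
    and other combinators are rejected with ValueError.
    """
    parts = [part.strip() for part in selector.split("+")]
    if not all(re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", part) for part in parts):
        raise ValueError(f"unsupported selector: {selector!r}")

    xpath = "//" + parts[0]
    for tag in parts[1:]:
        # 'a + b': b must be the element immediately after a among its siblings.
        xpath += f"/following-sibling::*[1][self::{tag}]"
    return xpath


print(css_adjacent_to_xpath("h4+ p"))
```

For the selector in this issue, the sketch prints //h4/following-sibling::*[1][self::p].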

We have been advising people to use SelectorGadget to write the expressions for the XPath Extractor in Workbench (http://help.workbenchdata.com/steps/scrape/xpath-extractor) as a way to avoid learning XPath syntax, so it's unfortunate that this case is mistranslated.

cantino commented 5 years ago

Hey @jstray, thanks for the bug report! I'm not very actively maintaining SelectorGadget these days. I'll mark this as help wanted, and hopefully someone will send in a PR. You're also more than welcome to submit a fix. I don't think it'd be too hard.