fefit / visdom

A library that uses a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML confusion.
MIT License

Does this support XPath selectors? #19

Open mdrokz opened 1 year ago

mdrokz commented 1 year ago

Hi, I wanted to know whether this supports XPath selectors like this: "//h1[contains(text(),'Search Results')]/following-sibling::div[1]/div"?

fefit commented 1 year ago

@mdrokz Sorry for the late reply. This crate does not support XPath selectors. If this requirement is common, I plan to take the time to add the feature soon.

mdrokz commented 1 year ago

Hey, thanks for the reply. It would be really helpful for scraping websites that have randomly generated class names and IDs. I can help implement the feature if you can guide me. Thanks.

fefit commented 1 year ago

@mdrokz I have added a new feature branch to support XPath selectors. The query methods in this crate used to accept only the type &str as a selector parameter, so I added a new trait, TryIntoSelector, to allow other types that implement the trait to be used as selectors too. However, the whole logic of the query methods is based on CSS selectors, so it is not easy to add processing logic for XPath selectors. I think a simple but not very efficient approach is to convert XPath selectors into the corresponding CSS selectors. A small number of XPath selectors may not have corresponding CSS selectors, which may require expanding the capabilities of the CSS selectors. It will take a lot of work to fully support XPath selectors. If you have a better solution, we can discuss it here.
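To make the conversion idea concrete, here is a minimal sketch using only the standard library. It is not the code on the feature branch: the shape of `TryIntoSelector`, the `CssSelector`, `SelectorError`, and `XPath` types, and the `find` and `xpath_to_css` functions are all assumptions made for illustration. It handles only a small XPath subset (`//` and `/` steps, `[@attr]`, `[@attr='value']`, and positional `[n]`) and reports everything else as unsupported.

```rust
// Hypothetical stand-in for a parsed CSS selector; the crate's real type is not shown in this thread.
#[derive(Debug, PartialEq)]
struct CssSelector(String);

#[derive(Debug, PartialEq)]
enum SelectorError {
    // Carries the XPath expression that has no CSS equivalent in this sketch.
    Unsupported(String),
}

// Assumed shape of the trait; the real signature on the feature branch may differ.
trait TryIntoSelector {
    fn try_into_selector(self) -> Result<CssSelector, SelectorError>;
}

// Plain strings are treated as CSS selectors, as the query methods did before.
impl TryIntoSelector for &str {
    fn try_into_selector(self) -> Result<CssSelector, SelectorError> {
        Ok(CssSelector(self.to_string()))
    }
}

// Hypothetical wrapper that marks a string as an XPath expression.
struct XPath<'a>(&'a str);

impl<'a> TryIntoSelector for XPath<'a> {
    fn try_into_selector(self) -> Result<CssSelector, SelectorError> {
        xpath_to_css(self.0).map(CssSelector)
    }
}

// Hypothetical query entry point: accepts anything that can become a selector.
fn find<S: TryIntoSelector>(selector: S) -> Result<CssSelector, SelectorError> {
    selector.try_into_selector()
}

fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '_'
}

fn is_quoted(v: &str) -> bool {
    v.len() >= 2
        && ((v.starts_with('\'') && v.ends_with('\'')) || (v.starts_with('"') && v.ends_with('"')))
}

// Converts one step such as "div[@class='item'][2]"; returns None if unsupported.
fn convert_step(step: &str) -> Option<String> {
    let (tag, mut preds) = match step.find('[') {
        Some(i) => step.split_at(i),
        None => (step, ""),
    };
    // Only element names or "*" are handled; axes ("::") and functions are rejected here.
    if !(tag == "*" || (!tag.is_empty() && tag.chars().all(is_name_char))) {
        return None;
    }
    let mut out = tag.to_string();
    while !preds.is_empty() {
        if !preds.starts_with('[') {
            return None;
        }
        // Simplification: a ']' inside a quoted attribute value is not handled.
        let close = preds.find(']')?;
        let inner = &preds[1..close];
        if let Ok(n) = inner.parse::<u32>() {
            if n == 0 {
                return None;
            }
            // XPath "tag[n]" picks the n-th "tag" among its siblings.
            out.push_str(&format!(":nth-of-type({})", n));
        } else if let Some(attr) = inner.strip_prefix('@') {
            let (name, value) = match attr.split_once('=') {
                Some((n, v)) => (n, Some(v)),
                None => (attr, None),
            };
            if name.is_empty() || !name.chars().all(is_name_char) {
                return None;
            }
            match value {
                None => out.push_str(&format!("[{}]", name)),
                Some(v) if is_quoted(v) => out.push_str(&format!("[{}={}]", name, v)),
                Some(_) => return None,
            }
        } else {
            return None;
        }
        preds = &preds[close + 1..];
    }
    Some(out)
}

fn xpath_to_css(xpath: &str) -> Result<String, SelectorError> {
    let unsupported = || SelectorError::Unsupported(xpath.to_string());
    let mut css = String::new();
    let mut rest = xpath;
    while !rest.is_empty() {
        // "//" maps to the descendant combinator, "/" to the child combinator.
        let combinator = if let Some(r) = rest.strip_prefix("//") {
            rest = r;
            " "
        } else if let Some(r) = rest.strip_prefix('/') {
            rest = r;
            " > "
        } else {
            return Err(unsupported());
        };
        // A step ends at the next '/' that sits outside a [...] predicate.
        let mut depth = 0usize;
        let mut end = rest.len();
        for (i, c) in rest.char_indices() {
            match c {
                '[' => depth += 1,
                ']' => depth = depth.saturating_sub(1),
                '/' if depth == 0 => {
                    end = i;
                    break;
                }
                _ => {}
            }
        }
        let (step, tail) = rest.split_at(end);
        rest = tail;
        let converted = convert_step(step).ok_or_else(|| unsupported())?;
        // The combinator of the very first step is dropped in this sketch.
        if !css.is_empty() {
            css.push_str(combinator);
        }
        css.push_str(&converted);
    }
    if css.is_empty() { Err(unsupported()) } else { Ok(css) }
}
```

Under these assumptions, `find(XPath("//div[@class='item']/a[2]"))` yields the CSS selector `div[class='item'] > a:nth-of-type(2)`, while the expression from the original question returns `SelectorError::Unsupported`, because `contains(text(), ...)` and the `following-sibling::` axis have no direct counterpart in this subset. That is the case where the CSS selector capabilities would need to be expanded.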

Thank you very much for your willingness to contribute code to this crate. You can fork the repo, check out a new branch from the feature branch, add code to support XPath selectors along with some unit tests, and then make a PR.

I'm worried that implementing this feature may take up a lot of your time. If you don't have much time for this, just tell me and I can do some of the work with you. Thanks again!

mdrokz commented 1 year ago

Hey @fefit, thank you for taking the time to work on this feature. I will fork the project and check out the branch when I get some time. I'm currently working a full-time job, so I will be doing this in my spare time. Once I have looked at the branch, I will let you know if I encounter any issues or need your help. Thanks!