bisohns / search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions
https://search-engine-parser.readthedocs.io

Extract actual page URL from results (GoogleSearch) #128

Closed: pgrandinetti closed this issue 3 years ago

pgrandinetti commented 3 years ago

Hi, I am wondering whether there is already a built-in way to extract the actual page URL from the search results.

For example, a GoogleSearch returns

https://google.com/url?q=https://www.simplyrecipes.com/recipes/spaghetti_alla_carbonara/&sa=U&ved=2ahUKEwiZz6bahe7uAhU9RDABHTISDroQFnoECAYQAg&usg=AOvVaw05E1NKVreZ6ImGI3IbvN9o

and the URL to use would be https://www.simplyrecipes.com/recipes/spaghetti_alla_carbonara/.

Is this already implemented? Thanks.

deven96 commented 3 years ago

This has not been implemented yet, but it should not be difficult to do with urlparse. Would you mind implementing it, @pgrandinetti?
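A minimal sketch of the urlparse approach, assuming the redirect always has the shape shown in the example above (path /url, real destination in the q query parameter). The function name extract_target_url is hypothetical and not part of search-engine-parser; only urlparse and parse_qs from Python's standard urllib.parse are real APIs.

```python
from urllib.parse import parse_qs, urlparse


def extract_target_url(link: str) -> str:
    """Hypothetical helper: return the destination wrapped in a Google redirect.

    Assumes redirects look like https://google.com/url?q=<target>&sa=...
    Any link that does not match is returned unchanged.
    """
    parsed = urlparse(link)
    # Only the path is checked, so a relative href like "/url?q=..." works too.
    if parsed.path == "/url":
        # parse_qs percent-decodes the values and returns a list per key.
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return link


redirect = (
    "https://google.com/url?q=https://www.simplyrecipes.com/recipes/"
    "spaghetti_alla_carbonara/&sa=U&ved=2ahUKEwiZz6bahe7uAhU9RDABHTISDroQFnoECAYQAg"
    "&usg=AOvVaw05E1NKVreZ6ImGI3IbvN9o"
)
print(extract_target_url(redirect))
# prints https://www.simplyrecipes.com/recipes/spaghetti_alla_carbonara/
```

Using parse_qs rather than slicing the string by hand means percent-encoded characters in the target (for example %3F for ?) are decoded for free.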

pgrandinetti commented 3 years ago

@deven96 Sure, I can give it a try. Could you point me to where this should go (which module/class) and suggest a good name for it?

deven96 commented 3 years ago

It should go under search_engine_parser.core.google, in the parse_url function. Editing that function to extract the exact destination URL should give us the behaviour we want, @pgrandinetti.

pgrandinetti commented 3 years ago

OK, let's start with that, though I think it should be done for every engine that wraps its result links this way. I haven't looked at the others yet, @deven96.
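If the unwrapping is later extended to other engines, one option is a table of known redirect endpoints instead of engine-specific code. The sketch below is hypothetical: REDIRECT_PARAMS and unwrap_redirect are illustrative names, not existing search-engine-parser code, and only the Google entry is grounded in this thread. Entries for other engines would have to be added after checking their actual redirect formats.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical registry: (host, path) of a redirect endpoint -> query parameter
# holding the real destination. Only the Google entry comes from this issue.
REDIRECT_PARAMS: Dict[Tuple[str, str], str] = {
    ("google.com", "/url"): "q",
    ("www.google.com", "/url"): "q",
}


def unwrap_redirect(link: str, default_host: Optional[str] = None) -> str:
    """Return the destination of a known redirect link, else the link itself.

    default_host is used for relative hrefs such as "/url?q=...", which
    carry no host of their own.
    """
    parsed = urlparse(link)
    host = parsed.netloc or default_host
    param = REDIRECT_PARAMS.get((host, parsed.path))
    if param is None:
        return link  # not a registered redirect endpoint
    values = parse_qs(parsed.query).get(param)
    if not values:
        return link  # endpoint matched but the parameter is missing
    target = values[0]
    # Only hand back absolute http(s) targets; anything else is left as-is.
    if urlparse(target).scheme not in ("http", "https"):
        return link
    return target
```

Hosts are matched exactly, so country domains such as google.co.uk would each need their own entry; that keeps the lookup predictable at the cost of a longer table.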

deven96 commented 3 years ago

Yeah, let's go with this to begin with, @pgrandinetti.