gesistsa / webbotparseR

:mag: R package to parse search engine results
https://gesistsa.github.io/webbotparseR/

Error for `parse_search_results()` with own data #15

Open jobreu opened 6 months ago

jobreu commented 6 months ago

Thx for this really helpful package :-)

I just tried it out: Parsing the example data that comes with the package works fine for me.

However, when I try running `parse_search_results()` with some test data I collected using WebBot, I always get an error saying that an argument has length 0 (`Error in if (is.na(elem)) { : argument is of length 0`).

Anonymized reprex:

library(webbotparseR)

output <- parse_search_results(path = "Drive:/Users/Me/Downloads/webbot/www.google.com_My_term_text_2024-03-22_14_15_16.html",
                               engine = "google text")

Note: I'm working with R v 4.3.2 on Win 10 (in case that matters for this issue).
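
For context on the message itself: this is what base R reports when `if()` receives a zero-length condition, for example when `is.na()` is called on an empty vector. The sketch below reproduces that behaviour without the package. It is only an illustration: `elem` is a stand-in named after the variable in the error message, and `is_missing()` is a hypothetical helper, not part of webbotparseR.

```r
# Stand-in for a value extracted from a page; character(0) mimics
# an extraction step that found nothing (assumption for illustration).
elem <- character(0)

# is.na() on a zero-length vector returns logical(0), which if() rejects.
res <- tryCatch(
  if (is.na(elem)) "missing" else "present",
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical defensive check: treat "nothing found" like a missing value.
is_missing <- function(x) length(x) == 0 || all(is.na(x))
print(is_missing(elem))
```

A length check of this kind turns an empty extraction into an ordinary missing value instead of an error, which at least makes it easier to see which field came back empty.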

schochastics commented 6 months ago

Can you share the HTML? Then I could do some debugging. Via email is fine, or here if it is nothing sensitive.

jobreu commented 6 months ago

Thanks! GitHub does not let me upload the HTML here, so I will send you an e-mail.

schochastics commented 5 months ago

@jobreu As I feared, Google changed the result pages. We have a mechanism to switch to new CSS selectors, and I fixed that part. However, they also removed the paging, and the search pages now use infinite scroll. This needs to be fixed in gesiscss/webBot.

You should be able to read your HTML file now, but it only contains 20 results.
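
To illustrate the general idea of switching selectors when a search engine changes its markup, here is a minimal sketch of one way such a lookup could work. It is not the package's actual implementation: the table, the dates, the CSS strings, and `pick_selectors()` are all hypothetical placeholders. Only the crawl date is taken from the file name in the reprex above.

```r
# Hypothetical selector table: dates and CSS strings are placeholders,
# not the selectors used by webbotparseR.
selector_versions <- data.frame(
  valid_from = as.Date(c("2023-01-01", "2024-03-01")),
  title_css  = c("div.old-result h3", "div.new-result h3"),
  stringsAsFactors = FALSE
)

# Pick the most recent selector set that was already valid on the crawl date.
pick_selectors <- function(crawl_date, versions = selector_versions) {
  crawl_date <- as.Date(crawl_date)
  candidates <- versions[versions$valid_from <= crawl_date, , drop = FALSE]
  if (nrow(candidates) == 0) {
    stop("no selector set covers ", format(crawl_date))
  }
  newest <- which.max(as.numeric(candidates$valid_from))
  candidates[newest, , drop = FALSE]
}

# Crawl date taken from the file name in the reprex.
print(pick_selectors("2024-03-22")$title_css)
```

Keying the selectors by date means older crawls can still be parsed with the selectors that matched at the time, while newer files pick up the updated set.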

jobreu commented 5 months ago

Thank you for checking this, the fix, and the explanation!