Concise, Lazy and Reliable Wrapper for 'chromote' and 'selenium'
https://ashbythorpe.github.io/selenider/

selenider

Automating a web browser has traditionally been unreliable, especially from R. Programmers are forced to write verbose code and to rely on inconsistent workarounds (such as calling Sys.sleep() to wait for something to happen).
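
To make the problem concrete, here is a minimal base R sketch of the kind of hand-written waiting code that a fixed Sys.sleep() call is usually standing in for. The names `wait_until()` and `page_is_ready()` are hypothetical and exist only for this illustration; this is not how selenider is implemented, just the sort of boilerplate it aims to make unnecessary.

```r
# Hypothetical helper: poll a condition until it is TRUE or a timeout expires.
# `condition` is any zero-argument function returning TRUE or FALSE.
wait_until <- function(condition, timeout = 4, interval = 0.1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) {
      return(TRUE)
    }
    if (Sys.time() >= deadline) {
      return(FALSE)
    }
    Sys.sleep(interval)
  }
}

# Stand-in for a page that becomes ready after about half a second.
ready_at <- Sys.time() + 0.5
page_is_ready <- function() Sys.time() >= ready_at

# Fragile: a fixed pause is either too long (wasted time) or too short (failure).
# Sys.sleep(2)

# More robust: return as soon as the condition holds, or give up at the timeout.
result <- wait_until(page_is_ready, timeout = 4)
```

Every interaction with a page that loads asynchronously would need a wrapper like this, which is where the verbosity comes from.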

selenider aims to make web testing and scraping in R much simpler, providing a wrapper for either chromote or selenium. It is inspired by Java’s Selenide and Python’s Selene.

Code reliability and reproducibility are essential when writing R code. selenider provides features to make your scripts work every time they are run, without any extra code.

selenider’s other main focus is its API. Its design choices result in concise yet expressive code that is easy to read and easy to write.

Installation

# Install selenider from CRAN
install.packages("selenider")

# Or the development version from GitHub
# install.packages("remotes")
remotes::install_github("ashbythorpe/selenider")

Additionally, you must install chromote or selenium. We recommend chromote, as it is quicker and easier to get up and running.

# Either:
install.packages("chromote")

# Or:
install.packages("selenium")

If you are using selenium, you must also have Java installed.

Finally, you must have a web browser installed. For chromote, Google Chrome is required. For selenium, any browser can be used, but Firefox is recommended.
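
As a quick sanity check of the prerequisites above, the following base R sketch reports which backend packages are installed and whether a `java` executable is on the PATH. The variable names are illustrative, and the check is only a rough guide: it does not verify that a suitable browser is installed.

```r
# Which of the two supported backends can be loaded?
backends <- c("chromote", "selenium")
installed <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)

if (!any(installed)) {
  message("Neither chromote nor selenium is installed; install one of them.")
}

# selenium additionally needs Java; Sys.which() returns "" when nothing is found.
java_path <- Sys.which("java")
if (installed[["selenium"]] && !nzchar(java_path)) {
  message("selenium is installed, but no java executable was found on the PATH.")
}
```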

Usage

library(selenider)

The following code navigates to the R project website, finds the link to the CRAN mirror list, checks that the link is correct, and clicks the link element.

open_url("https://www.r-project.org/")

s(".row") |>
  find_element("div") |>
  find_elements("a") |>
  elem_find(has_text("CRAN")) |>
  elem_expect(attr_contains("href", "cran.r-project.org")) |>
  elem_click()

Now that we’re on the mirror list page, let’s find the link to every CRAN mirror in the UK.

s("dl") |>
  find_elements("dt") |>
  elem_find(has_text("UK")) |>
  find_element(xpath = "./following-sibling::dd") |>
  find_elements("tr") |>
  find_each_element("a") |>
  elem_expect(has_at_least(1)) |>
  as.list() |>
  lapply(
    \(x) x |>
      elem_attr("href")
  )
#> [[1]]
#> [1] "https://www.stats.bris.ac.uk/R/"
#>
#> [[2]]
#> [1] "https://cran.ma.imperial.ac.uk/"
#>
#> [[3]]
#> [1] "https://anorien.csc.warwick.ac.uk/CRAN/"
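
If a plain character vector is more convenient than a list, the same selection can be stored in a variable and passed through vapply(). This is a sketch that uses only the functions shown above. It assumes the mirror list lives at https://cran.r-project.org/mirrors.html and that every matched link has an href attribute; `uk_links` and `hrefs` are illustrative names.

```r
library(selenider)

# Assumed URL of the CRAN mirror list page.
open_url("https://cran.r-project.org/mirrors.html")

# Same selection as above, stored in a variable for reuse.
uk_links <- s("dl") |>
  find_elements("dt") |>
  elem_find(has_text("UK")) |>
  find_element(xpath = "./following-sibling::dd") |>
  find_elements("tr") |>
  find_each_element("a")

# vapply() fails loudly if any href is not a single string.
hrefs <- vapply(as.list(uk_links), elem_attr, character(1), name = "href")
```

Using vapply() instead of lapply() trades a little flexibility for a guaranteed return type, which is useful when the result feeds into further string processing.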

Vignettes