RLesur / crrri

A Chrome Remote Interface written in R
https://rlesur.github.io/crrri/

Allow to load user profile in non-headless mode #90

Open cderv opened 4 years ago

cderv commented 4 years ago

This comes from a question by @yonicd about whether crrri can load a session with user credentials and extensions.

From a quick search, I think this is possible in non-headless mode, and maybe in headless mode for credentials (but I'm not sure).

What I tried is simply using the User Profile Directory of my Chrome browser. With Chrome you can do that with --user-data-dir. In crrri, I think for security reasons, we create a new work dir per session and remove it when closing.

We could maybe offer an option for users to opt in to a persistent user profile. The use case I see:

What we must take care of:

I don't know if it will be enough for your usage @yonicd, but it seems normal that crrri allows that.
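A minimal sketch of what such an opt-in could look like, in plain R. `resolve_work_dir()`, `cleanup_work_dir()` and the `profile_dir` argument are hypothetical names, not part of crrri. By default a throwaway directory is created (the per-session behavior described above), and a persistent profile directory is only reused, and never deleted, when the user explicitly passes one.

```r
# Hypothetical helper, not part of crrri: choose the Chrome user data dir.
resolve_work_dir <- function(profile_dir = NULL) {
  if (is.null(profile_dir)) {
    # Default: a fresh, throwaway directory that can be removed on close
    path <- tempfile(pattern = "chrome-work-dir-")
    dir.create(path)
    return(list(path = path, persistent = FALSE))
  }
  # Opt-in: reuse a user-supplied profile directory and never delete it
  path <- path.expand(profile_dir)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  list(path = normalizePath(path, winslash = "/"), persistent = TRUE)
}

# Remove the directory on close only if it was a throwaway one.
# Returns (invisibly) TRUE when the directory no longer exists.
cleanup_work_dir <- function(work_dir) {
  if (!work_dir$persistent) {
    unlink(work_dir$path, recursive = TRUE)
  }
  invisible(!dir.exists(work_dir$path))
}
```

Keeping the `persistent` flag next to the path means the cleanup step cannot accidentally delete a real user profile.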

Test using internal non exported functions

```r
library(crrri)
# Launch chrome with my user profile
# When browser opens I can see my extensions
chrome <- crrri:::chr_launch(
  bin = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
  debug_port = 9222L,
  extra_args = NULL,
  headless = FALSE,
  work_dir = "C:/Users/chris/AppData/Local/Google/Chrome/User Data"
)
session <- CDPRemote$new(
  host = "localhost",
  debug_port = 9222L,
  secure = FALSE,
  local = FALSE,
  retry_delay = 0.2,
  max_attempts = 15L
)
client <- session$connect(callback = ~ .x$inspect())
Page <- client$Page
# I connected to the community before in this profile
# so i should be already connected in inspector mode too.
Page$navigate(url = "http://community.rstudio.com/")
# You can check that credentials are available in
# chrome://settings/passwords
# in the open browser
```

yonicd commented 4 years ago

That should be great for me. Thanks!

I am trying to use a service called Paperpile that forces you to be logged in as a Chrome user and to have their extension installed in order to use their site...

I am trying to drive their website in order to set up a cron job that stores an updated file every night.

yonicd commented 4 years ago

hmmm

this is close with RSelenium, i can get the extension to load, but my --user-data-dir is being ignored

c_opts <- list(
  args  = c("--disable-gpu",
            "--window-size=1280,800",
            "--user-data-dir=~/Library/Application Support/Google/Chrome",
            "--load-extension=~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"),
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "download.prompt_for_download" = FALSE,
    "download.directory_upgrade" = TRUE,
    "safebrowsing.enabled" = TRUE,
    "download.default_directory" = tempdir()
  )
)

rD <- RSelenium::rsDriver(
  browser = "chrome",
  verbose = TRUE,
  port = 1324L,
  check = TRUE,
  extraCapabilities = list(
    chromeOptions = c_opts
  )
)

cderv commented 4 years ago

I never tried with RSelenium. However, it seems with my current test script that the user data dir is not accessible in headless mode. 🤔 You could try in non-headless mode to see if it differs.

It is also possible that loading the Default profile is not allowed in headless mode, for security reasons. One solution could be to create a new Chrome profile dedicated to this usage, in which you would have installed the extension and everything else you need. Do you see what I mean?
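A rough sketch of the dedicated-profile idea, in plain R. `chrome_profile_args()` is a hypothetical helper, and the directory and profile names are placeholders. `--user-data-dir` and `--profile-directory` are Chrome command-line switches. The profile would first need to be set up once in a normal browser window (logging in, installing the extension) before being reused by a script.

```r
# Hypothetical helper: build Chrome flags for a dedicated automation profile.
# The directory and profile names used here are placeholders.
chrome_profile_args <- function(user_data_dir, profile = "Automation") {
  c(
    paste0("--user-data-dir=", user_data_dir),
    paste0("--profile-directory=", profile)
  )
}

# Example: a profile stored outside the regular Chrome user data directory
args <- chrome_profile_args(file.path(tempdir(), "chrome-automation"))
print(args)
```

The resulting character vector is the kind of value passed as `args` in the `chromeOptions` list shown earlier in this thread.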

Some references to research on this topic

cderv commented 4 years ago

> However, it seems with my current test script that the user data dir is not accessible in headless mode.

I can confirm that I don't get the same behavior in headless and headful mode. not cool... 😞

cderv commented 4 years ago

I can't get a cookie to persist between headful mode, where I create it, and headless mode, where I try to read it. Using the same data dir, it works fine when I create a cookie in headful mode, close everything, and read it again in headful mode.

@yonicd I think we would need to be sure that what you want to do is ok with chrome headless first, to see how to implement it in crrri. There may be something I am missing here 🤔

yonicd commented 4 years ago

headless mode is funky. there is also a weird bug that won't let you set the download.default_directory.

I got the RSelenium version to work with my user profile with the same setting i had in the comment above (/shrug)

this is the RSelenium solution to my problem... (stupid site)

c_opts <- list(
  args  = c(
    "--disable-gpu",
    "--window-size=1280,800",
    "--user-data-dir=~/Library/Application Support/Google/Chrome",
    "--load-extension=~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"),
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "profile.content_settings.exceptions.clipboard" = 1L,
    "download.prompt_for_download" = FALSE,
    "download.directory_upgrade" = TRUE,
    "safebrowsing.enabled" = TRUE,
    "download.default_directory" = tempdir()
  )
)

rD <- RSelenium::rsDriver(
  browser = "chrome",
  verbose = FALSE,
  port = 1324L,
  extraCapabilities = list(
    chromeOptions = c_opts
  ),
  check = FALSE
)

# navigate
rD$client$navigate('https://paperpile.com/app/shared/YEeG5y')

# select all
rD$client$executeScript('document.querySelector("#selectionButton-1020-btnIconEl").click();')

# copy bib
el <- rD$client$findElement(using = 'css', 'body')
el$sendKeysToElement(sendKeys = list(RSelenium::selKeys$command_meta, 'b'))

# wait for it to copy
wait <- TRUE
i <- 0
while (wait) {
  wait <- rD$client$executeScript('return document.querySelector(".pp-status-text").innerText')[[1]] != "Copying Bibtex citations"
  Sys.sleep(3 + i / 2)
  i <- i + 1
}

expectation <- as.numeric(gsub('[^0-9]', '', rD$client$executeScript('return document.querySelector(".pp-status-text").innerText')[[1]])) - 1

# write clipboard to local file
tf <- tempfile(fileext = '.bib')
cat(clipr::read_clip(), file = tf, sep = '\n')

# validate bib
pp <- paperpile::parse_bib(path = tf)

if (length(pp) != expectation)
  message('number of citations mismatch')

rD$client$closeall()
rD$server$stop()

yonicd commented 4 years ago

i see why it is working... chrome is creating a dir called ~ in my working directory and copying my chrome profile into it.... that is weird behavior
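A stray `~` directory like this is what you would expect if the flag reaches the browser without going through a shell: tilde expansion is normally done by the shell (or by R's `path.expand()`), so the browser likely receives a literal `~` and treats it as a relative path. A small sketch of expanding the path in R before building the flags, reusing the path from the snippet above. `chrome_flag()` is a hypothetical helper name.

```r
# Hypothetical helper: build a Chrome flag with `~` expanded by R first,
# since the browser itself would receive the path verbatim.
chrome_flag <- function(name, path) {
  paste0("--", name, "=", path.expand(path))
}

args <- c(
  "--disable-gpu",
  "--window-size=1280,800",
  chrome_flag("user-data-dir", "~/Library/Application Support/Google/Chrome")
)
print(args)
```

With the path expanded, the flag points at the real profile directory instead of a new `~` folder under the current working directory.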

cderv commented 4 years ago

Just a note about extensions: they can't be used in headless mode! See https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#working-with-chrome-extensions
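Given that limitation, a script could fail early instead of silently launching without the extension. A minimal sketch in plain R, assuming the limitation applies to the Chrome version in use. `check_extension_args()` is a hypothetical helper, not part of crrri or RSelenium.

```r
# Hypothetical guard: refuse to combine headless mode with extension flags,
# following the limitation noted above.
check_extension_args <- function(args, headless) {
  uses_extension <- any(grepl("^--load-extension=", args))
  if (headless && uses_extension) {
    stop(
      "Extensions were requested but Chrome is set to run headless.",
      call. = FALSE
    )
  }
  # Return the arguments unchanged so the call can be used inline
  invisible(args)
}
```

For example, `check_extension_args(c_opts$args, headless = FALSE)` would pass the arguments through unchanged, while the same call with `headless = TRUE` would stop with an error.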