Open cderv opened 4 years ago
That should be great for me. Thanks!
I am trying to use a service called Paperpile, which requires you to be logged in as a Chrome user and to have their extension installed in order to use their site...
I am trying to drive their website in order to set up a cron job that stores an updated file every night.
hmmm
this is close with RSelenium, i can get the extension to load, but my `--user-data-dir` is being ignored
```r
c_opts <- list(
  args = c(
    "--disable-gpu",
    "--window-size=1280,800",
    "--user-data-dir=~/Library/Application Support/Google/Chrome",
    "--load-extension=~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"
  ),
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "download.prompt_for_download" = FALSE,
    "download.directory_upgrade" = TRUE,
    "safebrowsing.enabled" = TRUE,
    "download.default_directory" = tempdir()
  )
)
rD <- RSelenium::rsDriver(
  browser = "chrome",
  verbose = TRUE,
  port = 1324L,
  check = TRUE,
  extraCapabilities = list(
    chromeOptions = c_opts
  )
)
```
I never tried with RSelenium. However, it seems with my current test script that the user data dir is not accessible in headless mode. 🤔 You could try in non-headless mode to see if it differs.
It is also possible that loading the Default profile is not allowed in headless mode, for security reasons. One solution could be to create a new profile for this purpose and use this custom Chrome profile, in which you would have installed the extension and everything else you need. Do you see what I mean?
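A minimal sketch of the dedicated-profile idea, using only base R. The helper name `make_profile_args` and the directory used are hypothetical, not part of RSelenium or crrri. It only creates the directory and builds the Chrome argument; the extension would still have to be installed once by opening Chrome non-headless with this profile.

```r
# Hypothetical helper (not part of crrri or RSelenium): create a dedicated
# profile directory and return Chrome arguments that point to it.
make_profile_args <- function(profile_dir) {
  # Create the directory (and any missing parents) if it does not exist yet.
  dir.create(profile_dir, recursive = TRUE, showWarnings = FALSE)
  c(
    "--window-size=1280,800",
    # Pass an absolute path so Chrome does not resolve it against the working directory.
    paste0("--user-data-dir=", normalizePath(profile_dir, winslash = "/"))
  )
}

# tempdir() is used here only so the example leaves no trace;
# a real profile would live in a permanent location of your choice.
chrome_args <- make_profile_args(file.path(tempdir(), "chrome-profile-paperpile"))
print(chrome_args)
```

The resulting vector could then be supplied as the `args` element of the `chromeOptions` list, in the same way as the `c_opts` example in this thread.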
Some references to research on this topic
> However, it seems with my current test script that the user data dir is not accessible in headless mode.
I can confirm that I don't get the same behavior in headless and headful mode. not cool... 😞
I can't get a cookie to persist between headful mode, where I create it, and headless mode, where I try to read it. Using the same data dir, it works fine when creating a cookie in headful mode, closing everything, and reading it again in headful mode.
@yonicd I think we would need to be sure that what you want to do is ok with chrome headless first, to see how to implement it in crrri. There may be something I am missing here 🤔
headless mode is funky. there is also a weird bug that won't let you set the `download.default_directory`.
I got the RSelenium version to work with my user profile with the same setting i had in the comment above (/shrug)
this is the RSelenium solution to my problem... (stupid site)
```r
c_opts <- list(
  args = c(
    "--disable-gpu",
    "--window-size=1280,800",
    "--user-data-dir=~/Library/Application Support/Google/Chrome",
    "--load-extension=~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"
  ),
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "profile.content_settings.exceptions.clipboard" = 1L,
    "download.prompt_for_download" = FALSE,
    "download.directory_upgrade" = TRUE,
    "safebrowsing.enabled" = TRUE,
    "download.default_directory" = tempdir()
  )
)

rD <- RSelenium::rsDriver(
  browser = "chrome",
  verbose = FALSE,
  port = 1324L,
  extraCapabilities = list(
    chromeOptions = c_opts
  ),
  check = FALSE
)

# navigate
rD$client$navigate('https://paperpile.com/app/shared/YEeG5y')

# select all
rD$client$executeScript('document.querySelector("#selectionButton-1020-btnIconEl").click();')

# copy bib
el <- rD$client$findElement(using = 'css', 'body')
el$sendKeysToElement(sendKeys = list(RSelenium::selKeys$command_meta, 'b'))

# wait for it to copy
wait <- TRUE
i <- 0
while (wait) {
  wait <- rD$client$executeScript('return document.querySelector(".pp-status-text").innerText')[[1]] != "Copying Bibtex citations"
  Sys.sleep(3 + i / 2)
  i <- i + 1
}

# number of citations reported in the status text, minus one
expectation <- as.numeric(gsub('[^0-9]', '', rD$client$executeScript('return document.querySelector(".pp-status-text").innerText')[[1]])) - 1

# write clipboard to local file
tf <- tempfile(fileext = '.bib')
cat(clipr::read_clip(), file = tf, sep = '\n')

# validate bib
pp <- paperpile::parse_bib(path = tf)

if (length(pp) != expectation) {
  message('number of citations mismatch')
}

# shut down the browser and the Selenium server
rD$client$closeall()
rD$server$stop()
```
i see why it is working... chrome is copying into my pwd a dir called `~` with my chrome profile.... that is a weird action
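That behavior is consistent with the `~` in `--user-data-dir=~/...` being passed to Chrome as a literal string and treated as a relative path, since no shell is involved to expand it. A minimal sketch, assuming that is indeed the cause: expand the tilde in R with `path.expand()` before building the arguments. The variable names are illustrative; the paths are the ones used earlier in this thread.

```r
# Paths as written in the thread, with an unexpanded tilde.
raw_profile   <- "~/Library/Application Support/Google/Chrome"
raw_extension <- "~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"

# path.expand() replaces a leading "~" with the user's home directory.
profile_dir   <- path.expand(raw_profile)
extension_dir <- path.expand(raw_extension)

# Build the Chrome arguments from the expanded, absolute paths.
chrome_args <- c(
  "--disable-gpu",
  "--window-size=1280,800",
  paste0("--user-data-dir=", profile_dir),
  paste0("--load-extension=", extension_dir)
)
print(chrome_args)
```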
Just a note about extensions: they can't be used in headless mode! See https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#working-with-chrome-extensions
This is from a question by @yonicd about crrri being able to load a session with user credential and extension.
From a little search, I think this is possible in non-headless mode, and maybe in headless mode for credentials (but not sure).
What I tried is simply using the user profile directory of my Chrome browser. With Chrome you can do that with `--user-data-dir`. In crrri, I think for security reasons, we just create a new work dir per session and remove it when closing. We could maybe offer an option for the user to opt in to a persistent user profile. The use case I see:
What we must take care of:
I don't know if it will be enough for your usage @yonicd, but it seems reasonable for crrri to allow that.
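A rough sketch of what such an opt-in could look like, written in base R only. The function `resolve_work_dir` and its `user_data_dir` argument are hypothetical and are not crrri's actual API; the sketch only illustrates the idea of a throwaway directory by default and a persistent one, never deleted, on request.

```r
# Hypothetical helper, not part of crrri: choose the Chrome working directory.
resolve_work_dir <- function(user_data_dir = NULL) {
  if (is.null(user_data_dir)) {
    # Default: a fresh throwaway directory, flagged for removal on close.
    path <- tempfile(pattern = "chrome-work-dir-")
    dir.create(path)
    return(list(path = path, cleanup = TRUE))
  }
  # Opt-in: reuse an existing profile directory and never delete it.
  path <- path.expand(user_data_dir)
  if (!dir.exists(path)) {
    stop("User data directory does not exist: ", path)
  }
  list(path = path, cleanup = FALSE)
}

# Default behavior: temporary directory, removed when the session closes.
default_dir <- resolve_work_dir()
if (default_dir$cleanup) unlink(default_dir$path, recursive = TRUE)
```

Keeping the `cleanup` flag next to the path makes it harder to delete a real user profile by accident when the session is closed.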
Test using internal, non-exported functions:
```r
library(crrri)

# Launch chrome with my user profile
# When browser opens I can see my extensions
chrome <- crrri:::chr_launch(
  bin = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
  debug_port = 9222L,
  extra_args = NULL,
  headless = FALSE,
  work_dir = "C:/Users/chris/AppData/Local/Google/Chrome/User Data"
)

session <- CDPRemote$new(
  host = "localhost",
  debug_port = 9222L,
  secure = FALSE,
  local = FALSE,
  retry_delay = 0.2,
  max_attempts = 15L
)

client <- session$connect(callback = ~ .x$inspect())
Page <- client$Page

# I connected to the community before in this profile
# so i should be already connected in inspector mode too.
Page$navigate(url = "http://community.rstudio.com/")

# You can check that credentials are available in
# chrome://settings/passwords
# in the open browser
```