E-11-V opened this issue 2 years ago.
Hi, sorry for my late reply. I advise you to go to Scholar Vox and log in through the Université de Normandie gateway (if necessary; authentication is often done automatically). Then, once you have access to the e-book, copy the URL from your browser and paste it into the code. The URL in the code should start with something like this: https://normandie-univ.scholarvox.com/.... Hope it helps! Léa
I had already done that. I was under the impression that I had to add some arguments containing my login details to main.py for it to work. Maybe it's related to ChromeDriver?
I added some lines:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

cookies = ["dictionnary for each cookie"]  # placeholder: one dict per exported cookie
# Selenium can only add cookies for the site that is currently open, so load a page first
driver.get(URL+'1')
for cookie in cookies:
    driver.add_cookie(cookie)
    print('cookie added')
# Wait until the input inside #userInputArea is clickable, then click it
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="userInputArea"]/div[1]/input')))
driver.find_element(By.XPATH, '//*[@id="userInputArea"]/div[1]/input').click()
```
Hi @gaspachoo thanks a lot. Forgive my ignorance but what exactly should I do with that? Sorry and thanks again!
P.S.: I pasted it into main.py but got the following error:
" File "C:\Users\Utente\Desktop\Pdfer\TheGreatPDFer-master\main.py", line 15, in
If I run it without those lines, I get the error mentioned above with this message:

```
Drivers initialization
C:\Users\Utente\Desktop\Pdfer\TheGreatPDFer-master\main.py:20: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome(executable_path="./Driver/chromedriver.exe",

DevTools listening on ws://127.0.0.1:63538/devtools/browser/c6a77f5a-2280-4f8b-9d78-e09544b9f3ea
```
It's because you should have pasted it after `driver` and `URL` are defined. Then export your cookies with the Chrome EditThisCookie extension, and remove the SameSite entries and any cookies that are not from the main site. Tell me if you want a more detailed process.
Dear @gaspachoo, first of all thank you so much for your time. I have installed the extension, but I couldn't understand what "downloading" the cookies means here. The extension only lets me "export" them to my clipboard.
If you could shed light on that particular bit (downloading + removing the SameSite entries), I would be ever so grateful!
OK, then click "Export cookies" and paste them into Word or Notepad. To use them with Python you have to prepare them:

1/ Remove the last cookie from the list, the one starting with domain: univ-scholarvox....
2/ For each cookie, remove the "sameSite": .... entry.
3/ Replace (CTRL+F or CTRL+H in Word) each "true" with "True" and each "false" with "False".
4/ Press CTRL+H in Word and replace "^p" (the paragraph mark) with " ".

Now your cookies are ready. Copy my code additions and paste them where I said, replacing `cookies = ["dictionnary for each cookie"]` with your cookie list.

Open the book once in your browser, replace the URL, and try :))
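The manual cleanup steps above can also be done in Python. This is a minimal sketch. It assumes the EditThisCookie export, which is a JSON array, has been saved to a text file. `convert_cookies` and `keep_domain` are hypothetical names and are not part of TheGreatPDFer.

Parsing with `json.loads` makes the true/True and ^p replacements unnecessary. Keeping only a few known keys covers the SameSite step. Keeping only one exact domain mirrors step 1.

```python
import json

# Keys passed through to Selenium's add_cookie(); everything else in the
# EditThisCookie export (hostOnly, session, storeId, id, sameSite, ...) is dropped.
KEPT_KEYS = ("name", "value", "path", "domain", "secure", "httpOnly")


def convert_cookies(exported_json, keep_domain):
    """Convert an EditThisCookie JSON export into a list of Selenium cookie dicts.

    Only cookies whose "domain" is exactly keep_domain are kept, which mirrors
    the manual "remove cookies that are not from the main site" step.
    """
    converted = []
    # json.loads turns JSON true/false into Python True/False by itself.
    for raw in json.loads(exported_json):
        if raw.get("domain") != keep_domain:
            continue
        cookie = {key: raw[key] for key in KEPT_KEYS if key in raw}
        # The WebDriver protocol names the expiry timestamp "expiry" (whole seconds).
        if "expirationDate" in raw:
            cookie["expiry"] = int(raw["expirationDate"])
        converted.append(cookie)
    return converted
```

Usage, assuming you pasted the export into a hypothetical `cookies.json`:

`cookies = convert_cookies(open("cookies.json", encoding="utf-8").read(), ".normandie-univ.fr")`

The existing `for cookie in cookies: driver.add_cookie(cookie)` loop then stays the same.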
Thanks for the quick response! I'm afraid I've got a new issue. Now when I run main.py I get a series of different, promising messages, but there is no output, neither a PDF nor images. Previously the output was a 9-page PDF showing the login page 9 times. Now I must have touched something else, because there is no output and it only processes 1 page before PowerShell returns to the prompt....
I'm so sorry to be taking your time and thank you for showing me the way!
Hard to say... Send me your full program, or download my program and paste in your URL and cookies again...
I also recommend editing requirements.txt to remove the `==version_number` part of each line and running `pip install -r "requirements.txt"` again to get the latest version of everything (you can also download the latest chromedriver.exe from Google).
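Removing the pins by hand works fine. The following is a small sketch that does the same thing for a simple requirements file. It assumes plain `name==version` lines: anything after `==` on a line is cut, including any environment marker. `unpin` is a hypothetical helper.

```python
def unpin(requirements_text):
    """Remove '==version' pins so pip installs the latest release of each package."""
    out = []
    for line in requirements_text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments alone; cut everything from '==' onwards.
        if stripped and not stripped.startswith("#") and "==" in stripped:
            stripped = stripped.split("==", 1)[0].strip()
        out.append(stripped)
    return "\n".join(out) + "\n"
```

Usage:

`from pathlib import Path; p = Path("requirements.txt"); p.write_text(unpin(p.read_text()))`

Then run `pip install -r "requirements.txt"` as above.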
In the zip file you will find 'requirements.txt' and the latest chromedriver.exe in its Driver folder.
TheGreatPDFer.zip
My program:
```python
import time
from io import BytesIO

from PIL import Image
from fpdf import FPDF
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Browser window size and horizontal crop offset, in pixels
x = 1300
y = 1200
off = 160

try:
    print("Drivers initialization")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--log-level=3')
    chrome_options.add_argument('--disable-logging')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument(F"--window-size={x},{y}")
    driver = webdriver.Chrome(executable_path="./Driver/chromedriver.exe",
                              options=chrome_options)
except Exception:
    exit("Driver Error")
print("Done !")

URL = 'https:// ....... /page/'
# PDF page size in points, a bit larger than the cropped screenshot
pdf = FPDF(unit="pt", format=(x - 2 * off + 20, y + 50))
pdf.set_auto_page_break(0)

cookies = []  # paste your exported cookie dicts here

# Open a page on the site first so the cookies can be attached to it
driver.get(URL+'1')
for cookie in cookies:
    driver.add_cookie(cookie)
    print('cookie added')
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="userInputArea"]/div[1]/input')))
driver.find_element(By.XPATH, '//*[@id="userInputArea"]/div[1]/input').click()

try:
    for i in range(250, 351):
        name = F"page{i}"
        print("Starting " + name)
        driver.get(URL + str(i))
        time.sleep(1.7)  # give the reader time to render the page
        png = driver.get_screenshot_as_png()
        im = Image.open(BytesIO(png))
        # Drop `off` px on the left and `off + 40` px on the right
        output_img = im.crop((off, 0, x - off - 40, y))
        output_img.save(name + ".png")
        pdf.add_page()
        pdf.image(name + ".png")
        print(name.split(".")[0], "done !")
    pdf.output("yourfile.pdf", "F")
    driver.quit()
except Exception as ignored:
    driver.close()
    driver.quit()
```
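In this program, `pdf.output(...)` only runs after the whole loop finishes, and `except Exception as ignored:` silently discards whatever went wrong. A failure on the first page can therefore leave no PDF and no error message, which may explain the "no output after 1 page" symptom described above.

The following is a minimal sketch of the same loop structure that prints the real error and still saves whatever was captured. `capture_page`, `save_output` and `cleanup` are hypothetical stand-ins for the following calls:

- the screenshot plus `pdf.add_page()` step
- `pdf.output(...)`
- `driver.quit()`

```python
import traceback


def capture_pages(pages, capture_page, save_output, cleanup):
    """Call capture_page(i) for each page, report any error, and keep partial output.

    capture_page, save_output and cleanup are placeholders for the Selenium/FPDF
    calls in the script (screenshot + pdf.add_page, pdf.output, driver.quit).
    """
    done = []
    try:
        for i in pages:
            capture_page(i)
            done.append(i)
    except Exception:
        # Show the full error instead of swallowing it.
        traceback.print_exc()
    finally:
        if done:
            save_output()  # write the PDF even if the loop stopped early
        cleanup()
    return done
```

With this structure, an exception such as the `UnexpectedAlertPresentException` quoted later in this thread would be printed in full. The pages captured before it would still end up in the PDF.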
Hi again! I fixed a few things (updated chromedriver and fixed a Selenium issue), but I still get the same result. The required packages seem to be installed.

Here is my program:

```python
# -*- coding: cp1252 -*-
import time
from io import BytesIO

from PIL import Image
from fpdf import FPDF
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

x = 1300
y = 1200
off = 160
try:
    print("Drivers initialization")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--log-level=3')
    chrome_options.add_argument('--disable-logging')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument(F"--window-size={x},{y}")
    driver = webdriver.Chrome(executable_path="./Driver/chromedriver.exe", options=chrome_options)
except:
    exit("Driver Error")
print("Done !")
URL = 'https://univ-scholarvox-com.ezproxy.normandie-univ.fr/reader/docid/45007668/page'
pdf = FPDF(unit="pt", format=(x - 2 * off + 20, y + 50))
pdf.set_auto_page_break(0)

cookies = [
    {"domain": ".normandie-univ.fr", "expirationDate": 1672064084.780694, "hostOnly": False, "httpOnly": True, "name": "_discongsaml_idp", "path": "/", "secure": True, "session": False, "storeId": "0", "value": "aHR0cHM6Ly9pZHAzLnVuaWNhZW4uZnIvaWRwL3NoaWJib2xldGg%3D", "id": 1},
    {"domain": ".normandie-univ.fr", "expirationDate": 1664288084.780812, "hostOnly": False, "httpOnly": True, "name": "_discongsaml_sp", "path": "/", "secure": True, "session": False, "storeId": "0", "value": "aHR0cHM6Ly9lenByb3h5Lm5vcm1hbmRpZS11bml2LmZy", "id": 2},
    {"domain": ".normandie-univ.fr", "expirationDate": 1695769735.719742, "hostOnly": False, "httpOnly": False, "name": "amplitude_id_9f6c0bb8b82021496164c672a7dc98d6_edmnormandie-univ.fr", "path": "/", "secure": False, "session": False, "storeId": "0", "value": "eyJkZXZpY2VJZCI6IjVlMjdkZjhhLWYxYzktNDhjNy04NWVlLTkxNzY3YzhjYmQyNVIiLCJ1c2VySWQiOm51bGwsIm9wdE91dCI6ZmFsc2UsInNlc3Npb25JZCI6MTY2MTIwOTcyNjU4NSwibGFzdEV2ZW50VGltZSI6MTY2MTIwOTczNTcxNiwiZXZlbnRJZCI6MCwiaWRlbnRpZnlJZCI6Miwic2VxdWVuY2VOdW1iZXIiOjJ9", "id": 3},
    {"domain": ".normandie-univ.fr", "hostOnly": False, "httpOnly": False, "name": "ezproxy", "path": "/", "secure": False, "session": True, "storeId": "0", "value": "aVgRtnZZmOtAlFS", "id": 4},
]

driver.get(URL+'1')
for cookie in cookies:
    driver.add_cookie(cookie)
    print('cookie added')

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="userInputArea"]/div[1]/input')))
driver.find_element(By.XPATH, '//*[@id="userInputArea"]/div[1]/input').click()

try:
    for i in range(1, 250):
        name = F"page{i}"
        print("Starting " + name)
        driver.get(URL + str(i))
        time.sleep(1.7)
        png = driver.get_screenshot_as_png()
        im = Image.open(BytesIO(png))
        output_img = im.crop((off, 0, x - off - 40, y))
        output_img.save(name + ".png")
        pdf.add_page()
        pdf.image(name + ".png")
        print(name.split(".")[0], "done !")
    pdf.output("yourfile.pdf", "F")
    driver.quit()
except Exception as ignored:
    driver.close()
    driver.quit()
```
And here is what I get when I run it.
Maybe you forgot a slash "/" at the end of the URL?
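With a URL ending in `/page`, as in the program above, `URL + str(i)` yields `.../page1`. The original script's template `'https:// ....... /page/'` uses the `.../page/1` form instead.

A tiny helper makes the separator independent of whether the trailing slash was copied. `page_url` is a hypothetical name, not part of the original script.

```python
def page_url(base, page):
    """Join the reader base URL and a page number with exactly one '/'."""
    # rstrip('/') makes '.../page' and '.../page/' produce the same result.
    return base.rstrip("/") + "/" + str(page)
```

In the script, `driver.get(URL + str(i))` would become `driver.get(page_url(URL, i))`, and `URL+'1'` would become `page_url(URL, 1)`.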
Nothing changes, with or without the slash, I'm afraid.
Can you please add `png = driver.get_screenshot_as_png().save("test.png")` before `try`?
This is what I get now:
Sorry, I haven't used webdriver for a while. I meant: `driver.save_screenshot("test.png")`
Now I get this. And no need to apologize! You are already doing so much! :)
Looks weird; I've checked it. Maybe try `driver.save_screenshot('./image.png')`.
Yeah, I'm particularly curious about the bit that reads `selenium.common.exceptions.UnexpectedAlertPresentException: Alert Text: You must select an organisation. Message: unexpected alert open: {Alert text : You must select an organisation.} (Session info: headless chrome=105.0.5195.127)`. That's the error I had in the first place (i.e. the program not being able to log in to Scholarvox with my credentials).
Could it have to do with the type of connection? I'm accessing the book remotely; would it be different on campus? What was your case, @luroy?
I think the cookies were regenerated after a while, so you need to open the book manually again, re-export the cookies and try again. "You must select an organisation" is the message you see when you come back to the website after a while.
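If stale cookies are the cause, it may help to check the exported timestamps before each run. This is a minimal sketch. It assumes cookie dicts in the EditThisCookie format shown above, where `expirationDate` is in Unix seconds and is absent for session cookies such as `ezproxy`. `cookie_status` is a hypothetical helper.

The check can only tell you that a timestamp has passed. It cannot tell you whether the server has already ended the session.

```python
import time


def cookie_status(cookies, now=None):
    """Return (expired, session_only) lists of cookie names.

    expired: cookies whose expirationDate is already in the past.
    session_only: cookies with no expirationDate, whose lifetime the export does not record.
    """
    now = time.time() if now is None else now
    expired, session_only = [], []
    for c in cookies:
        if "expirationDate" not in c:
            session_only.append(c["name"])
        elif c["expirationDate"] < now:
            expired.append(c["name"])
    return expired, session_only
```

For example, `cookie_status(cookies)` on the list posted above would report `_discongsaml_sp` as expired once its timestamp (September 2022) has passed.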
I went to the book, repeated the cookies process as you showed me before and got this result
So sorry to bother you with this, but I admit that at this point I feel like we are really close to the issue!
Maybe you could add a `print()` every 10 lines or so to find out exactly where the issue is.
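Instead of scattering `print()` calls, each stage can be wrapped so that it logs when it starts, when it finishes and, with a full traceback, when it fails. This is a minimal sketch using the standard `logging` module. `step` and the logger name `pdfer` are hypothetical and are not part of TheGreatPDFer.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pdfer")  # hypothetical logger name


def step(label, func, *args, **kwargs):
    """Run func(*args, **kwargs), logging start/finish; log the traceback and re-raise on failure."""
    log.info("start: %s", label)
    try:
        result = func(*args, **kwargs)
    except Exception:
        log.exception("failed: %s", label)
        raise
    log.info("ok: %s", label)
    return result
```

In the script this could look like `step("open first page", driver.get, URL + '1')` or `step("add cookie", driver.add_cookie, cookie)`. The last "start:" line without a matching "ok:" shows where it stopped.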
Have you guys found a solution? Let me know! Thanks
Hi fellows, can anyone help me? I am a rookie and have no clue how to use the program.
Hi, thank you so much for this. I am a complete beginner when it comes to code, but I am trying my best. I managed to run the code by following the driver suggestions from the closed issue. My problem now is that when I download the pages, I only get the login page where I'm supposed to enter my credentials.
I have tried using the URL with the token bit as the source ("https://reader4-cyberlibris-com.ezproxy.***-univ.fr/api/js/?token=*****"), but it didn't help either.
What can I do?
Thank you so much again for your help! (Tagging @luroy in case she knows too because my access to scholarvox will expire soon)