Login details - Githubissues

E-11-V commented 2 years ago

Hi, thank you so much for this. I know next to nothing about code but I am trying my best. I managed to run the code by following the suggestions from the closed issue about the driver. My problem now is that when I download the pages, I only get the login page where I'm supposed to enter my credentials.

I have tried using the url with the token bit as the source ("https://reader4-cyberlibris-com.ezproxy.***-univ.fr/api/js/?token=*****") but it didn't help either.

What can I do?

Thank you so much again for your help! (Tagging @luroy in case she knows too because my access to scholarvox will expire soon)

luroy commented 2 years ago

Hi, sorry for my late reply. I advise you to go to Scholarvox and log in (if necessary; often the authentication is done automatically) through the Université de Normandie gateway. Then, once you have access to the e-book, copy the URL from your browser and paste it into the code. The URL in the code should start with something like this: https://normandie-univ.scholarvox.com/.... Hope it helps! Léa

E-11-V commented 2 years ago

I had already done that. It was my impression I had to add some arguments in main.py containing the login details for it to work. Maybe it's related to the Chrome Driver?

gaspachoo commented 2 years ago

I added some lines:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
cookies = ["dictionnary for each cookie"]
driver.get(URL+'1')

for cookie in cookies:
    driver.add_cookie(cookie)
    print('cookie ajouté')

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="userInputArea"]/div[1]/input')))
driver.find_element(By.XPATH,'//*[@id="userInputArea"]/div[1]/input').click()

E-11-V commented 2 years ago

Hi @gaspachoo thanks a lot. Forgive my ignorance but what exactly should I do with that? Sorry and thanks again!

P.S.: I pasted it into main.py but got the following error:

  File "C:\Users\Utente\Desktop\Pdfer\TheGreatPDFer-master\main.py", line 15, in <module>
    driver.get(URL+'1')
NameError: name 'driver' is not defined

Running it without those lines instead gives me the problem mentioned above (only the login page), with this output:

Drivers initialization
C:\Users\Utente\Desktop\Pdfer\TheGreatPDFer-master\main.py:20: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome(executable_path="./Driver/chromedriver.exe",

DevTools listening on ws://127.0.0.1:63538/devtools/browser/c6a77f5a-2280-4f8b-9d78-e09544b9f3ea
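
The DeprecationWarning in that output is only a notice and is most likely not what causes the login page to appear. For reference, here is a minimal sketch of how the driver could be started with a Service object, as the warning suggests. It assumes Selenium 4; the helper names `chrome_arguments` and `make_driver` are made up for illustration and are not part of main.py.

```python
def chrome_arguments(width: int, height: int) -> list:
    """Command-line switches matching the ones used in main.py."""
    return [
        "--log-level=3",
        "--disable-logging",
        "--headless",
        f"--window-size={width},{height}",
    ]


def make_driver(driver_path: str, width: int = 1300, height: int = 1200):
    """Start Chrome through a Service object (Selenium 4 style)."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(width, height):
        options.add_argument(argument)
    # The Service object replaces the deprecated executable_path argument.
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)
```

In main.py this would be called as `driver = make_driver("./Driver/chromedriver.exe")` in place of the `webdriver.Chrome(executable_path=...)` line.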

gaspachoo commented 2 years ago

That's because you should have pasted it after driver and URL have been defined, for example here: (screenshot IMG_20220917_073457). Then download the cookies with the EditThisCookie extension for Chrome, and remove the SameSite entries and the cookies that are not from the main site. Tell me if you want a more detailed walkthrough.

E-11-V commented 2 years ago

Dear @gaspachoo, first of all thank you so much for your time. I have installed the extension but I could not understand what "downloading" the cookies means here. The extension only allows me to "export" them to my clipboard.

If you could shed light on that particular bit (downloading + removing the SameSite entries) I would be ever so grateful!!

gaspachoo commented 2 years ago

OK, then click on "export cookies" and paste them into Word or Notepad. To use them with Python you have to prepare them:

1/ Remove the last cookie from the list, the one starting with domain: univ-scholarvox....
2/ For each cookie, remove the "Samesite:".... entry.
3/ Replace (CTRL+F or CTRL+H in Word) each "true" with "True" and each "false" with "False".
4/ Press CTRL+H in Word and replace "^p" with " ".

Now your cookies are ready, so copy my code additions and paste them where I said, replacing the cookies = ["dictionnary for each cookie"] line with your cookies list. Connect to the book once, replace the URL, and try :))
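
The manual preparation steps above can also be sketched in Python, which avoids the find-and-replace work in Word. This is only an illustration under a few assumptions: the export is the JSON text that EditThisCookie copies to the clipboard, the function name `prepare_cookies` and its `skip_domains` parameter are hypothetical, and, unlike the manual steps, it keeps only the keys that Selenium documents for `add_cookie` (the other export keys such as "hostOnly" or "storeId" are dropped).

```python
import json


def prepare_cookies(exported: str, skip_domains: tuple = ()) -> list:
    """Turn an EditThisCookie JSON export into dicts for driver.add_cookie()."""
    prepared = []
    # json.loads already maps JSON true/false to Python True/False.
    for raw in json.loads(exported):
        domain = raw.get("domain", "")
        if any(domain.endswith(skipped) for skipped in skip_domains):
            continue  # cookie from a site we do not want to send
        cookie = {
            "name": raw["name"],
            "value": raw["value"],
            "domain": domain,
            "path": raw.get("path", "/"),
            "secure": raw.get("secure", False),
            "httpOnly": raw.get("httpOnly", False),
        }
        # "sameSite" is deliberately not copied, as advised in the steps above.
        if "expirationDate" in raw:
            cookie["expiry"] = int(raw["expirationDate"])
        prepared.append(cookie)
    return prepared
```

With the export saved to a text file (say cookies.json, a name chosen here for the example), the list in main.py could then be built with `cookies = prepare_cookies(open("cookies.json", encoding="utf-8").read())`.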

E-11-V commented 2 years ago

Thanks for the quick response! I'm afraid I've got a new issue. Now when I run main.py I get a series of different, promising messages; however, there is no output in the form of a PDF or images. Previously the output was a PDF with 9 pages, containing the login page 9 times. Now I must have touched something else, because there is no output and it only does 1 page before PowerShell goes back to the usual prompt....

I'm so sorry to be taking your time and thank you for showing me the way!

gaspachoo commented 2 years ago

Hard to say... Send me your full program, or download my program and paste your URL and cookies in again. I also recommend editing requirements.txt to remove the "==" + version number pins and running pip install -r "requirements.txt" again to get the latest version of everything (you can also download the latest chromedriver.exe, which you can find on Google). In the zip file you will find requirements.txt and the latest chromedriver.exe in its Driver folder: TheGreatPDFer.zip
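
Removing the version pins can also be done with a few lines of Python instead of by hand. This is a minimal sketch; the function name `unpin` is made up for illustration, and it only handles exact `==` pins.

```python
import re


def unpin(requirement_line: str) -> str:
    """Drop an exact '==version' pin from one requirements.txt line."""
    return re.sub(r"\s*==\s*[^\s;#]+", "", requirement_line)
```

Applying `unpin` to every line of requirements.txt and writing the result back gives a file that lets pip pick the latest release of each package.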

My program:

import time
from io import BytesIO

from PIL import Image
from fpdf import FPDF
from selenium import webdriver

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

x = 1300
y = 1200
off = 160
try:
    print("Drivers initialization")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--log-level=3')
    chrome_options.add_argument('--disable-logging')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument(F"--window-size={x},{y}")
    driver = webdriver.Chrome(executable_path="./Driver/chromedriver.exe",
                              options=chrome_options)
except:
    exit("Driver Error")
print("Done !")
URL = 'https:// .......    /page/'
pdf = FPDF(unit="pt", format=(x - 2 * off + 20, y + 50))
pdf.set_auto_page_break(0)

cookies = [] 

driver.get(URL+'1')

for cookie in cookies:
    driver.add_cookie(cookie)
    print('cookie ajouté')

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="userInputArea"]/div[1]/input')))
driver.find_element(By.XPATH,'//*[@id="userInputArea"]/div[1]/input').click()

try:
    for i in range(250,351):
        name = F"page{i}"
        print("Starting " + name)
        driver.get(URL + str(i))

        time.sleep(1.7)
        png = driver.get_screenshot_as_png()
        im = Image.open(BytesIO(png))
        output_img = im.crop((off, 0, x - off - 40, y))
        output_img.save(name + ".png")
        pdf.add_page()
        pdf.image(name + ".png")
        print(name.split(".")[0], "done !")
    pdf.output("yourfile.pdf", "F")
    driver.quit()
except Exception as ignored:
    driver.close()
    driver.quit()

E-11-V commented 2 years ago

Hi again! I did fix a few things (updated chromedriver and fixed some selenium issue) but I still get the same result. It seems like the required packages are present

Here is my program:

# -*- coding: cp1252 -*-
import time
from io import BytesIO

from PIL import Image
from fpdf import FPDF
from selenium import webdriver

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

x = 1300
y = 1200
off = 160
try:
    print("Drivers initialization")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--log-level=3')
    chrome_options.add_argument('--disable-logging')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument(F"--window-size={x},{y}")
    driver = webdriver.Chrome(executable_path="./Driver/chromedriver.exe",
                              options=chrome_options)
except:
    exit("Driver Error")
print("Done !")
URL = 'https://univ-scholarvox-com.ezproxy.normandie-univ.fr/reader/docid/45007668/page'
pdf = FPDF(unit="pt", format=(x - 2 * off + 20, y + 50))
pdf.set_auto_page_break(0)

cookies = [
    { "domain": ".normandie-univ.fr", "expirationDate": 1672064084.780694, "hostOnly": False, "httpOnly": True, "name": "_discongsaml_idp", "path": "/", "secure": True, "session": False, "storeId": "0", "value": "aHR0cHM6Ly9pZHAzLnVuaWNhZW4uZnIvaWRwL3NoaWJib2xldGg%3D", "id": 1 },
    { "domain": ".normandie-univ.fr", "expirationDate": 1664288084.780812, "hostOnly": False, "httpOnly": True, "name": "_discongsaml_sp", "path": "/", "secure": True, "session": False, "storeId": "0", "value": "aHR0cHM6Ly9lenByb3h5Lm5vcm1hbmRpZS11bml2LmZy", "id": 2 },
    { "domain": ".normandie-univ.fr", "expirationDate": 1695769735.719742, "hostOnly": False, "httpOnly": False, "name": "amplitude_id_9f6c0bb8b82021496164c672a7dc98d6_edmnormandie-univ.fr", "path": "/", "secure": False, "session": False, "storeId": "0", "value": "eyJkZXZpY2VJZCI6IjVlMjdkZjhhLWYxYzktNDhjNy04NWVlLTkxNzY3YzhjYmQyNVIiLCJ1c2VySWQiOm51bGwsIm9wdE91dCI6ZmFsc2UsInNlc3Npb25JZCI6MTY2MTIwOTcyNjU4NSwibGFzdEV2ZW50VGltZSI6MTY2MTIwOTczNTcxNiwiZXZlbnRJZCI6MCwiaWRlbnRpZnlJZCI6Miwic2VxdWVuY2VOdW1iZXIiOjJ9", "id": 3 },
    { "domain": ".normandie-univ.fr", "hostOnly": False, "httpOnly": False, "name": "ezproxy", "path": "/", "secure": False, "session": True, "storeId": "0", "value": "aVgRtnZZmOtAlFS", "id": 4 }
]

driver.get(URL+'1')

for cookie in cookies:
    driver.add_cookie(cookie)
    print('cookie ajouté')

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="userInputArea"]/div[1]/input')))
driver.find_element(By.XPATH,'//*[@id="userInputArea"]/div[1]/input').click()

try:
    for i in range(1, 250):
        name = F"page{i}"
        print("Starting " + name)
        driver.get(URL + str(i))

        time.sleep(1.7)
        png = driver.get_screenshot_as_png()
        im = Image.open(BytesIO(png))
        output_img = im.crop((off, 0, x - off - 40, y))
        output_img.save(name + ".png")
        pdf.add_page()
        pdf.image(name + ".png")
        print(name.split(".")[0], "done !")
    pdf.output("yourfile.pdf", "F")
    driver.quit()
except Exception as ignored:
    driver.close()
    driver.quit()

And here is what I get when I run it.

gaspachoo commented 2 years ago

Maybe you forgot a slash "/" at the end of the URL?
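
To illustrate the point: with a URL ending in 'page', the expression URL + str(i) produces an address ending in 'page1' rather than 'page/1'. A small helper can make the program indifferent to the trailing slash. This is a sketch only; the name `page_url` is made up for the example, and the example.org address in the usage note is a placeholder.

```python
def page_url(base_url: str, page_number: int) -> str:
    """Build the address of one page, tolerating a missing trailing slash."""
    return base_url.rstrip("/") + "/" + str(page_number)
```

In the loop, `driver.get(page_url(URL, i))` would then replace `driver.get(URL + str(i))`. For instance, `page_url("https://example.org/reader/page", 3)` and `page_url("https://example.org/reader/page/", 3)` both give the same address.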

E-11-V commented 2 years ago

Nothing changes, with or without the slash, I'm afraid.

gaspachoo commented 2 years ago

Can you please add this line before try: png = driver.get_screenshot_as_png().save("test.png")

E-11-V commented 2 years ago

This is what I get now

gaspachoo commented 2 years ago

Sorry, I haven't used webdriver for a while. I meant: driver.save_screenshot("test.png")

E-11-V commented 2 years ago

Now I get this. No need to be sorry! You are already doing so much! :)

gaspachoo commented 2 years ago

Looks weird. I've checked, so maybe try driver.save_screenshot('./image.png')

E-11-V commented 2 years ago

Yeah, I'm particularly curious about the bit that reads "selenium.common.exceptions.UnexpectedAlertPresentException: Alert Text: You must select an organisation. Message: unexpected alert open: {Alert text : You must select an organisation.} (Session info: headless chrome=105.0.5195.127)", as that's the very first problem I had (i.e. the program not being able to connect to Scholarvox properly using my login details).

E-11-V commented 2 years ago

Could it have to do with the type of connection? I'm accessing the book remotely: would it be different if done from the campus? What was your case, @luroy ?

gaspachoo commented 2 years ago

I think the cookies are regenerated after a while, so you need to open the book manually again, export the cookies again and retry. "You must select an org" is the message you see when going on the website after a while.
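
One way to check whether exported cookies have gone stale is to compare their "expirationDate" field (a Unix timestamp in the EditThisCookie export) with the current time. The sketch below assumes the cookie dicts still carry that field, as in the program posted earlier; the function name `expired_cookie_names` is hypothetical.

```python
import time
from typing import Optional


def expired_cookie_names(cookies: list, now: Optional[float] = None) -> list:
    """Names of cookies whose 'expirationDate' (Unix seconds) is in the past."""
    if now is None:
        now = time.time()
    return [
        cookie["name"]
        for cookie in cookies
        if "expirationDate" in cookie and cookie["expirationDate"] < now
    ]
```

Session cookies, such as the "ezproxy" one in the list above, carry no expiry date in the export, so this check cannot tell whether the server still accepts them. Exporting fresh cookies right before a run remains the safer option.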

E-11-V commented 2 years ago

I went to the book, repeated the cookies process as you showed me before and got this result

So sorry to bother you with this, but I admit that at this point I feel like we are really close to the issue!

gaspachoo commented 2 years ago

Maybe you could add a print() every 10 lines of code or so, to find out exactly where the issue is.
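
Besides print() calls, it may help to stop discarding the error: the final `except Exception as ignored:` in the program closes the driver without showing what went wrong, which could be why the run ends after one page with no message. Below is a minimal sketch of a wrapper that prints progress and the full traceback; the name `run_step` is made up for illustration.

```python
import traceback


def run_step(label, action):
    """Run one step, print progress, and show the full traceback on failure."""
    print(f"[start] {label}")
    try:
        result = action()
    except Exception:
        print(f"[failed] {label}")
        traceback.print_exc()  # prints the exception type, message and line
        raise
    print(f"[ok] {label}")
    return result
```

A call such as `run_step("open page 1", lambda: driver.get(URL + "1"))` would then report which step failed and why, instead of ending silently.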

interpretationphenomenon commented 2 years ago

Have you guys found a solution? Let me know ! Thanks

holypaladincinderella commented 1 year ago

Hi fellows, can anyone help me? I am a rookie and have no clue how to use the program.

aymannc / TheGreatPDFer

Login details #10