dandi / dandi-api-webshots

Webshots and timing for all dandisets in the main instance of the archive
0 stars 1 forks source link

Finalize meditor ("Edit metadata") webshotting #1

Closed yarikoptic closed 3 years ago

yarikoptic commented 3 years ago

there is already a stab https://github.com/dandi/dandi-api-webshots/blob/master/tools/make_webshots.py#L49 but we would need to have authenticated session (based on having DANDI_API_KEY env var provided) to be able to enter meditor for a dataset for a user which either has admin privileges or just owns a specific dandiset.

jwodder commented 3 years ago

@yarikoptic How does one authenticate on https://gui-beta-dandiarchive-org.netlify.app/ using just an API key?

yarikoptic commented 3 years ago

Would be a question which may be @dchiquito and @dandi/dandiarchive could answer better, but I guess it could be done establishing a session for selenium, and clicking on "Login" button, and authenticating (I guess via github?) so may be it is not DANDI_API_KEY which is needed but github credentials (for CI we do it using dandibot account) .

jwodder commented 3 years ago

Current status: I can get Selenium to click "Login," fill out the form, and follow the redirect, but I'm not logged in.

@dchiquito How does GitHub communicate authentication to https://gui-beta-dandiarchive-org.netlify.app/? It doesn't seem to set any cookies.

dchiquito commented 3 years ago

There are three domains in play: gui-beta-dandiarchive-org.netlify.app, app.dandiarchive.org, and github.com. Clicking Login in the web GUI redirects you to app.dandiarchive.org, which then redirects you to GitHub. After you log in in GitHub, it redirects back to the API, which then redirects back to the web GUI. I believe HTTP headers are used to shuffle the authentication token around. The token is stored in localStorage, which should be adjacent to cookies in the dev inspector.

When you click Log Out on the web GUI, you are actually logging out the API server, not GitHub. Clicking Login after logging out redirects you to the API server, which detects that you are already logged in to GitHub, and immediately redirects you back again. I recommend using an incognito window for determining how the system works for clean logins, there may be some mismatch.

jwodder commented 3 years ago

@dchiquito I can confirm that, in a normal Chrome session, local storage is created after logging in. However, nothing seems to be stored in local storage in a Selenium session, based on the empty output from:

print(driver.execute_script(
    "var ls = window.localStorage, keys = []; "
    "for (var i = 0; i < ls.length; ++i) "
    "  keys[i] = ls.key(i); "
    "return keys;"
))

Note that, for comparison, this returns a non-empty list when using Selenium to view jsbin.com.

EDIT: If I visit different domains before querying local storage, I see that api.dandiarchive.org has no local storage while github.com has a 'bundle-urls' key.

yarikoptic commented 3 years ago

So is it selenium peculiarity somehow, or how could we check server side interactions related to this - may be culprit is there. @jwodder do you have access to the logs?

jwodder commented 3 years ago

@yarikoptic I have access to Heroku and the Papertrail logs. Are those what you're referring to?

yarikoptic commented 3 years ago

yes -- I would have compared log'ed activities between "real" (in a new incognito window) and selenium driven cases to possibly identify any difference

jwodder commented 3 years ago

I couldn't find anything helpful in the logs, but I did discover that if I add a sleep() call for a few seconds after logging in, the login ends up being successful.

@dchiquito Is the login process perhaps finalized in the background?

dchiquito commented 3 years ago

What page are you waiting on? I know there's some stuff that runs when gui.dandiarchive.org loads to grab the login token from API storage (if it exists) and attempt to get the current user.

For reactive apps, you frequently need to wait for pages to load when doing things in Selenium, since the page state is modified asynchronously.

jwodder commented 3 years ago

@dchiquito I'm waiting on the main https://gui-beta-dandiarchive-org.netlify.app/ page, or wherever you get redirected to after authenticating with GitHub. Are there any changes to the DOM upon login completion that I can tell Selenium to wait for instead of having to sleep?

dchiquito commented 3 years ago

There's a little v-avatar that shows up with the current user's initials. It's probably enough to wait for the presence of .v-avatar.

yarikoptic commented 3 years ago

Marking as blocked since I believe we would need some non-github oauth way to login if I got it right -- our attempts to overcome github safety mechanisms has failed so far

yarikoptic commented 3 years ago

Removed blocked, we decided to proceed with having this to run on drogon instead of CI -- then that singular browser instance / device id could be properly acknowledged. So please proceed with #11 or some additions from #12 (e.g. tune up of window size etc https://github.com/dandi/dandi-api-webshots/pull/12/files#diff-f8109c43576a9af22cec8b69ea13633420e5aa2b80762e8596ec367e8f0eef2dR130) so it works and then deploy on drogon as a cron job. Should use dandibot user on github.

jwodder commented 3 years ago

@yarikoptic

yarikoptic commented 3 years ago
jwodder commented 3 years ago

@yarikoptic It appears that drogon still needs to be verified for GitHub login somehow. How do I do that?

satra commented 3 years ago

@jwodder - either open a browser on drogon in X11 mode, or use a section of the code interactively to submit the verification code you receive from github via email through dandibot@mit.edu (i believe it goes to your gmail account)

jwodder commented 3 years ago

@yarikoptic I have merged #11, disabled the GitHub Actions workflow, and set up a cronjob to run every 6 hours.

yarikoptic commented 3 years ago

Great!

jwodder commented 3 years ago

@yarikoptic PRs to capture "edit metadata" and remove the workflow: #14 and #15

yarikoptic commented 3 years ago
flock -n -E 0 /home/dandi/.run/run-webshots.lock ~/cronlib/dandi-api-webshots/tools/run-webshots.sh

but it crashed with

+ xvfb-run python tools/make_webshots.py
Traceback (most recent call last):
  File "tools/make_webshots.py", line 129, in <module>
    readme += process_dandiset(driver, ds)
  File "tools/make_webshots.py", line 88, in process_dandiset
    act()
  File "tools/make_webshots.py", line 65, in click_edit
    submit_button.click()
  File "/home/dandi/cronlib/dandi-api-webshots/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/home/dandi/cronlib/dandi-api-webshots/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/home/dandi/cronlib/dandi-api-webshots/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/dandi/cronlib/dandi-api-webshots/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: chrome=73.0.3683.75)
  (Driver info: chromedriver=73.0.3683.75,platform=Linux 4.19.0-0.bpo.9-amd64 x86_64)

so I guess we should guard for exceptions like this, may be retry few times?

jwodder commented 3 years ago

@yarikoptic

yarikoptic commented 3 years ago
satra commented 3 years ago

which user is being used? "Edit metadata" is only available for the owner of the dandiset.

jwodder commented 3 years ago

@yarikoptic It seems that the page content fails to load only if (but not always when) the window size option is given. This happens on the first Dandiset, so I don't know whether it's specific to that Dandiset or not.

jwodder commented 3 years ago

@satra We're using dandibot.

satra commented 3 years ago

i don't think dandibot is an owner of any dandiset. so it won't be able to "click on" the "edit metadata". that button would be disabled for that user. you could test if it can by adding dandibot as owner to one of the test dandisets.

satra commented 3 years ago

it's possible that because you are using xvfb that there is a conflict with window size. try running the script in headless mode without xvfb but providing window size through options.

btw, this may be a function of chromedriver which is now at 91+.

jwodder commented 3 years ago

try running the script in headless mode without xvfb but providing window size through options.

That seems to have worked, but now the script keeps dying on Dandiset 000020 with "invalid session id". When I tested the script locally earlier, I also noticed that the script would just pause indefinitely at 000020 until I killed it.

satra commented 3 years ago

i don't get a crash (with my version of the script) locally, but i had to insert a sleep between getting the url and taking the screenshot. but i'm only testing one dandiset. could you start with 20? i wonder if it's some function of loading. you could check if your script runs fine in the container or not.

if it helps my chrome/chromedriver in the container (joyzoursky/python-chromedriver:3.8-selenium)

 (Session info: headless chrome=79.0.3945.79)
# chromedriver --version
ChromeDriver 78.0.3904.105 (60e2d8774a8151efa6a00b1f358371b1e0e07ee2-refs/branch-heads/3904@{#877})
jwodder commented 3 years ago

@satra If I start at 000020, I just get the "invalid session ID" error at 000021 instead.

yarikoptic commented 3 years ago

what is the source of the "invalid session ID" - selenium? web ui?

FWIW, drogon is running an elderly Debian stretch. May be it is worth generating/using a freshier "everything"? I have created a singularity image which you can try using on drogon:

dandi@drogon:~$ singularity exec /mnt/backup/dandi/containers/webshots.simg chromedriver --version
ChromeDriver 89.0.4389.114 (1ea76e193b4fadb723bfea2a19a66c93a1bc0ca6-refs/branch-heads/4389@{#1616})

I think it should have everything, if not - just adjust a copy of /mnt/backup/dandi/containers/Singularity.webshots and point me to it, I will produce a newer one (needs root access)

jwodder commented 3 years ago

@yarikoptic

what is the source of the "invalid session ID" - selenium? web ui?

Selenium.

If I run the script under Singularity, it hangs indefinitely at the edit-metadata step for 000021.

yarikoptic commented 3 years ago

and that is "great" in that -- it is what it does for me too, and browser asks to kill that page. So please add a timeout (30 sec?) and README.md should have that ("timed out at 30" or alike) for timing

jwodder commented 3 years ago

@yarikoptic The stalling occurs when the "Edit Metadata" button is clicked. I can't figure out how to add a timeout for that. I tried doing driver.set_page_load_timeout(30) and changing the click() call to send_keys(Keys.ENTER) as suggested here, but it's not making a difference.

satra commented 3 years ago

shouldn't the script be checking if "edit metadata" is enabled, before trying to click it?

regarding timeout, you can do something like this. the details of presence would vary for different use cases.

        try:
            element = WebDriverWait(driver, 5).until(
                EC.presence_of_element_located((By.NAME, "allow"))
            )
        except TimeoutException:
            pass
        else:
            element.click()
jwodder commented 3 years ago

@satra It shouldn't matter if it's enabled; clicking a disabled button should not cause the script to hang indefinitely. The button is already present when the page is loaded, so I don't see how your code applies to this situation.

satra commented 3 years ago

that piece of code waits to see if a button is available before clicking. you can set the number 5 to whatever you want. on different tests, etc.,. it could take a bit of time to load all the elements. also you can check if there was a timeout before an element was detected before you click. it simply ensures a time window within which an element can appear. things don't happen as instantaneously on the driver as we might perceive them to.

jwodder commented 3 years ago

@satra I don't believe that the existence of the button is the problem. The problem is that the page legitimately hangs indefinitely when the button is clicked, and I can't figure out how to set a timeout for that.

yarikoptic commented 3 years ago

re "greyed out Edit Metadata" -- I thought that dandibot was one of the "admin users" which would be able to trigger it since otherwise we would not have a way to get those webshots. I have checked in incognito window -- nope, dandibot cannot "edit metadata". Filed: https://github.com/dandi/dandiarchive/issues/647 I have added dandibot to 000029 though and verified that now it can edit metadata there.

yarikoptic commented 3 years ago

@jwodder if no better way found -- can't we do each "selenium test" in a new thread and kill the thread if times out, or that would leave shared selenium instance still hanging? Then we might want to instantiate it within a thread (would be pain with all the logging in though I am afraid) ?

yarikoptic commented 3 years ago

BTW, you might get a better idea on where it is hanging by py-spy top -p that process and then see at the corresponding code/stack on either some provisions are made for timing out etc or how possibly to get it "unstuck"

jwodder commented 3 years ago

@yarikoptic py-spy shows that it's hanging on a socket call to readinto. I can't find any Selenium settings for controlling that timeout. I would rather not resort to threads.

yarikoptic commented 3 years ago

@jwodder if you shared the full traceback, I could have look as well. We would need to either see it addressed at selenium level OR go for threads if that is the only way (I would also hate to introduce them)

jwodder commented 3 years ago

@yarikoptic The traceback from the latest run (in which it hung for about five to ten minutes on 000022's edit-metadata before crashing) is:

Traceback (most recent call last):
  File "/home/dandi/cronlib/dandi-api-webshots/tools/make_webshots.py", line 151, in <module>
    readme += process_dandiset(driver, ds)
  File "/home/dandi/cronlib/dandi-api-webshots/tools/make_webshots.py", line 97, in process_dandiset
    act()
  File "/home/dandi/cronlib/dandi-api-webshots/tools/make_webshots.py", line 70, in click_edit
    submit_button.send_keys(Keys.ENTER)
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webelement.py", line 510, in send_keys
    self._execute(Command.SEND_KEYS_TO_ELEMENT,
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webelement.py", line 672, in _execute
    return self._parent.execute(command, params)
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 318, in execute
    self.error_handler.check_response(response)
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=89.0.4389.114)
yarikoptic commented 3 years ago

well, this is already post hanging error handling, traceback of when it was still waiting would have been more useful. Looking at the code, it seems it might be using urllib3's urlopen.. I wonder if a hint (well against urllib2) in https://stackoverflow.com/questions/16772795/urllib2-urlopen-will-hang-forever-despite-of-timeout would still work here

import socket
import urllib2

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)

i.e. if we set timeout to e.g. 30 sec -- would we get to this WebDriverException faster to announce the fail

yarikoptic commented 3 years ago

FWIW I tried with this obnoxious

from selenium import webdriver
import socket

# timeout in seconds
socket.setdefaulttimeout(0.0001)

browser = webdriver.Chrome()
browser.get('https://gui.dandiarchive.org/#/dandiset/000026')
browser.close()

and it timed out before succeeding ;-)

edit: although timed out not in interaction with dandi but with browser itself :-) making it 0.5 sec -- timed out on that get

jwodder commented 3 years ago

@yarikoptic I'm trying to test out socket.setdefaulttimeout(), but it appears that the login button isn't loading in time, leading to an IndexError, despite having set driver.implicitly_wait(10).

yarikoptic commented 3 years ago

30 seconds isn't enough for login???