dandi / dandi-api-webshots-tools

0 stars 0 forks source link

edit metadata and file listing get "timeout" for many (but not all) dandisets #10

Closed yarikoptic closed 2 years ago

yarikoptic commented 2 years ago

see e.g. https://github.com/dandi/dandi-api-webshots#000011 but even more in https://github.com/dandi/dandi-api-webshots-staging#200011

edit: timings are also off, since looking at metadata of https://gui.dandiarchive.org/#/dandiset/000032/draft is quite quick but on staging reports 20 sec (locally it does pause for a bit but loads in 2-3 secs) and @dchiquito mentioned might be due to the fact that now it goes to a dedicated URL.

jwodder commented 2 years ago

@yarikoptic I've identified three problems so far:

yarikoptic commented 2 years ago

what do we use github api here for? (I thought it is just to login and for that we do not even go through API)

jwodder commented 2 years ago

@yarikoptic We don't use the API, but logging into GitHub too often is still subject to rate limiting.

yarikoptic commented 2 years ago

ah, gotcha.

yarikoptic commented 2 years ago

I wonder if response on rate limiting error has some field like try-after ?

jwodder commented 2 years ago

@yarikoptic It's an HTML page served when trying to log in via the web, so I doubt it. Also, I get the feeling that GitHub doesn't want to disclose the exact parameters of their secondary rate limits.

jwodder commented 2 years ago

@yarikoptic This page implies that a sleep time of 1 second between logins should be enough.

jwodder commented 2 years ago

@yarikoptic I just ran a test run with a two-second sleep before each webshot, and it got through 17 shots before hitting the secondary rate limit.

yarikoptic commented 2 years ago

me is confused -- we are authenticating for each webshot separately? why it is not reusing the same authenticated session?

jwodder commented 2 years ago

@yarikoptic Sessions (or some part of the Chrome driver instance, at least) aren't pickleable, so they can't be transferred or reused between processes.

yarikoptic commented 2 years ago

Do we have separate process per each dandiset or page? Then they should be batched - each process should go through multiple

jwodder commented 2 years ago

@yarikoptic Per each page. The idea was to keep driver crashes limited to just a single page and prevent them from taking down other page snapshots with them.

Then they should be batched - each process should go through multiple

Multiple what?

yarikoptic commented 2 years ago

multiple pages, i.e. to have each process be given a list of pages to visit, it would then yield (add to queue) results per each page as soon as it obtains them. If it crashes, main process restarts the process with the remaining pages to be visited.

jwodder commented 2 years ago

@yarikoptic Do you remember exactly why we needed to use subprocesses in the first place? I've traced it back to https://github.com/dandi/dandi-api-webshots/issues/19, which says the goals were parallelization (which didn't work out, and would likely just lead to hitting secondary rate limits even faster) and "continu[ing] with a new driver in case of driver crash." I'm not clear on how the code before that issue was unable to properly deal with driver crashes.

EDIT: It seems that the previous code for reinitializing the driver on a crash was disabled "based on 000040 timeout experience", but I'm not clear on the nature of that "timeout experience."

yarikoptic commented 2 years ago

I recall that resetting the driver in the same process didn't work, thus decision was made to go subprocesses etc, and no longer remember what 000040 timeout experience was unfortunately, besides it is the commit which added that https://github.com/dandi/dandi-api-webshots/commit/f6012cb150160fea94d5fc71a8e7bdb186bf356c

jwodder commented 2 years ago

@yarikoptic I've rewritten the code to snapshot all the pages in a single subprocess that is restarted on crashes, but currently attempts to snapshot 000021's "Edit Metadata" page are failing when trying to click the "Metadata" button with:

selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <a data-v-4cb78ccc="" href="#/dandiset/000021/draft/metadata" class="v-btn v-btn--block v-btn--depressed v-btn--flat v-btn--outlined v-btn--router theme--light v-size--default" id="view-edit-metadata">...</a> is not clickable at point (84, 1382). Other element would receive the click: <div data-v-0891bd2e="" class="Cookie Cookie--bottom Cookie--blood-orange">...</div>

This exception is a subclass of WebDriverException, which the code currently just reraises immediately, so the subprocess just crashes over & over.

yarikoptic commented 2 years ago

I believe it is the https://github.com/dandi/dandiarchive/issues/947 I filed a few weeks back. Please check out @mvandenburgh comment https://github.com/dandi/dandiarchive/issues/947#issuecomment-953849537

jwodder commented 2 years ago

@yarikoptic The code already selects the "Metadata" button by ID (specifically, by the XPATH //a[@id="view-edit-metadata").

jwodder commented 2 years ago

@yarikoptic The best way to address this for now would probably be to just treat ElementClickInterceptedExceptions the same way as non-WebDriverException errors and return the error messages in place of the time; is that acceptable? Though this raises the question of exactly which WebDriverExceptions do and don't warrant restarting the process.

yarikoptic commented 2 years ago

echoing on that https://github.com/dandi/dandiarchive/issues/947#issuecomment-953849537 -- could you try to increase the height of the rendered page -- may be it would somehow help to make that button actionable? may be @mvandenburgh has some other ideas as well?

jwodder commented 2 years ago

@yarikoptic I don't think the page height is the issue; it looks like the sidebar isn't being rendered for 000021. Compare the screenshots of 000021's landing page:

landing

and 000003's landing page:

landing

yarikoptic commented 2 years ago

oh... well -- it is rendered but under that metadata display, so if you scroll -- you will see all those buttons. But indeed increasing the height is IMHO no the right way forward. A quick workaround could be just increase the width e.g. by some % until it starts rendering sidebar where expected.

In the longer run: @mvandenburgh what is the rule (CSS) whenever it decides to place sidebar under the main body instead of the sidebar? could it be made more "predictable" (e.g. always have it if width is >= XXX pt, or smth like that)

mvandenburgh commented 2 years ago

In the longer run: @mvandenburgh what is the rule (CSS) whenever it decides to place sidebar under the main body instead of the sidebar? could it be made more "predictable" (e.g. always have it if width is >= XXX pt, or smth like that)

The left/right margins of the "main body" column are causing the width to overflow, which causes the sidebar column to get pushed underneath. I'll refactor this to use padding within each column instead, which should ensure there is always enough space for the sidebar to fit without wrapping.

jwodder commented 2 years ago

@yarikoptic The script runs without error now.