CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

CLB: error 503 with TASKS reports #1182

Open yroskov opened 2 years ago

yroskov commented 2 years ago

I cannot open TASKS reports even with 399 names (50 per page). The screen shows an endlessly spinning circle. (Reported 2022-11-15 via Slack.)

Then the message "Request failed with status code 503" appears.

image

https://www.checklistbank.org/catalogue/3/dataset/1204/duplicates?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=accepted&status=synonym&withDecision=false

yroskov commented 2 years ago

Just in case it's not a server problem, the console says:

This page uses the non standard property “zoom”. Consider using calc() in the relevant property values, or using “transform” along with “transform-origin: 0 0”. duplicates false 2 index.js:124:22

image

yroskov commented 2 years ago

Hmm, TASKS work well with 990 duplicates per page (= 1167 names) for World Plants:

https://www.checklistbank.org/catalogue/3/dataset/1141/duplicates?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=990&minSize=2&mode=STRICT&offset=0&status=accepted&status=synonym&withDecision=false

Could it be a problem specific to the StaphBase data? @gdower, it's a new version from TW. Perhaps something new in the data affects the TASKS reports.
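To see whether the failing and the working report differ in anything other than the dataset, the two URLs above can be compared parameter by parameter. The following is only a small sketch using Python's standard `urllib.parse`; the helper names `report_params` and `diff_params` are made up for this illustration and are not part of ChecklistBank.

```python
from urllib.parse import parse_qs, urlsplit

# The two report URLs quoted in this thread (StaphBase fails, World Plants works).
FAILING = ("https://www.checklistbank.org/catalogue/3/dataset/1204/duplicates"
           "?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3"
           "&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0"
           "&status=accepted&status=synonym&withDecision=false")
WORKING = ("https://www.checklistbank.org/catalogue/3/dataset/1141/duplicates"
           "?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3"
           "&category=binomial&limit=990&minSize=2&mode=STRICT&offset=0"
           "&status=accepted&status=synonym&withDecision=false")


def report_params(url):
    """Return (path, {parameter: sorted list of values}) for a report URL."""
    parts = urlsplit(url)
    params = {key: sorted(values) for key, values in parse_qs(parts.query).items()}
    return parts.path, params


def diff_params(url_a, url_b):
    """Return {parameter: (values_a, values_b)} for parameters that differ."""
    _, a = report_params(url_a)
    _, b = report_params(url_b)
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}


if __name__ == "__main__":
    print(diff_params(FAILING, WORKING))
```

For these two URLs the only differing query parameter is `limit` (500 for the failing report, 990 for the working one), so the filters themselves are identical and the difference has to come from the dataset or from the server state at the time.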

yroskov commented 2 years ago

Another experiment, SF Orthoptera dataset imported from TaxonWorks: TASKS report with 1069 names on the page opens quickly: https://www.checklistbank.org/catalogue/3/dataset/1021/duplicates?acceptedDifferent=false&authorshipDifferent=false&catalogueKey=3&category=binomial&limit=990&minSize=2&mode=STRICT&offset=0&status=accepted&status=synonym&withDecision=true

@gdower & @mdoering, it looks like the StaphBase case is not caused by the routine export from TaxonWorks. More likely, it is caused by changes in the StaphBase data.

yroskov commented 2 years ago

The same problem now appears with the newly imported Systema Dipterorum (SD reports worked well a few days ago):

https://www.checklistbank.org/catalogue/3/dataset/1101/duplicates?acceptedDifferent=false&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=synonym&withDecision=false

yroskov commented 2 years ago

Two (only two) TASKS reports failed to open in the newly imported 3i Auchenorrhyncha:

https://www.checklistbank.org/catalogue/3/dataset/2317/duplicates?acceptedDifferent=false&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=synonym&withDecision=false

and

https://www.checklistbank.org/catalogue/3/dataset/2317/duplicates?catalogueKey=3&category=uninomial&limit=500&minSize=2&mode=STRICT&offset=0&rank=genus&withDecision=false

mdoering commented 1 year ago

The 503 error is just our front cache, Varnish, responding that no response could be fetched in an appropriate time: a backend timeout, if you like. The actual query here runs far too long to ever finish. The ultimate cause is decorating all records with a full classification. If there are 1000 duplicates, looking up 1000 classifications can take some time if the dataset uses deeply nested hierarchies. But StaphBase does not really look very deeply nested.

https://github.com/CatalogueOfLife/backend/issues/1122 would solve the problem like many other open performance issues.
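To make the cost argument concrete, here is a toy sketch of what "decorating every record with a full classification" implies. It is not the backend's code: the `PARENT` table, the function names, and the caching variant are all assumptions made up for this illustration, and the cache is just one conventional mitigation, not necessarily what issue #1122 proposes. It only shows that the number of parent lookups grows with (records × hierarchy depth) when each record walks its own chain.

```python
# Hypothetical stand-in for a taxon table: id -> parent id (None at the root).
PARENT = {
    "kingdom": None, "phylum": "kingdom", "class": "phylum", "order": "class",
    "family": "order", "genus": "family",
    "sp1": "genus", "sp2": "genus", "sp3": "genus",
}


def classification(taxon_id, parent, stats):
    """Walk up the parent chain; every step counts as one lookup."""
    path = []
    current = taxon_id
    while True:
        stats["lookups"] += 1
        current = parent[current]
        if current is None:
            return path
        path.append(current)


def classification_cached(taxon_id, parent, stats, cache):
    """Same result, but ancestors shared between records are resolved once."""
    if taxon_id in cache:
        return cache[taxon_id]
    stats["lookups"] += 1
    up = parent[taxon_id]
    path = [] if up is None else [up] + classification_cached(up, parent, stats, cache)
    cache[taxon_id] = path
    return path


if __name__ == "__main__":
    records = ["sp1", "sp2", "sp3"]

    naive = {"lookups": 0}
    for rec in records:
        classification(rec, PARENT, naive)

    shared = {"lookups": 0}
    cache = {}
    for rec in records:
        classification_cached(rec, PARENT, shared, cache)

    print(naive["lookups"], shared["lookups"])
```

With three sibling species under a six-level hierarchy, the naive walk performs 21 lookups and the cached walk 9. In a real database each lookup would be a query or a join step, so the gap widens quickly at 500 or 990 duplicates per page.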

yroskov commented 1 year ago

But why does this problem appear only now? Which recent change caused it?

mdoering commented 1 year ago

A surprise for me too. Nothing has changed in that area for a long time

yroskov commented 1 year ago

2022-12-02: Now all TASKS reports can be opened (checked with StaphBase, 3i & WCVP).

yroskov commented 1 year ago

Today, 2022-12-14, the problem reported above reappeared.

For example, I am not able to open these TASKS reports:

ACC-SYN species (different accepted, different authors) 1 of 399:
https://www.checklistbank.org/catalogue/3/dataset/1204/duplicates?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=accepted&status=synonym&withDecision=false

SYN-SYN species (different accepted, different authors) 0 of 972:
https://www.checklistbank.org/catalogue/3/dataset/1204/duplicates?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=synonym&withDecision=false

After a long wait, a 503 error appears on the screen: image

mdoering commented 1 year ago

Again StaphBase?

yroskov commented 1 year ago

Yes, StaphBase. Let me check WCVP...

yroskov commented 1 year ago

Tasks in WCVP work well, even ACC-SYN species (different accepted, different authors) 1 of 9016 with the "990 duplicates per page" setting.

So, yes, the problem is with StaphBase (exported from TaxonWorks).

Now checking other TW checklists....

3i Auchenorrhyncha: ACC-SYN species (different accepted, different authors) 508 of 508 works well.
SF Orthoptera: ACC-SYN species (same accepted, same authors) 530 of 530 works well.

So, the problem is not related to TaxonWorks export.

mdoering commented 1 year ago

That's interesting. I hope I find some time to look closer at what might be the cause.

yroskov commented 1 year ago

@mdoering, TASKS reports are very slow again (ITIS, 2023-01-04). Even ACC-ACC species (different authors) of 136 spp took a few minutes to open. ACC-SYN species (different accepted, different authors) 1023 of 1152 failed (error 503, even with the setting "50 duplicates per page"): image

mdoering commented 1 year ago

Ah, it is not the task board overview with the counts but the actual duplicates pages that have failed?

yroskov commented 1 year ago

Yes, actual report pages with duplicates

mdoering commented 1 year ago

So pages like this one? https://www.checklistbank.org/catalogue/3/dataset/2144/duplicates?acceptedDifferent=true&authorshipDifferent=false&catalogueKey=3&category=binomial&limit=100&minSize=2&mode=STRICT&offset=0&status=accepted&status=synonym&withDecision=false

It takes 10-20 s here, which is not unusual.

yroskov commented 1 year ago

The problem was with two reports, both of which failed with error 503 even with the setting "50 duplicates per page":

ACC-SYN species (different accepted, different authors) 1023 of 1152:
https://www.checklistbank.org/catalogue/3/dataset/2144/duplicates?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=accepted&status=synonym&withDecision=false

SYN-SYN species (different accepted, different authors) 1831 of 1973:
https://www.checklistbank.org/catalogue/3/dataset/2144/duplicates?acceptedDifferent=true&authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=synonym&withDecision=false

Today both reports work well with the setting "500 duplicates per page".

yroskov commented 1 year ago

GO: Species Fungorum is in the matching phase right now, if you want to try to run Tasks queries.
YR: I tested the ITIS Task reports. All work well, despite the matching phase running in parallel.
GO: Okay, well, maybe it's not the matching phase [that slowed down CLB].

mdoering commented 1 year ago

OK. It looks like a complex situation involving other concurrent processes, so it's hard to reproduce. Can you leave a note here each time you encounter it again, with the exact time (in UTC if possible) and the URL being called? Then we can check the logs for what was happening and try to correlate things.
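A note of the kind requested here (exact UTC time, URL, and outcome) could be produced with a few lines of Python. This is only a sketch under assumptions: the function names `http_status` and `log_attempt` are invented for this example, and the URL passed in should be whatever request the browser's network tab shows for the failing report, which is not necessarily the page URL pasted in this thread.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def http_status(url, timeout=120):
    """Fetch url and return the HTTP status code, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 503 still carry a status code.
        return err.code
    except (urllib.error.URLError, TimeoutError):
        return None


def log_attempt(url, fetch=http_status, now=None):
    """Return one log line: UTC start time, status, elapsed seconds, URL."""
    started = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    t0 = time.monotonic()
    status = fetch(url)
    elapsed = time.monotonic() - t0
    stamp = started.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp} status={status} elapsed={elapsed:.1f}s url={url}"


if __name__ == "__main__":
    import sys
    # Usage: python log_attempt.py <url>
    print(log_attempt(sys.argv[1]))
```

Pasting the resulting line into a comment gives a timestamp that can be matched against server logs directly, without relying on the comment time.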

yroskov commented 1 year ago

2023-03-08: working with TASKS in World Ferns, I cannot open the report on ACC-SYN species (different accepted, different authors) 70 of 369:

After some time with the spinning circle, I got a 503 error report:

image

However, the World Plants report ACC-SYN species (different accepted, different authors) 5311 of 6242 opens without a problem.

mdoering commented 1 year ago

Was this during the release on the 9th? It might be troublesome to do other things on the project while the release is running.

March 9th, 8:43 pm | March 9th, 10:03 pm UTC

yroskov commented 1 year ago

Was this during the release on the 9th?

No, this was before the release, during my work on duplicates in WFerns

mdoering commented 1 year ago

But not during the import, or could it be? It was imported on the 8th:

March 8th, 3:54 pm | March 8th, 4:10 pm

If you see this again, please log the exact UTC (or local) time.

yroskov commented 1 year ago

I log each issue on GitHub at the time it happens. If you look up the comment time on GitHub, you'll find the exact time of the last event (well, +1-2 minutes for the screenshot and text).

yroskov commented 1 year ago

One thought: if imports (or other running processes) slow down the reports on duplicates, why was WFerns affected but WPlants not, at the very same time?

mdoering commented 1 year ago

Because imports only affect queries at specific stages of the import, when both access the same tables. That is a rather rare coincidence, but it happens. We could block all access to datasets while an import is running on the database, but wouldn't that cause more problems than it solves?

mdoering commented 1 year ago
image

So that was 8 March, 9:06 pm GMT+1 (CET), which is 8:06 pm UTC if I am right: about 4 hours after the import. Not sure if that really collided.
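The conversion can be double-checked with Python's `datetime`. This sketch assumes the year 2023 (from the dated comment above) and assumes that the import window quoted earlier (3:54 pm to 4:10 pm) is in UTC, which the thread does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# CET as a fixed UTC+1 offset; summer time does not apply in early March.
CET = timezone(timedelta(hours=1), "CET")

screenshot_cet = datetime(2023, 3, 8, 21, 6, tzinfo=CET)
screenshot_utc = screenshot_cet.astimezone(timezone.utc)

# Assumption: the import end time (4:10 pm) quoted in this thread is UTC.
import_end_utc = datetime(2023, 3, 8, 16, 10, tzinfo=timezone.utc)
gap = screenshot_utc - import_end_utc

print(screenshot_utc.isoformat())  # 2023-03-08T20:06:00+00:00
print(gap)                         # 3:56:00
```

Under those assumptions the failure was recorded at 20:06 UTC, 3 hours 56 minutes after the import finished, which supports the doubt that the two actually overlapped.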