CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

CLB: broken TASKS? #1300

Closed yroskov closed 3 months ago

yroskov commented 3 months ago

While working with the TASKS reports https://www.checklistbank.org/catalogue/3/dataset/1101/tasks for Systema Dipterorum (id 1101) today, I ran into problems with applying and displaying decisions.

For attention of @thomasstjerne & @mdoering

Case 1. Applied "ignore" decisions were not shown in the interface (see screenshots below) after apparently successful application in two reports: Identical tribe 0 of 37 https://www.checklistbank.org/catalogue/3/dataset/1101/duplicates?catalogueKey=3&category=uninomial&limit=50&minSize=2&mode=STRICT&offset=0&rank=tribe&status=accepted&withDecision=false and Identical genus 0 of 62 https://www.checklistbank.org/catalogue/3/dataset/1101/duplicates?catalogueKey=3&category=uninomial&limit=50&minSize=2&mode=STRICT&offset=0&rank=genus&status=accepted&withDecision=false

image

image

yroskov commented 3 months ago

Case 2. I have resolved all issues in ACC-ACC species (different authors) 2 of 512, but I got error messages, and the TASKS panel is still red and says "2 of 512" resolved.

image

image

yroskov commented 3 months ago

Case 2, B

@thomasstjerne, unfortunately, I have the same problem in TASKS: applied decisions are not shown in the report and in the TASKS panel.

Today, I worked with ACC-ACC species (same authors) 0 of 342 in Systema Dipterorum.

I applied decisions for at least 50 pairs, but they are not shown in the report:

image

ACC-ACC species (same authors) 0 of 342 bar in TASKS panel does not show progress:

image

This is a serious blocker for my work now: I don't know which names are already resolved and which need further work.

yroskov commented 3 months ago

Case 3

I am applying decision "Ambiguous Synonym" in ACC-SYN species (different accepted, different authors) 8 of 869

...and getting back a list of errors (perhaps because I tried to apply these decisions on Friday and they are stored somewhere hidden). No progress in the bar:

image

I am applying decision "Ambiguous Synonym" in ACC-SYN species (different accepted, same authors) 0 of 77. All decisions were applied successfully and the report showed them:

image

However, after refreshing the page in browser, all decisions vanished in the interface:

image

...and bar in TASKS panel does not show progress:

image

yroskov commented 3 months ago

Case 3 B

I am applying decision "Ambiguous Synonym" in SYN-SYN species (different accepted, same authors) 0 of 53 image

...all decisions were applied successfully and the report showed them: image

But after reloading the window in the browser, all decisions vanished in the interface:

image

The bar in TASKS panel does not show any progress:

image

If I try to apply decisions again to the same names, I get an error message:

image

mdoering commented 3 months ago

If you want to know if decisions were actually created use the decision page to be sure. There are a few ignore decisions on tribes including Aedini: https://www.checklistbank.org/catalogue/3/decision?limit=100&mode=ignore&offset=0&rank=tribe

It looks like another search index problem at first glance

mdoering commented 3 months ago

Well, the ignore tribe task seems perfectly fine now for me. 37/37 and green:

image

@yroskov is there still something wrong? It might take a few seconds to update the search index after a decision was applied.

mdoering commented 3 months ago

Is this correct now? SYN-SYN species (different accepted, same authors) 44 of 53

@thomasstjerne any idea why things might not show up quickly?

thomasstjerne commented 3 months ago

Not really. I don't think the results are cached.

mdoering commented 3 months ago

The genus duplicates report shows 61/62 and there is one genus, Tauroconopa, missing. Looking at decisions, there is a broken one created on 12th February:

https://www.checklistbank.org/catalogue/3/decision?limit=100&name=Tauroconopa&offset=0

Rematching fails as we have 2 identical genera and the original ID 1790251 is not used by these 2 any longer. @gdower do all TW exports change identifiers or is this a unique Systema Dipterorum problem?

@thomasstjerne there is no option to manually relink a broken decision like we do for sectors. Could we port the same lookup maybe?

mdoering commented 3 months ago

Looking at the stored and broken block decisions I see there is no parent property given. That would probably be the only way to disambiguate duplicates in these cases. @thomasstjerne can we store that for all duplicate decisions? It would allow the automatic rematching to find many more names.
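To illustrate why a stored parent would help, here is a minimal sketch of a rematching step that falls back from ID to name and rank, and only then uses the parent to pick between identical names. The `Usage` and `DecisionSubject` classes, the `rematch` function, and the IDs and parent names in the usage note are hypothetical and are not taken from the ChecklistBank code base.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Usage:
    """A name usage in the current version of the source (hypothetical model)."""
    id: str
    name: str
    rank: str
    parent: Optional[str]


@dataclass(frozen=True)
class DecisionSubject:
    """What a decision remembers about the name it was made on (hypothetical model)."""
    id: str
    name: str
    rank: str
    parent: Optional[str] = None  # the property that is missing on the broken decisions


def rematch(subject: DecisionSubject, usages: List[Usage]) -> Optional[Usage]:
    """Return the single usage a decision should point to, or None if ambiguous."""
    # 1. The original identifier still exists: nothing to disambiguate.
    by_id = [u for u in usages if u.id == subject.id]
    if len(by_id) == 1:
        return by_id[0]

    # 2. Fall back to name and rank.
    candidates = [u for u in usages
                  if u.name == subject.name and u.rank == subject.rank]
    if len(candidates) == 1:
        return candidates[0]

    # 3. Several identical names: only a stored parent can tell them apart.
    if subject.parent is not None:
        narrowed = [u for u in candidates if u.parent == subject.parent]
        if len(narrowed) == 1:
            return narrowed[0]

    # Still ambiguous (or no match at all): leave the decision broken.
    return None
```

With two identical genera under different (placeholder) parents, a subject without a parent stays unmatched, while one with a parent resolves:

```python
usages = [
    Usage("new-1", "Tauroconopa", "genus", "parent A"),
    Usage("new-2", "Tauroconopa", "genus", "parent B"),
]
rematch(DecisionSubject("1790251", "Tauroconopa", "genus"), usages)              # None
rematch(DecisionSubject("1790251", "Tauroconopa", "genus", "parent A"), usages)  # usages[0]
```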

And @yroskov, is block the right decision, or would you not want the species underneath to be placed?

this has 2 species: https://www.checklistbank.org/dataset/1101/taxon/2697344

this has 5: https://www.checklistbank.org/dataset/1101/taxon/2697360

Ignoring might be better, but then the species would end up under the family. We might want a new decision to ignore the name and merge/union all descendants with another target taxon given. Well I suppose that is exactly what a sector does already. Do you block and create new sectors for all these blocked genera?

In that case the 2 species are also duplicates, so blocking the genus is perfect

gdower commented 3 months ago

> @gdower do all TW exports change identifiers or is this a unique Systema Dipterorum problem?

This is unique to SD because it's not permanently imported into the production server yet, so its IDs will still change until the final TW import.

yroskov commented 3 months ago

Case 4

I believe that I resolved Tasks for Alucitoidea (id 2207) 2024-03-17, but TASKS panel does not show results (3 bars are still yellow): https://www.checklistbank.org/catalogue/3/dataset/2207/tasks

image

mdoering commented 3 months ago

All green here :)

image

Looks like unexpected caching somewhere

mdoering commented 3 months ago

@yroskov can you clear your browser cache or turn off the use of the browser cache? Do you use Chrome?

mdoering commented 3 months ago

I think I know what's going on. @thomasstjerne the task count calls are not within the project but scoped around the source dataset, which we treat as not changing too quickly and cache for at least an hour in varnish. I might have to reroute the call through the project!
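The effect described here can be reproduced with a toy time-to-live cache. This is only a sketch of the general mechanism, not of Varnish or of the actual ChecklistBank setup: the `TtlCache` class, the `simulate` function, the cache key and the counts are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


class TtlCache:
    """Toy stand-in for an HTTP cache: serves a stored value until it is ttl_seconds old."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get(self, key: str, fetch: Callable[[], int]) -> int:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the backend is never asked
        value = fetch()      # cache miss or expired entry: ask the backend
        self._entries[key] = (now, value)
        return value


def simulate() -> List[int]:
    """Counts seen by a client before and after decisions are applied."""
    now = [0.0]    # fake clock, in seconds
    open_duplicates = [37]  # hypothetical backend state
    cache = TtlCache(ttl_seconds=3600, clock=lambda: now[0])
    key = "/dataset/1101/duplicate/count"  # used only as a cache key here

    seen = [cache.get(key, lambda: open_duplicates[0])]  # first request fills the cache
    open_duplicates[0] = 0                               # all decisions get applied
    now[0] = 60
    seen.append(cache.get(key, lambda: open_duplicates[0]))  # one minute later
    now[0] = 3600
    seen.append(cache.get(key, lambda: open_duplicates[0]))  # after the TTL expired
    return seen
```

In this simulation the client keeps seeing the old count one minute after the decisions were applied and only gets the fresh count once the hour has passed, which matches a TASKS panel that stays yellow or red even after a hard refresh in the browser.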

yroskov commented 3 months ago

Firefox

mdoering commented 3 months ago

@thomasstjerne I am setting up a new resource that exposes duplicates for sources in a project. Instead of https://api.checklistbank.org/dataset/2207/duplicate/count?catalogueKey=3&category=uninomial&minSize=2&mode=STRICT&rank=order&status=accepted&withDecision=false

you can then use the following without the need to set the catalogueKey: https://api.checklistbank.org/dataset/3/source/2207/duplicate/count?category=uninomial&minSize=2&mode=STRICT&rank=order&status=accepted&withDecision=false
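For clients that still hold URLs in the old form, the mapping between the two routes above can be sketched as a small rewrite function. The helper name `to_project_source_url` is hypothetical; it only assumes the two URL shapes quoted in this comment and uses the Python standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_project_source_url(old_url: str) -> str:
    """Rewrite /dataset/{source}/...?catalogueKey={project} to
    /dataset/{project}/source/{source}/... without the catalogueKey parameter."""
    parts = urlsplit(old_url)
    query = parse_qsl(parts.query, keep_blank_values=True)

    # The project key currently travels as a query parameter.
    project_keys = [value for key, value in query if key == "catalogueKey"]
    if len(project_keys) != 1:
        raise ValueError("expected exactly one catalogueKey query parameter")

    # The source dataset key is the second path segment.
    segments = parts.path.strip("/").split("/")
    if len(segments) < 2 or segments[0] != "dataset":
        raise ValueError("expected a path starting with /dataset/{key}")
    source_key, rest = segments[1], segments[2:]

    new_path = "/" + "/".join(["dataset", project_keys[0], "source", source_key, *rest])
    new_query = urlencode([(k, v) for k, v in query if k != "catalogueKey"])
    return urlunsplit((parts.scheme, parts.netloc, new_path, new_query, parts.fragment))
```

Applied to the first URL in this comment, the function yields the second one; all other query parameters are kept in their original order.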

mdoering commented 3 months ago

In fact, I think we can remove the catalogueKey query parameter altogether with that project source specific routing.

yroskov commented 3 months ago

Case 4A

It seems the problem is still there. On 2024-03-19 I resolved all Tasks for 3i Auchenorrhyncha (id 2317), but the TASKS panel does not show the results (yellow bars, even after a hard refresh):

image

mdoering commented 3 months ago

@yroskov can you try again please? Improved caching has been deployed.

yroskov commented 3 months ago

Bingo! It works now: the panel shows me the results of my resolutions. (Checked with WCO (id 2256) https://www.checklistbank.org/catalogue/3/dataset/2256/tasks).

Many thanks!