CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

Testing TASKS tool in the project 3: error 500, etc. #1380

Open yroskov opened 4 days ago

yroskov commented 4 days ago

Testing the Tasks tool in project 3 (CoL draft) at https://www.checklistbank.org/catalogue/3/tasks

I chose the WCVP checklist (id 2232). I expect to get a report on duplicated names only inside the WCVP sectors included in project 3.

yroskov commented 4 days ago

Problem 1: I got an error 500 message:

image

yroskov commented 4 days ago

Problem 2: I got a report on duplicated names between WCVP and all other checklists, whereas I need a report on duplicated names only inside the WCVP sectors included in project 3.

https://www.checklistbank.org/dataset/3/duplicates?authorshipDifferent=true&category=binomial&limit=100&minSize=2&mode=STRICT&sourceDatasetKey=2232&status=accepted

image

yroskov commented 4 days ago

Problem 3: The report on duplicated names has no option (i.e. checkboxes) for applying decisions:

image

yroskov commented 4 days ago

Quite possibly I did not understand how to use the tool...

mdoering commented 3 days ago

1) 500 responses are nearly always a bug, I will look into that Monday.

2) Reporting duplicates across all names in the project, as long as at least 1 name comes from the chosen source, is the expected behavior. This was the initial requirement when we first introduced the duplicate tool to projects. We also wanted to search for duplicates across sources, something you could not do in the workbenches of individual sources. If you want to look only for duplicates within a source in a project, we need to add another parameter that restricts the search to names just in the source. I can do that, it should not be difficult.

3) I have to defer to @thomasstjerne. There is an identifier issue with applying decisions through the project, but we should be able to deal with it. We just need to get hold of the original source id, which we keep in the verbatim source record for all synced records, unless the record did not exist in the source: e.g. genera and species can be created during a sync if they are missing in a source which only has species and subspecies respectively.

mdoering commented 3 days ago

I do not get a 500 now. Do you still see that? I did deploy the backend many times some 10-12h ago which might have caused problems occasionally...

yroskov commented 1 day ago

I confirm: no error 500 for me today, 2024-11-25

yroskov commented 1 day ago

Sorry for the confusion. Let's clarify:

Functionality which was implemented before: Reporting duplicates across all names & GSDs in the project.

Functionality of today: Reporting duplicates between selected GSD and all other GSDs in the project.

Additional functionality which I need: Reporting duplicates inside the selected GSD, i.e. inside the part of it (= CoL sectors) included in the project. This is what you described as: "If you desire to look only for duplicates within a source in a project we need to add another parameter that restricts the search to names just in the source. I can do that, it should not be difficult."

(Again, sorry for the confusion. I thought that I was testing a new implementation: reports inside a GSD included in the project.)

All 3 functionalities are necessary. Perhaps a switch "across whole project / selected checklist vs others / inside checklist" (or something like that) could be helpful.
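To make the three scopes concrete, here is a minimal sketch of how each one would decide whether a group of duplicate names is reported. It is only an illustration of the rules discussed in this thread, not the backend's implementation: the `Scope` enum, the `keep_group` function and the dataset key 1010 are hypothetical, and a group is represented simply as the list of source dataset keys of its names.

```python
from enum import Enum


class Scope(Enum):
    # Hypothetical names for the three options of the proposed switch.
    WHOLE_PROJECT = "across whole project"
    SOURCE_VS_OTHERS = "selected checklist vs others"
    INSIDE_SOURCE = "inside checklist"


def keep_group(source_keys, scope, selected=None):
    """Return True if a group of duplicate names should be reported.

    source_keys: the source dataset key of every name in the group.
    selected:    the chosen source dataset key (ignored for WHOLE_PROJECT).
    """
    if scope is Scope.WHOLE_PROJECT:
        # No source restriction at all.
        return True
    if scope is Scope.SOURCE_VS_OTHERS:
        # Behaviour described above: at least one name from the chosen source.
        return any(key == selected for key in source_keys)
    # INSIDE_SOURCE: every name in the group must come from the chosen source.
    return all(key == selected for key in source_keys)


# 2232 is WCVP as in this issue; 1010 is a made-up key for another source.
mixed_group = [2232, 1010]
inside_group = [2232, 2232]

print(keep_group(mixed_group, Scope.SOURCE_VS_OTHERS, 2232))
print(keep_group(mixed_group, Scope.INSIDE_SOURCE, 2232))
print(keep_group(inside_group, Scope.INSIDE_SOURCE, 2232))
```

Note that under the "at least one name" rule a group lying entirely inside the chosen source is reported as well, so "selected checklist vs others" is a superset of "inside checklist".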

mdoering commented 15 hours ago

I am adding a new boolean sourceOnly parameter that restricts all considered names to be from the same source, not just one record. There will be a new switch in the task board and the duplicate tool for that.
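As a rough illustration of how the new parameter might be used, the sketch below adds `sourceOnly=true` to the duplicates URL posted earlier in this issue. The parameter name comes from the comment above; that it is passed as a plain query parameter next to `sourceDatasetKey`, and that this particular URL accepts it, are assumptions. The helper `with_source_only` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_source_only(url, enabled=True):
    """Return the URL with the sourceOnly query parameter set.

    Assumption: sourceOnly is an ordinary boolean query parameter.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # keeps the existing parameters
    params["sourceOnly"] = "true" if enabled else "false"
    return urlunsplit(parts._replace(query=urlencode(params)))


# URL taken from the report earlier in this issue.
duplicates_url = (
    "https://www.checklistbank.org/dataset/3/duplicates"
    "?authorshipDifferent=true&category=binomial&limit=100&minSize=2"
    "&mode=STRICT&sourceDatasetKey=2232&status=accepted"
)

restricted_url = with_source_only(duplicates_url)
print(restricted_url)
```

With the parameter set, only groups whose names all come from source 2232 would be expected in the report; without it, one name from 2232 per group is enough.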

mdoering commented 15 hours ago

@thomasstjerne I have added the new parameter to the API, which will be deployed today. Could you update the frontend by adding the new checkboxes? See https://github.com/CatalogueOfLife/checklistbank/issues/1504

thomasstjerne commented 14 hours ago

"Additional functionality which I need: Reporting duplicates inside selected GSD, i.e. inside its part (= CoL sectors) included in the project."

I actually thought that duplicates within a single source GSD in a project were handled here: https://www.checklistbank.org/catalogue/3/dataset/2232/duplicates?catalogueKey=3&limit=50&offset=0

i.e., select the source, and then select duplicates

mdoering commented 13 hours ago

That shows duplicates directly in the source, not in the project after the data has been synced. Often there is not much difference, but synced project data has decisions applied and, most importantly, sometimes contains far less data because only some parts/sectors have been used, e.g. from ITIS.