SSHOC / sshoc-marketplace-backend

Code for the backend
Apache License 2.0
2 stars 0 forks source link

Handle concepts having the same ord value #420

Closed KlausIllmayer closed 6 months ago

KlausIllmayer commented 9 months ago

When creating a new concept for a vocabulary using POST /api/vocabularies/{vocabulary-id}/concepts it is possible to set a ord-value in the body of the call. This is not documented in Swagger but still is allowed. In principle, this could be a way to order the concepts, but it seems that there is a bug connected with it. It will not guarantee that there is only one ord per concept but instead will allow to have multiple concepts having one ord value. The only vocabulary that is allowed to have new concepts is sshoc-keyword. There are multiple concepts (> 1k) having the same ord value 843 and around 30 concepts having the same ord value 716. When creating a new concept for this vocabulary with the mentioned POST call and this body:

{
  "code": "testord",
  "ord": 716,
  "label": "testord",
  "definition": "testord"
}

it will add another concept to the ord value 716 and not inserting it (changing all ords >= 716 to ord+1 and then inserting it in the empty space so that there is only one concept with the ord 716). This led to problems with running a full GET /api/vocabularies/sshoc-keyword, see #397 It may also explain why we have multiple concepts sharing the same ord value. Chances are high that an ingest pipeline used the ord value hard coded when importing a lot of items having previously unknown keywords.

The fast solution would be by backend to ignore ord-values when getting a POST that creates a new concept. The more nicer solution would be to implement an algorithm that shifts ord-values higher as the one of the POST call. I think we can life with the fast solution as we currently don't use the ord in the frontend (usually vocabularies are shown sorted by alphabet).

When implementing a solution please also deal with the current situation. We would need an update script that deals with the multiple concepts having the same ord and dissolve this situation.

tparkola commented 8 months ago

In case of POST that creates a new concept - we could change it to raise an error when the ORD value is already used by another concept. Would that make sense, maybe it would be better for your scenario? POST would still allow to set ORD manually, but you could not set it wrongly. However, if ORD is not used currently in POST, then it would be simpler not to consider ORD at all and set it only automatically. Please let me know your preference.

tparkola commented 7 months ago

Please review thoroughly as the changes relate to all concepts, i.e. their ord value may change after the fix is applied (ord values are updated with the migration script).

KlausIllmayer commented 6 months ago

Looks good, no duplicate ords anymore in the database table. And when creating a concept like explained in the first comment, it will accept it but give it an ord at the end of the list, which is as expected (= ord is ignored). Did you also delete the ord-information from all API outputs? Because I don't see it anymore. Anyway, this makes also quite sense, as the ord is not really useful for concepts. Still to check: if there are some changes connected to the ord

tparkola commented 6 months ago

I did not change ord visibility - it should work as previously. I see that in the Concept DTO ord is not present (see src/main/java/eu/sshopencloud/marketplace/dto/vocabularies/ConceptDto.java). I can check other outputs if you let me know the API call you use.

KlausIllmayer commented 6 months ago

Thanks for looking into the DTO. It is not necessary to have ord in the output. In general, we could think about removing this field, as I don't see any need for it due to the structure of vocabularies. Additionally, the frontend currently delivers concepts of vocabularies either in alphabetical order (when looking for them in the edit forms) or based on the order how they were added (when showing the details of an item). For now, lets leave it and if there should be the need to use ord somewhere in the output - e.g. maybe showing the details of an item could be based more on the predefined order and not on the added order - we could discuss it again.