SSHOC / sshoc-marketplace-backend

Code for the backend
Apache License 2.0
2 stars 0 forks source link

Different results of merged versions depending on parameter order #134

Closed dpancic closed 1 year ago

dpancic commented 2 years ago

In GitLab by @KlausIllmayer on Jan 4, 2022, 10:22

When calling GET /api/{category}/{persistentId1}/merge?with={persistentId2} sometimes you get different results based on the order of persistentId1 and persistentId2. To give an example on stage: running GET /api/tools-services/XE6Spj/merge?with=C75gkx results for the field description in something different then GET /api/tools-services/C75gkx/merge?with=XE6Spj, the first call gives

Click to expand "description": "Agisoft Metashape is a stand-alone software product that performs photogrammetric processing of digital images and generates 3D spatial data to be used in GIS applications, cultural heritage documentation, and visual effects production as well as for indirect measurements of objects of various scales. / Previously known as Agisoft PhotoScan, Agisoft Metashape is a stand-alone software product that performs photogrammetric processing of digital images and generates 3D spatial data to be used in GIS applications, cultural heritage documentation, and visual effects production as well as for indirect measurements of objects of various scales.",

which looks correct: there are two different descriptions in the items so both are combined separated by /. But strangely, if you run the second call which only swaps the persistentIds you will get a different result:

Click to expand "description": "Previously known as Agisoft PhotoScan, Agisoft Metashape is a stand-alone software product that performs photogrammetric processing of digital images and generates 3D spatial data to be used in GIS applications, cultural heritage documentation, and visual effects production as well as for indirect measurements of objects of various scales.",

There you only get the description of the first item (persistentId: C75gkx) and you will not see the description of the second item (persistentId: XE6Spj). I would expect to get the same result for both calls independently of the order of the persistenIds.

@tparkola can you have a look why this happens?

informing @cesareconcordia who discovered this issue when implementing the merge in the curation workflow

dpancic commented 2 years ago

In GitLab by @tparkola on Jan 4, 2022, 12:41

This happened, because the code checks if item A description contains other merged item B description. As you can notice in 2nd case you have the same description but extended with "Previously known as " (A contains B), therefore slash version is not added. Of course I can change so that only equal will be check not contains.

dpancic commented 2 years ago

In GitLab by @KlausIllmayer on Jan 4, 2022, 12:51

Thanks for your answer, you are right, I didn't notice this. Indeed, a very clever algorithm. Does this imply - also for the other fields - that if there is no conflict, then a merge is done automatically and only if there is a conflict, then the slash version is created? Or is it only in such cases where a text is contained that it is already merged?

Anyways, after explanation I think it can stay as it is. Asking @cesareconcordia @vronk @laureD19 about their opinion before further deciding.

dpancic commented 2 years ago

In GitLab by @tparkola on Jan 4, 2022, 13:10

Yes, if no conflict or one field contains other - no slash is added. Unfortunately, merge is not done automatically, GET method only allows you to see the preview (check fields) and prepare the core item accordingly (via front-end) then you have to call the POST method to merge.

dpancic commented 2 years ago

In GitLab by @KlausIllmayer on Jan 4, 2022, 13:29

The current behaviour is insofar strange, as it gives different results depending on what is the item A. Is there a way to have something like the best result, trying out the merges until you find one that causes no conflict? In the example above: item A is part of item B, therefore it is merged; the same does not happen if you swap the items but probably I like to see it merged.

Maybe there should be a parameter that allows to control the behviour, e.g. something like ?automerge=false which would always give back a separated return value?

dpancic commented 2 years ago

In GitLab by @tparkola on Jan 5, 2022, 11:03

Yes, it gives different, because the crucial part is type of A item. If it is for example dataset, then the new merged item will be dataset, therefore the order is important. What is more, in discussing sample A is a part of B, but only in case of the description fields. Base item - is the primary one, and others are compared to him. GET method is created so you can see and choose fields values, whether you want some changes, maybe only labels from B or A. And no changes in data should be made by GET method. Therefore the POST method merges and saves changes.

If A is B in every aspect you will get the A object ItemDto in return (no slashes), without flag that tells is equal or not. Therefore your suggestion might be hard to implement, but is still open for the discussion and opinion from others.

dpancic commented 2 years ago

In GitLab by @KlausIllmayer on Jan 7, 2022, 10:28

There is no consensus about prefering the current algorithm but we agreed at today`s curation meeting that it will not lead to information loss, therefore we leave it as it is also because it is documented.

dpancic commented 2 years ago

In GitLab by @KlausIllmayer on Jan 7, 2022, 10:28

changed the status to Resolved by closing the incident