Open dpancic opened 3 years ago
In GitLab by @vronk on Nov 3, 2021, 13:40
mentioned in issue sshoc-marketplace-backend#128
In GitLab by @vronk on Nov 3, 2021, 13:40
marked this issue as related to sshoc-marketplace-backend#128
In GitLab by @vronk on Nov 3, 2021, 13:43
We agree to try to implement this via notebooks first. This seems easier to do with the available development resources, even though it is less efficient at runtime, because a notebook has to process all the items sequentially, sending a separate request per item. The backend would probably have the means to process the item set more quickly server-side; however, this would require implementation work on the backend, for which we currently don't have the capacity.
In GitLab by @vronk on Feb 2, 2022, 15:12
Notebooks seem to be a solution for most cases. We keep this open as low priority in case we see that there are operations which would need server-side processing.
In GitLab by @vronk on Feb 2, 2022, 15:12
unassigned @tparkola and @tparkola
In GitLab by @laureD19 on Jun 22, 2022, 11:57
marked this issue as related to sshoc-marketplace#84
In GitLab by @KlausIllmayer on Sep 14, 2022, 10:43
The workflow would look like this:
System moderator
should hopefully work, we need to check)
Notify @laureD19 @kreetrapper @aureon249 @cesareconcordia
In GitLab by @KlausIllmayer on Sep 14, 2022, 10:46
What we need to check: whether the GET JSON response can be used 1:1 as the JSON for the PUT endpoint. We had issues with this in the past; they should be solved, but I never checked it for all possible values. We could create an item on stage that has all values filled out and test with it: after running the script, it should be 1:1 identical to the ingested version.
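A minimal sketch of how such a 1:1 check could be done in a notebook, assuming both the GET response and the version stored after the PUT are already loaded as Python dicts. The function name `json_diff` and the path notation are made up for illustration; nothing here is part of the marketplace API.

```python
from typing import Any, List


def json_diff(first: Any, second: Any, path: str = "") -> List[str]:
    """Return the paths at which two JSON-like values differ (empty list = identical)."""
    if isinstance(first, dict) and isinstance(second, dict):
        diffs: List[str] = []
        for key in sorted(set(first) | set(second)):
            child = f"{path}/{key}"
            if key not in first:
                diffs.append(f"{child}: only in second document")
            elif key not in second:
                diffs.append(f"{child}: only in first document")
            else:
                diffs.extend(json_diff(first[key], second[key], child))
        return diffs
    if isinstance(first, list) and isinstance(second, list):
        diffs = []
        if len(first) != len(second):
            diffs.append(f"{path}: list length {len(first)} != {len(second)}")
        # Compare element by element; order is treated as significant.
        for index, (left, right) in enumerate(zip(first, second)):
            diffs.extend(json_diff(left, right, f"{path}/{index}"))
        return diffs
    if first != second:
        return [f"{path}: {first!r} != {second!r}"]
    return []
```

An empty result would mean that the round trip kept every value; any reported path points to a field that the PUT endpoint handled differently from the GET response.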
In GitLab by @KlausIllmayer on Sep 14, 2022, 12:34
The API call to get all ingested items from one specific source: GET /api/item-search?d.status=ingested&f.source=INSERT_LABEL_OF_SOURCE - you need to be logged in as a moderator to get results. You can try it out on stage, where we have some suggested items from the source SSK Zotero Resources. Get the bearer token for a moderator and call on stage GET /api/item-search?d.status=suggested&f.source=SSK Zotero Resources. It should return some items (as long as no one approves these suggested items).
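As the source label contains spaces, the query string has to be url-encoded before the request is sent. A small sketch with the Python standard library; the function name and the base URL in the usage note are placeholders, only the path and the two query parameters come from the call above.

```python
from urllib.parse import quote, urlencode


def item_search_url(base_url: str, status: str, source_label: str) -> str:
    """Build the item-search URL for one status and one source label."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode({"d.status": status, "f.source": source_label}, quote_via=quote)
    return f"{base_url.rstrip('/')}/api/item-search?{query}"
```

For example, `item_search_url("https://marketplace.example.org", "suggested", "SSK Zotero Resources")` yields a URL ending in `?d.status=suggested&f.source=SSK%20Zotero%20Resources`. The bearer token of the moderator still has to be sent with the request itself.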
In GitLab by @laureD19 on Oct 26, 2022, 15:42
moved from sshoc-marketplace#89
We discussed that we need not only a bulk approval but also a bulk reject workflow. After the ingestion pipeline has (re-)harvested data from a source, these items get the status ingested and are not published. Moderators may then decide either to approve all of these items or to reject all of them. Moderators take a sample look at the ingested items and, if they decide that everything was fine with the ingestion, all items of this source coming from the ingestion pipeline are approved and become the new published version of the items. But moderators may also find problems in the items coming from the ingestion pipeline, e.g., a mapping is not valid anymore due to changes at the source; then it will be necessary to reject all items of this source coming from the ingestion pipeline.
Here is an example workflow, with the option to approve or the option to reject the items, including the API endpoints to use:
1. The items coming from the ingestion pipeline have the status ingested. Moderators look at these items. If there are many of them, this will be based on random samples. In particular, items that were already changed on SSHOMP and also changed at the source should be covered (but it could be complicated to identify such items; TODO: we should collect what kind of hints are given if such merges happened, maybe we get a logfile from the ingestion pipeline that can tell us where to look). After inspection, moderators decide either to approve all of these items (it could be that some special cases were already approved manually) or to reject them. TLDR: Moderators look into the ingested items from a source and decide to run the bulk action either for approving or for rejecting these items.
2. Use Sources in the moderation dashboard: choose the status facet Ingested and, from the sources facet, choose the source that should be handled. Copy the name of the source from the facet (unfortunately it is not possible to mark the label and CTRL+C it, you need to type it out); this name is also the label of the source that is used as input parameter. The second parameter is simply a tag that makes clear whether the items in the bulk action are approved or rejected. TLDR: two parameters are to be prepared and added into the script by moderators: one is "label of source" (string field, called {param_source_label} for describing the workflow), the other is either "approve items" or "reject items" (boolean field, or maybe better, to make it clearer, a string field that must be either "approve" or "reject", called {param_applied_action} for describing the workflow).
3. Sign in as moderator: POST /api/auth/sign-in
4. Call GET /api/sources and look into the result to see whether the {param_source_label} can be found in one of the labels (if not, give an error message).
5. Search for the ingested items of param_source_label: GET /api/item-search?d.status=ingested&f.source={param_source_label} (be sure to url-encode the {param_source_label} as it can contain spaces or other special characters); print the statistics (number of found items).
6. Create a table by calling GET /api/{category}s/{persistentId}/versions/{id} (the category that you get in item-search is singular but here it must be plural, therefore the additional s; I think we have a method that maps to the correct category in the API call, that would be safer) for all items found at step 5. The table should consist of the field persistentId, the field category, the field label, the field lastInfoUpdate, the review link to the item in the frontend: {frontend-url}/{category}/{persistentId}/version/{id}/review and, if it exists, the value of the property conflict-at-source (in the JSON it can be found in {"properties"}[]{"type"}{"code"="conflict-at-source"} - the value is on the same level as "type" and identified as "value") - sorted first by label and second by persistentId.
7. For every item in the table:
   - If {param_applied_action} is approve, the script approves the item: to do this, you revert as moderator the ingested version, which makes it the published one, with PUT /api/{category}s/{persistentId}/versions/{id}/revert (beware that category is plural, therefore the additional s, see also the comment in step 6). Check the HTTP return code; it must be "200" if everything went okay, otherwise give an error message.
   - If {param_applied_action} is reject, the script rejects the item: to do this, delete as moderator the ingested version with DELETE /api/{category}s/{persistentId}/versions/{id} (beware that category is plural, therefore the additional s, see also the comment in step 6).
@cesareconcordia I hope the workflow is now clearer and that I covered all necessary steps; if not, please comment in this issue. There are also examples on the stage and on the development instance of the marketplace.
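The table and the approve/reject calls of the workflow could be sketched in a notebook roughly as follows. This is only an illustration under several assumptions: each item is a dict as returned by the API, the version id is available as `id`, the plural path segment is built by naively appending an s (with an optional `overrides` mapping, since a proper category mapping would be safer), and all function names are made up. The helpers only compute rows and request targets; sending the requests and checking for status code 200 is left to the caller.

```python
from typing import Any, Dict, List, Optional, Tuple

ACTIONS = ("approve", "reject")


def api_category(category: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Plural path segment for a singular category (naive '+s' unless overridden)."""
    return (overrides or {}).get(category, category + "s")


def version_path(item: Dict[str, Any], overrides: Optional[Dict[str, str]] = None) -> str:
    """Path of one item version: /api/{category}s/{persistentId}/versions/{id}."""
    segment = api_category(item["category"], overrides)
    return f"/api/{segment}/{item['persistentId']}/versions/{item['id']}"


def conflict_at_source(item: Dict[str, Any]) -> Optional[Any]:
    """Value of the conflict-at-source property, or None if the item has none."""
    for prop in item.get("properties", []):
        if prop.get("type", {}).get("code") == "conflict-at-source":
            return prop.get("value")
    return None


def table_rows(items: List[Dict[str, Any]], frontend_url: str) -> List[Dict[str, Any]]:
    """Rows for the overview table, sorted first by label and second by persistentId."""
    base = frontend_url.rstrip("/")
    rows = [
        {
            "persistentId": item["persistentId"],
            "category": item["category"],
            "label": item["label"],
            "lastInfoUpdate": item.get("lastInfoUpdate"),
            # The review link uses the singular category, unlike the API path.
            "review": f"{base}/{item['category']}/{item['persistentId']}/version/{item['id']}/review",
            "conflict-at-source": conflict_at_source(item),
        }
        for item in items
    ]
    return sorted(rows, key=lambda row: (row["label"], row["persistentId"]))


def action_request(
    action: str, item: Dict[str, Any], overrides: Optional[Dict[str, str]] = None
) -> Tuple[str, str]:
    """HTTP method and path for approving (revert) or rejecting (delete) one version."""
    if action not in ACTIONS:
        raise ValueError(f"param_applied_action must be one of {ACTIONS}, got {action!r}")
    path = version_path(item, overrides)
    if action == "approve":
        return ("PUT", path + "/revert")
    return ("DELETE", path)
```

Keeping these helpers free of network calls makes it possible to print the table and the planned requests first, so that moderators can review them before anything is changed on the server.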
Things to check together with @laureD19:
- The conflict-on-source-property that has the value true: I guess a moderator needs to check it manually and then also needs to remove the conflict-on-source-property (have a look at https://github.com/SSHOC/sshoc-marketplace-backend/issues/85#issuecomment-1352878754 for more information on this property)
Adding a point which needs to be handled: if there are two re-ingests of the same source, we will have two different versions of one item. This can be seen in the table, as there will be two entries with the same persistentId and label. In such cases, we need an agreement on how to handle the situation. Going through the proposed workflow alone could lead to confusing results (depending on the sort algorithm, it may approve a version of the first ingest or of the second ingest). Most probably, the script will also run into an error, as the approval of one version will reject all other versions, which are then not available anymore. I guess the best solution in such cases is to only take the newest version of an item and handle this version: if it is an approve, the other versions will disappear (if it is a reject, the other versions won't disappear, so I guess we would like to reject the other versions as well). Opinions on this @laureD19 @cesareconcordia ?
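The proposal for duplicate versions could look like the sketch below: for an approve only the newest version per persistentId is selected, for a reject all versions are selected. It assumes that a higher version `id` means a newer version, which has not been confirmed in this thread; a different `newest_key` (for example based on lastInfoUpdate) can be passed instead. The function name is made up.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def versions_to_process(
    items: List[Dict[str, Any]],
    action: str,
    newest_key: Callable[[Dict[str, Any]], Any] = lambda item: item["id"],
) -> List[Dict[str, Any]]:
    """Select the versions the bulk action should touch, grouped by persistentId."""
    if action not in ("approve", "reject"):
        raise ValueError(f"unknown action: {action!r}")
    by_pid: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in items:
        by_pid[item["persistentId"]].append(item)
    selected: List[Dict[str, Any]] = []
    for pid in sorted(by_pid):
        # Newest version first, according to the (assumed) ordering key.
        versions = sorted(by_pid[pid], key=newest_key, reverse=True)
        if action == "approve":
            # Approving the newest version makes the older ones disappear.
            selected.append(versions[0])
        else:
            # Rejecting does not remove the other versions, so reject them all.
            selected.extend(versions)
    return selected
```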
@KlausIllmayer : workflow seems clear, thanks. Will talk about its implementation during the next EB call
initial test for bulk rejection of ingested items available here
could you have a look @cesareconcordia and @KlausIllmayer and tell me what should be improved?
review and create smaller tasks
In GitLab by @KlausIllmayer on Jul 1, 2021, 18:22
We need bulk actions when a lot of items are involved, like approving an ingest. There are - as I see it - three ways to handle this (and in principle, all three could operate in parallel):
ingested from sourcex (disadvantage: take it or leave it - granularity could be a problem)
Open for discussion to find a solution: @vronk @laureD19 @tparkola @egray523 @stefanprobst @cesareconcordia