ec-jrc / re3gistry

Re3gistry is a reusable open source solution for managing and sharing ‘reference codes’, ensuring semantic interoperability across organisations.
European Union Public License 1.2

Bulk import in case of a hierarchical register #42

Closed AntoRot closed 3 years ago

AntoRot commented 3 years ago

I was creating a hierarchical register by adding the register and its (first-level) items through bulk import. When I added the (second-level) child item class, I realized that a single bulk import for the whole item class is not allowed; instead, the bulk import must be performed separately for each parent item. That is not very practical, as there are 107 parent items.

Am I doing something wrong or is it actually so?

If so, I think that allowing a whole bulk import also for the (second level) child item class could be very useful.

I even tried to perform a single bulk import with a CSV file that included the collection field, but it failed, even though I was notified that the bulk import had completed successfully.

The version of our registry was upgraded to the latest release 2.1.0.

Thank you very much for your help.

emanuelaepure10 commented 3 years ago

Dear @AntoRot,

Thank you very much for bringing this up for discussion.

Our idea for the bulk import is the same as for the management interface: you cannot add a child item before its parent exists as a valid item in the system.

That gives you the solution to this problem. You will need to create two files: one containing all the parents and one containing all the items that have a parent. First run a bulk import with the parents and publish those items; only then run another bulk import for the items whose parents are now valid, published items.
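As a rough illustration of this two-file approach, the sketch below splits one combined listing into the rows for the first import and the rows for the second. It assumes a pipe-delimited file with a `parent` column that is empty for top-level items; the function name `split_parents_and_children` and the sample rows are hypothetical and not part of Re3gistry.

```python
import csv
import io


def split_parents_and_children(text, parent_field="parent", delimiter="|"):
    """Split pipe-delimited rows into (parents, children) lists of dicts.

    Rows whose parent field is empty are treated as parents, to be imported
    and published first; the remaining rows belong in the second file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    parents, children = [], []
    for row in reader:
        has_parent = (row.get(parent_field) or "").strip()
        (children if has_parent else parents).append(row)
    return parents, children


# Illustrative data only; the column layout is an assumption.
sample = (
    "LocalID|Language|parent|label\n"
    "codelist1|en||Code list 1\n"
    "value1|en|codelist1|Value 1\n"
    "value2|en|codelist1|Value 2\n"
)
parents, children = split_parents_and_children(sample)
print([r["LocalID"] for r in parents])   # rows for the first bulk import
print([r["LocalID"] for r in children])  # rows for the second bulk import
```

The split only prepares the two lists; the parents still have to be published in the system before the second import is attempted.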

I hope that this solves your issue.

Best regards

AntoRot commented 3 years ago

Dear @emanuelaepure10,

Thank you very much for your feedback.

I followed exactly the steps you described, i.e.:

As I said, after performing this last step I was notified that the bulk import had completed successfully, even though it had failed.

Then I tried a bulk import of the child items for one specific parent item only (without the collection field in the file), and in this case the bulk import worked. The file included only the child items related to that specific parent item.

In the second file, I used the localID of the parent items in the collection field, not the URI. I think this is correct. Moreover, for some items an external URI is used as the localID, as those items are already published in the INSPIRE registry and we wanted to publish only the extensions.

To give you more details, I attach the two files (it-codelist.zip).

Thank you again.

emanuelaepure10 commented 3 years ago

Dear @AntoRot,

  1. Could you please tell me where in the header you have declared the parent field: LocalID|Language|collection|label|definition|dbgt-id-value ?

Before adding the second file, you should have associated the field "parent" with the reg_itemclass that you want, and this field should have been added to the header of the template file, something like "LocalID|Language|collection|parent|label|definition|dbgt-id-value".
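A check of this kind can be scripted before uploading a file. The sketch below is only an illustration: the helper `missing_header_fields` is hypothetical (not part of Re3gistry), and exact, case-sensitive matching of field names is an assumption.

```python
def missing_header_fields(header_line, required, delimiter="|"):
    """Return the required field names that are absent from a header line."""
    present = {name.strip() for name in header_line.split(delimiter)}
    return [name for name in required if name not in present]


# Header taken from the question above; "parent" is not declared in it.
header = "LocalID|Language|collection|label|definition|dbgt-id-value"
print(missing_header_fields(header, ["LocalID", "Language", "parent"]))
```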

  2. It is correct to use the localID for parent, collection and everything that is a relation.

  3. As for the localID (example: https://inspire.ec.europa.eu/codelist/HydroNodeCategoryValue), you know that it is used to compose the URI of the item. I don't think that this is a good idea. Why don't you use an internal localID and add a field external_link where you point the user to all the details of the item?

Thank you

AntoRot commented 3 years ago

Dear @emanuelaepure10,

I will reply to your questions as follows.

Could you please tell me where in the header you have declared the parent field: LocalID|Language|collection|label|definition|dbgt-id-value ?

Once I had added the new register "codelist" and the item class "codelist-item" (which the system creates automatically together with the register) and added some custom fields, I performed the bulk import of the first file and published the parent items. Then, through the interface, in the Structure menu, I added a new child item class "codelist-value"

add-new-item-class

and I filled in the fields of the content class, including the field "parent", whose value is shown by default (please note that when I created the new child item class the value of this field was "codelist-item [item]" and not the one shown in the image).

new-item-class-content

As you can see, in the table reg_itemclass the field reg_itemclass_parent has the proper value for the newly added child item class.

 uuid                             |          localid          |            baseuri            | systemitem | active |       reg_itemclass_parent       | reg_itemclasstype | reg_status | dataprocedureorder |       insertdate        |          editdate          
----------------------------------+---------------------------+-------------------------------+------------+--------+----------------------------------+-------------------+------------+--------------------+-------------------------+----------------------------
 8f81ea04e282e23fa14493fc696869aa | registry                  | http://10.14.251.252          | f          | t      |                                  | 1                 | 1          |                  0 | 2020-11-11 12:07:02.412 | 2020-11-16 16:29:11.247536
 ef6008e54ef75ddcfa246415786db836 | codelist                  | http://10.14.251.252          | f          | t      | 8f81ea04e282e23fa14493fc696869aa | 2                 | 1          |                 32 | 2021-04-23 11:32:43.996 | 
 a13419d1875fb60ca68534aedd06127d | codelist-item             |                               | f          | t      | ef6008e54ef75ddcfa246415786db836 | 3                 | 1          |                 33 | 2021-04-23 11:32:44.052 | 
 17e5d12bc92ef9726d23e48b0d1f45b3 | codelist-value            |                               | f          | t      | a13419d1875fb60ca68534aedd06127d | 3                 | 1          |                 34 | 2021-04-23 11:46:05.552 | 
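To make the chain in this table easier to read, the following minimal sketch follows reg_itemclass_parent from one item class up to the root. Only the uuid, localid and reg_itemclass_parent values are taken from the table; the `ROWS` structure and the `ancestry` helper are illustrative and not part of Re3gistry.

```python
ROWS = [
    # (uuid, localid, reg_itemclass_parent) copied from the table above
    ("8f81ea04e282e23fa14493fc696869aa", "registry", None),
    ("ef6008e54ef75ddcfa246415786db836", "codelist",
     "8f81ea04e282e23fa14493fc696869aa"),
    ("a13419d1875fb60ca68534aedd06127d", "codelist-item",
     "ef6008e54ef75ddcfa246415786db836"),
    ("17e5d12bc92ef9726d23e48b0d1f45b3", "codelist-value",
     "a13419d1875fb60ca68534aedd06127d"),
]


def ancestry(localid, rows=ROWS):
    """Return the item class local IDs from the root down to localid."""
    by_uuid = {uuid: (name, parent) for uuid, name, parent in rows}
    by_name = {name: uuid for uuid, name, _ in rows}
    chain = []
    current = by_name[localid]
    while current is not None:
        name, parent = by_uuid[current]
        chain.append(name)
        current = parent
    return list(reversed(chain))


print(" > ".join(ancestry("codelist-value")))
# registry > codelist > codelist-item > codelist-value
```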

Then I added the field "collection" (to be used to link the child items to the parent items) and a custom field, and tried a single bulk import of the whole second file, which didn't work. By contrast, the bulk import for a single parent item works.

As for the localID (example: https://inspire.ec.europa.eu/codelist/HydroNodeCategoryValue) you know that they are used to compose the URI of the item. I don't think that this is a good idea. Why don't you use an internal localID and add a field external_link where you point the user to all the details of the item?

I would like to avoid assigning a new localID (and consequently a new URI) to a reference code which is already published. I also know that the localID can be used to hold an external URI. In practice this didn't create any problems, as the items having external URIs are also properly published, as you can see

external-URI

I also tried a bulk import of the child items for a specific parent item having an external URI (i.e. http://inspire.ec.europa.eu/codelist/ShoreTypeValue), and this worked as well.

external-URI-child

Thank you again!

emanuelaepure10 commented 3 years ago

Dear Antonio,

I think that you have wrongly used the field "collection" instead of the field "parent". The collection is an item belonging to the reg_itemclass_parent of the itemclass that you are checking; in your case, the itemclass "codelist-value" has as its collection the itemclass "codelist-item". ISO 19135 says that a parent should be an item in the same itemclass; in your case, a parent of an item in the itemclass "codelist-value" should belong to that same itemclass. To correct the problem, it would be enough to create a parent field for the itemclass "codelist-value" and change the bulk import file accordingly. Just download the template after adding the field; it will probably be enough to replace collection with parent.

I will take you step by step through the creation of a registry and the related itemclasses; please let us know if you skipped any of these steps.

Step 1. Go to the registry page and click the button "Add register". [image]

Step 2. Add the register details and "Save". [image]

2.1. Now you will see the newly created register in the "New item proposed" table. [image]

2.2. Go to the menu tab "Registry Manager" and "publish" the action containing the newly created register. [image]

2.3. Go to the menu tab "Structure". [image] At this point you can only add the register items (for example Codelist, using the INSPIRE Registry terms).

2.4. You can add the register items through the interface by clicking "Add Item" or by doing a "Bulk import". [image] Below, see the addition of an item using the interface. [image] Then click save and follow the entire workflow to publish the item.

2.5. Now let's say that I want to create another item, and I want this item to have as its parent the item I created before. I go to the tab "Structure", open the reg_itemclass "italiaregister-item" and add a field "parent". As per ISO 19135, we know that a parent needs to have the same item class as the child item. [image]

2.6. Then just "add field", choosing the parent field from the dropdown list to associate it with the itemclass "italiaregister-item". [image] Now you can go back to the structure of this itemclass and choose whether you want to show it in the table or not. I will tick the box to show it in the table. Now, through the interface, I will create another item, so that later I can run a bulk import of items with more than one parent. For this item I am already able to pick a parent from the interface. Now I will publish the item so that I can use it later in the bulk file. For the bulk import you can only use a parent which is already a valid item in the system.

2.7. At this moment I have: [image] Now I will download the template file and fill it with some content. [image] I have added 3 items with parent codelist1 and 2 items with parent codelist2 (both parents are valid items in the system). After loading the file and running the bulk import I get this page: [image] Now you just need to follow the workflow to publish the proposed items.

Step 3. Now go back to the structure and add a child item class for "italiaregister-item". [image]

3.1. Add a local ID for the item class and save the change. [image]

3.2. Knowing already that you will need a parent field, go to "Structure", view the "italiaregister-value" item class [image] and add a field parent to it, as we did for "italiaregister-item". [image] After adding the field, the page should look like this: [image]

3.3. Now I will follow exactly the same process as for the "italiaregister-item" item class. I will add 2 items from the interface, publish them, and later do a bulk import with more items, using the 2 valid parents published before. The file used for the bulk import [image] and the page of codelist1 look like this after the bulk import: [image]

Please let us know if you manage to import the items.

Best regards, Emanuela

AntoRot commented 3 years ago

Dear @emanuelaepure10,

Maybe I did not describe clearly what I need to do.

I didn't use the field "parent" because I didn't want to link items in the same itemclass, but I needed to link items belonging to an itemclass to items belonging to another itemclass. For this reason I used the field "collection".

For instance, the first itemclass (i.e. "codelist-item") includes the following items:

 LocalID  | Language | label     | definition
----------+----------+-----------+----------------
 04010103 | it       | label CL1 | definition CL1
 04010101 | it       | label CL2 | definition CL2
 04010102 | it       | label CL3 | definition CL3
 04010201 | it       | label CL4 | definition CL4

where no items shall have a parent in the same itemclass.

The second itemclass (i.e. "codelist-value"), child of the first one, should include the following items:

 LocalID | Language | collection | label       | definition
---------+----------+------------+-------------+------------------
 01      | it       | 04010103   | label CLV11 | definition CLV11
 02      | it       | 04010103   | label CLV12 | definition CLV12
 03      | it       | 04010101   | label CLV21 | definition CLV21
 04      | it       | 04010101   | label CLV22 | definition CLV22
 05      | it       | 04010101   | label CLV23 | definition CLV23
 06      | it       | 04010102   | label CLV31 | definition CLV31
 07      | it       | 04010201   | label CLV41 | definition CLV41
 08      | it       | 04010201   | label CLV42 | definition CLV42

where the items are related to the items included in the parent itemclass "codelist-item", through a collection hierarchy relation.
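The intended relation between the two tables can be sketched as a simple consistency check: every collection value in the second table must match a LocalID in the first. This is only an illustration of the data layout; the names `PARENT_IDS`, `CHILD_ROWS` and `group_by_collection` are hypothetical, and the sketch does not reproduce how Re3gistry itself validates a bulk import.

```python
from collections import defaultdict

# LocalIDs of the "codelist-item" items from the first table above
PARENT_IDS = {"04010103", "04010101", "04010102", "04010201"}

# (LocalID, collection) pairs of the "codelist-value" items from the second table
CHILD_ROWS = [
    ("01", "04010103"), ("02", "04010103"),
    ("03", "04010101"), ("04", "04010101"), ("05", "04010101"),
    ("06", "04010102"),
    ("07", "04010201"), ("08", "04010201"),
]


def group_by_collection(child_rows, parent_ids):
    """Group child local IDs by collection, rejecting unknown collections."""
    groups = defaultdict(list)
    for local_id, collection in child_rows:
        if collection not in parent_ids:
            raise ValueError(f"{local_id}: unknown collection {collection!r}")
        groups[collection].append(local_id)
    return dict(groups)


for collection, child_ids in group_by_collection(CHILD_ROWS, PARENT_IDS).items():
    print(collection, child_ids)
```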

The problem is the following:

childitemclass

templatecsv

But the header of this file didn't include the field "collection", even though I had added it; i.e. the header of the template file was as follows

LocalID|Language|label|definition|dbgt-id-value

Consequently, I could only add the items of the itemclass "codelist-value" that were related to the selected item of the itemclass "codelist-item". This worked, as you can see

childitems

But I couldn't add all the items of the itemclass "codelist-value" through a single bulk import operation. This implies that I would have to perform 107 different bulk import operations, one for each item of the parent itemclass "codelist-item", as there are 107 items in that itemclass. I tried a bulk import with a CSV file with the following header

LocalID|Language|collection|label|definition|dbgt-id-value

including all the items of the itemclass "codelist-value", but the bulk import didn't work, as:

Thank you very much for your support!

emanuelaepure10 commented 3 years ago

Dear Antonio,

Now I see it differently :-)

The collection is actually the container of certain items. The system adds this collection automatically when, for example, you add some codelist values below a codelist. In the table reg_relation you will see a line for each codelist value and the related collection codelist.

So in your case, if you just want a reference from the codelist-values to another itemclass, you can create a field of type relationReference [image] pointing exactly to the itemclass you want. [image]

Downloading the template for the items, I got [image]. In my case, as I chose to have a reference to applicationschema, I don't need to fill in the column Collectioncodelist-value-reference; but if you choose to point to a "codelist-value" itemclass, for example, that one has a collection as well and you need to add it in the template file. [image] After the bulk import has finished, I can see the 2 newly proposed items with the link to the reference application schema that I chose for the field. [image]

In the INSPIRE Registry we have such an example for:

  1. relationReference theme and application schema for the codelist value: https://inspire.ec.europa.eu/codelist/AccessRestrictionValue/forbiddenLegally
  2. relationReference document for codelist: https://inspire.ec.europa.eu/codelist/Article17CountingUnitValue

Please let us know if that helps you.

AntoRot commented 3 years ago

Dear @emanuelaepure10,

Unfortunately this doesn't help me because I don't need to add a reference.

Let me try to explain better using your example. You added the two items of the itemclass "codelist-value" with localID "codelistvalue7" and "codelistvalue8" to the item "codelist1" of the parent itemclass by running a bulk import directly from the page of "codelist1". If you want to add two more items, for instance "codelistvalue9" and "codelistvalue10", to the item "codelist2", you have to open the page of the item "codelist2" and run a new bulk import. Again, if you want to add three more items, for instance "codelistvalue11", "codelistvalue12" and "codelistvalue13", to the item "codelist3", you have to open the page of the item "codelist3" and run a new bulk import. And so on.

What I meant, and what I need, is to avoid running n individual bulk imports (one for each item of the parent itemclass) and instead run a single bulk import that lets me add all the items of the itemclass "codelist-value" (i.e. "codelistvalue7", "codelistvalue8", "codelistvalue9", "codelistvalue10", "codelistvalue11", "codelistvalue12" and "codelistvalue13") properly related to the relevant items of the parent itemclass.

I hope it is clear now :)

Thank you!

emanuelaepure10 commented 3 years ago

Hi,

This is a very nice feature, but we don't have it in our future plans. For now, bulk import is available only at the level of a single item.
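Until such a feature exists, the per-item files can at least be prepared automatically from one combined file. The following is a minimal sketch under stated assumptions: the combined file is pipe-delimited and carries a `collection` column holding the parent's localID, and each generated file drops that column so that it matches the per-item template header. The function `split_by_collection` is hypothetical and not part of Re3gistry, and each resulting file still has to be uploaded from the page of its parent item.

```python
import csv
import io


def split_by_collection(text, collection_field="collection", delimiter="|"):
    """Return {collection value: CSV text without the collection column}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fields = [name for name in reader.fieldnames if name != collection_field]
    buffers, writers = {}, {}
    for row in reader:
        key = row[collection_field].strip()
        if key not in writers:
            # First row for this parent: start a new file with its own header.
            buffers[key] = io.StringIO()
            writers[key] = csv.DictWriter(
                buffers[key], fieldnames=fields, delimiter=delimiter,
                extrasaction="ignore", lineterminator="\n",
            )
            writers[key].writeheader()
        writers[key].writerow(row)
    return {key: buf.getvalue() for key, buf in buffers.items()}


# Illustrative rows based on the example tables earlier in this thread.
combined = (
    "LocalID|Language|collection|label|definition\n"
    "01|it|04010103|label CLV11|definition CLV11\n"
    "02|it|04010103|label CLV12|definition CLV12\n"
    "03|it|04010101|label CLV21|definition CLV21\n"
)
files = split_by_collection(combined)
for parent_id, content in files.items():
    print(f"--- file for parent {parent_id} ---")
    print(content, end="")
```

This does not reduce the number of uploads, only the manual work of building one file per parent item.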

Please feel free to develop this feature if you can and want to, and share it with the community.

Best regards, and thank you for everything

AntoRot commented 3 years ago

OK, Thank you!

If we develop the feature, we'll share it for sure.

Best regards, Antonio

emanuelaepure10 commented 3 years ago

Thank you Antonio.

We look forward to seeing the feature implemented.

Best regards, Emanuela