LibreCat / Catmandu

Catmandu - a data processing toolkit
https://librecat.org

Working with a list/array of objects #378

Closed: TobiasNx closed this 3 years ago

TobiasNx commented 3 years ago

I am learning Catmandu with the help of the documentation and the tutorial. So far they are very helpful. Thanks.

One thing I have difficulty with is how to work with an array of objects. This does not seem to be discussed in the documentation or in the repo; I can only find examples of lists of strings. I am not a programmer, but I am familiar with XML, JSON and nested data structures. In the context of Catmandu, arrays seem to be called lists and objects seem to be called hashes.

For example, take the following part of a MODS record that I harvested and converted with:

```console
$ catmandu convert OAI --url https://duepublico2.uni-due.de/oer/oai --metadataPrefix mods --handler mods to YAML
```

```yaml
---
_metadata:
  mods:
    identifier:
    - _body: duepublico_mods_00071610
    - _body: https://duepublico2.uni-due.de/receive/duepublico_mods_00071610
    location:
    - url:
      - _body: https://duepublico2.uni-due.de/receive/duepublico_mods_00071096
        access: object in context
      - _body: https://duepublico2.uni-due.de/servlets/MCRZipServlet/duepublico_mods_00071096
        access: raw object
      - _body: https://duepublico2.uni-due.de/rsc/thumbnail/duepublico_mods_00071096.png
        access: preview
      - _body: https://bridge.nrw
...
```
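In Catmandu's terms the YAML above is a hash whose `identifier` and `location` fields are lists of hashes. As a rough model (plain Python, not Catmandu's own code), the record can be written as nested dicts and lists. The hypothetical helper `get_path` walks a dotted path by key for hashes and by numeric index for lists, much like a Catmandu path such as `_metadata.mods.identifier.1._body` addresses the second identifier. It handles only plain keys and integer indexes (no wildcards or `$append`), and the elided `url` entries are left out.

```python
# Simplified model of the harvested MODS record: hashes become dicts,
# lists become Python lists. The url entries after "preview" are elided
# in the original and omitted here.
record = {
    "_metadata": {
        "mods": {
            "identifier": [
                {"_body": "duepublico_mods_00071610"},
                {"_body": "https://duepublico2.uni-due.de/receive/duepublico_mods_00071610"},
            ],
            "location": [
                {
                    "url": [
                        {"_body": "https://duepublico2.uni-due.de/receive/duepublico_mods_00071096",
                         "access": "object in context"},
                        {"_body": "https://duepublico2.uni-due.de/servlets/MCRZipServlet/duepublico_mods_00071096",
                         "access": "raw object"},
                        {"_body": "https://duepublico2.uni-due.de/rsc/thumbnail/duepublico_mods_00071096.png",
                         "access": "preview"},
                    ]
                }
            ],
        }
    }
}


def get_path(data, path):
    """Hypothetical helper: integer segments index lists, other segments key dicts."""
    for part in path.split("."):
        if isinstance(data, list):
            data = data[int(part)]
        else:
            data = data[part]
    return data


# prints https://duepublico2.uni-due.de/receive/duepublico_mods_00071610
print(get_path(record, "_metadata.mods.identifier.1._body"))
```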

This contains the information I need for a new field `id` as well as for a new field `image`.

This should be transformed into:

```json
{"id": "https://duepublico2.uni-due.de/receive/duepublico_mods_00071610", "image": "https://duepublico2.uni-due.de/rsc/thumbnail/duepublico_mods_00071096.png"}
```

I can rename the field holding the value I want when a regexp matches, but I cannot move or copy that field out of the list object to the root level and rename it to `id` or `image`:

```
do list(path:_metadata.mods.identifier)
    if all_match(_body, "^https://duepublico2.uni-due.de/receive/.*")
        move_field(_body, id)
    end
end

do list(path:_metadata.mods.location)
    do list(path: url)
        if all_match(_body, "^https://duepublico2.uni-due.de/rsc/thumbnail/.*")
            move_field(_body, image)
        end
    end
end
```

Is there something in the cheat sheet that I am missing? Could you help me move or copy the key/value pairs that match the regexp out of the objects in the array?
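The renaming described above is what you would expect if, inside `do list(path:...)` without a `var` option, each list element is the scope for the enclosed fixes: `move_field(_body, id)` then renames the key within that element instead of creating a root-level `id`. The Python below is a hypothetical model of that scoping (the function name `rename_in_element` is made up for illustration), using only the `identifier` list from the record.

```python
import re

# Hypothetical model of `do list(path: ...)` without `var`: every fix inside
# the loop only sees one list element (a hash), never the record root.
identifiers = [
    {"_body": "duepublico_mods_00071610"},
    {"_body": "https://duepublico2.uni-due.de/receive/duepublico_mods_00071610"},
]


def rename_in_element(element, old, new, pattern):
    # Mirrors: if all_match(_body, pattern) move_field(_body, id) end
    if old in element and re.search(pattern, element[old]):
        element[new] = element.pop(old)


for element in identifiers:
    rename_in_element(element, "_body", "id", "^https://duepublico2.uni-due.de/receive/.*")

# The new `id` key sits inside the second list element, not at the root.
print(identifiers)
```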

jorol commented 3 years ago

This should work:

Fix file `mods.fix`:

```
# bind each identifier to the temporary variable a (the root record stays in scope)
do list(path:_metadata.mods.identifier,var:a)
    if all_match(a._body,"^https://duepublico2.uni-due.de/receive/.*")
        copy_field(a._body,id)
    end
end

# copy every location URL to a temporary array
copy_field(_metadata.mods.location.*.url.*._body,tmp.$append)

# iterate through the temporary array
do list(path:tmp,var:b)
    if all_match(b,"^https://duepublico2.uni-due.de/rsc/thumbnail/.*")
        copy_field(b,image)
    end
end

# remove the temporary array
remove_field(tmp)

# keep just the two new fields
retain(id,image)
```

Run command:

```console
$ catmandu convert OAI --url https://duepublico2.uni-due.de/oer/oai --metadataPrefix mods --handler mods to YAML --fix mods.fix
```

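To follow what `mods.fix` does step by step, here is a hypothetical Python model of it, not Catmandu's implementation. It assumes that `var:a` binds each element to `a` while the root record stays writable, that the `*` wildcards visit every `location` and every `url`, and that a later match simply overwrites an earlier one. The function name `apply_mods_fix` and the trimmed record are illustrative only.

```python
import re

# Trimmed record containing only the fields the fix touches.
record = {
    "_metadata": {
        "mods": {
            "identifier": [
                {"_body": "duepublico_mods_00071610"},
                {"_body": "https://duepublico2.uni-due.de/receive/duepublico_mods_00071610"},
            ],
            "location": [
                {"url": [
                    {"_body": "https://duepublico2.uni-due.de/receive/duepublico_mods_00071096",
                     "access": "object in context"},
                    {"_body": "https://duepublico2.uni-due.de/rsc/thumbnail/duepublico_mods_00071096.png",
                     "access": "preview"},
                ]},
            ],
        }
    }
}


def apply_mods_fix(record):
    """Hypothetical Python model of mods.fix."""
    mods = record["_metadata"]["mods"]

    # do list(path:_metadata.mods.identifier, var:a): `a` is one element,
    # but the root record is still in scope, so `id` lands at the top level.
    for a in mods.get("identifier", []):
        if re.search("^https://duepublico2.uni-due.de/receive/.*", a.get("_body", "")):
            record["id"] = a["_body"]

    # copy_field(_metadata.mods.location.*.url.*._body, tmp.$append)
    record["tmp"] = [
        url["_body"]
        for location in mods.get("location", [])
        for url in location.get("url", [])
        if "_body" in url
    ]

    # do list(path:tmp, var:b) ... copy_field(b, image)
    for b in record["tmp"]:
        if re.search("^https://duepublico2.uni-due.de/rsc/thumbnail/.*", b):
            record["image"] = b

    del record["tmp"]  # remove_field(tmp)

    # retain(id, image): keep only these top-level fields
    return {key: record[key] for key in ("id", "image") if key in record}


result = apply_mods_fix(record)
print(result)
```

The temporary array keeps the thumbnail test to a single flat loop instead of nesting one `do list` inside another.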
TobiasNx commented 3 years ago

Thanks, it does.