LibreCat / Catmandu-OAI

Catmandu modules for working with OAI repositories
https://metacpan.org/release/Catmandu-OAI

OAI to CSV Using metadataPrefix Exposing Bitstream URL? #25

Closed lightonphiri closed 6 years ago

lightonphiri commented 6 years ago

I am attempting to export records to CSV via OAI-PMH using metadata formats that have elements identifying the actual bitstream, e.g. a link to a PDF document. The metadataPrefixes 'ore' [1] and 'didl' [2] have the information I want; however, I cannot seem to extract the values when I specify the element tags that hold the data.

For instance, if I specify that I wish to extract all 'atom:link' tags using the command below, I do not get any output.

catmandu convert OAI --url http://demo.dspace.org/oai/request to CSV --fix 'join_field(link,"|")' --metadataPrefix ore --fields link

I am specifically interested in getting the link whose rel value points to the actual bitstream: see the element below.

`:

:`

[1] http://demo.dspace.org/oai/request?verb=ListRecords&metadataPrefix=ore
[2] http://demo.dspace.org/oai/request?verb=ListRecords&metadataPrefix=didl

phochste commented 6 years ago

There is currently no ORE or DIDL handler available for Catmandu::OAI, which makes it hard to get specific fields out of the ORE output.

It worked for me with this command:

catmandu convert OAI --url http://demo.dspace.org/oai/request --metadataPrefix ore --handler raw to CSV --fix myfix.fix

with myfix.fix like:

retain(_metadata)
xml_simple(_metadata)

do list(path:"_metadata/atom:link",var:x)
  if all_match(x.rel,"aggregates")
    copy_field(x.href,myurl)
  end
end

retain(myurl)

This gives as output:

myurl
http://demo.dspace.org/xmlui/bitstream/10673/43/1/Volleybal-Magazine-1988-10-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/45/1/Volleybal-Magazine-1989-02-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/7/2/test_word.docx
http://demo.dspace.org/xmlui/bitstream/10673/48/2/content.zip
http://demo.dspace.org/xmlui/bitstream/10673/29/1/Volleybal-Magazine-1991-08-Rik-Luyten-Jeugdkamp.pdf
http://demo.dspace.org/xmlui/bitstream/10673/35/1/Volleybal-Magazine-1992-01-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/42/1/Volleybal-Magazine-1988-09-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/49/2/content.zip
http://demo.dspace.org/xmlui/bitstream/10673/30/1/Volleybal-Magazine-1991-08-Rik-Luyten-Kohler.pdf
http://demo.dspace.org/xmlui/bitstream/10673/22/1/Volleybal-Magazine-1990-05-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/12/1/Volleybal-Magazine-1987-03-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/6/2/test_ppt.pptx
http://demo.dspace.org/xmlui/bitstream/10673/25/1/Volleybal-Magazine-1990-12-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/41/1/Volleybal-Magazine-1988-08-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/21/1/Volleybal-Magazine-1990-03-Rik-Luyten.pdf

http://demo.dspace.org/xmlui/bitstream/10673/23/1/Volleybal-Magazine-1987-04-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/39/1/Volleybal-Magazine-1992-05-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/26/1/Volleybal-Magazine-1991-01-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/17/1/Volleybal-Magazine-1989-09-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/20/1/Volleybal-Magazine-1990-01-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/16/1/Volleybal-Magazine-1989-08-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/3/2/test_excel.xlsx
http://demo.dspace.org/xmlui/bitstream/10673/40/1/Volleybal-Magazine-1987-12-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/37/1/Volleybal-Magazine-1992-03-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/18/1/Volleybal-Magazine-1989-10-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/38/1/Volleybal-Magazine-1992-04-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/24/1/Volleybal-Magazine-1990-10-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/4/7/stylesheet.css
http://demo.dspace.org/xmlui/bitstream/10673/28/1/Volleybal-Magazine-1991-04-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/19/1/Volleybal-Magazine-1989-12-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/44/1/Volleybal-Magazine-1989-01-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/13/1/Volleybal-Magazine-1989-03-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/32/1/Volleybal-Magazine-1991-11-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/33/1/Volleybal-Magazine-1991-12-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/15/1/Volleybal-Magazine-1989-05-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/5/2/test_pdf.pdf
http://demo.dspace.org/xmlui/bitstream/10673/36/1/Volleybal-Magazine-1992-02-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/14/1/Volleybal-Magazine-1989-04-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/31/1/Volleybal-Magazine-1991-09-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/27/1/Volleybal-Magazine-1991-02-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/34/1/Volleybal-Magazine-1987-05-Rik-Luyten.pdf
http://demo.dspace.org/xmlui/bitstream/10673/46/1/Vollebal-Magazine-SAF-full.zip
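
For anyone who wants to see the selection logic outside Catmandu, here is a minimal Python sketch of the same idea: walk the atom:link elements of one ORE/Atom entry and keep the href of each link whose rel contains "aggregates". The sample entry, its example.org URLs, and the function name aggregated_urls are assumptions made up for illustration; they are not taken from the DSpace demo output or from Catmandu.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical ORE/Atom entry, trimmed to the parts that matter here.
SAMPLE_ENTRY = """\
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:id>http://example.org/handle/123/1/ore.xml</atom:id>
  <atom:link rel="alternate" href="http://example.org/handle/123/1"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
             href="http://example.org/bitstream/123/1/1/report.pdf"
             type="application/pdf"/>
</atom:entry>
"""


def aggregated_urls(entry_xml):
    """Return the href of every atom:link whose rel mentions 'aggregates'."""
    root = ET.fromstring(entry_xml)
    urls = []
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        # Substring test, mirroring the loose match on "aggregates" in the fix.
        if "aggregates" in link.get("rel", ""):
            urls.append(link.get("href"))
    return urls


if __name__ == "__main__":
    for url in aggregated_urls(SAMPLE_ENTRY):
        print(url)
```

Note that this sketch collects every matching link in an entry, whereas the fix above copies the match into a single myurl field per record.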

lightonphiri commented 6 years ago

Thanks. I was not aware ORE and DIDL were not implemented. Your workaround does exactly what I want: thank you so much for this.