LibreCat / Catmandu-MARC

Catmandu modules for working with MARC data
https://metacpan.org/release/Catmandu-MARC

Parser "RAW" and fix "marc_map": got field content twice #12

Closed: jorol closed this issue 9 years ago

jorol commented 9 years ago

Versions:

$ cpanm Catmandu::MARC
Catmandu::MARC is up to date. (0.206)
$ cpanm Catmandu
Catmandu is up to date. (0.9209)

CLI

$ catmandu -I ./lib convert MARC --type RAW --fix 'marc_map("001","id")' to CSV --fields id < ./t/camel.usmarc
id
"fol05731351 fol05731351 "
"fol05754809 fol05754809 "
"fol05843555 fol05843555 "
"fol05843579 fol05843579 "
"fol05848297 fol05848297 "
"fol05865950 fol05865950 "
"fol05865956 fol05865956 "
"fol05865967 fol05865967 "
"fol05872355 fol05872355 "
"fol05882032 fol05882032 "

$ catmandu -I ./lib convert MARC --type RAW --fix 'marc_map("001","id");remove_field("record")' to CSV --fields id < ./t/camel.usmarc
id
"fol05731351 ",
"fol05754809 ",
"fol05843555 ",
"fol05843579 ",
"fol05848297 ",
"fol05865950 ",
"fol05865956 ",
"fol05865967 ",
"fol05872355 ",
"fol05882032 ",

Error

If I use the "RAW" parser (--type RAW) with only a "marc_map" fix (--fix 'marc_map("001","id")'), I get the field content twice. If I add remove_field("record") to the fix, everything works fine. The same commands with the "USMARC" parser also work fine.

This puzzles me, since both parsers create the same data structure:

$ catmandu -I ./lib convert MARC --type RAW to YAML < ./t/camel.usmarc > raw.yml
$ catmandu -I ./lib convert MARC --type USMARC to YAML < ./t/camel.usmarc > usmarc.yml
$ diff usmarc.yml raw.yml
$

I've added a test, https://github.com/jorol/Catmandu-MARC/commit/d9c2d5498e2ecd843192423de38401e041f51723, which also passes.

phochste commented 9 years ago

Weird bug. It has to do with the order of the parameters. If you put "to CSV" in front of the --fix option, the output is fine. With type RAW, a fix given before "to CSV" seems to be executed by both the importer and the exporter.
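
The double execution can be pictured with a toy model. The Python sketch below is not Catmandu code: `marc_map`, `remove_field`, `make_record` and `run` are hypothetical stand-ins, and it assumes (judging only from the output in the report) that `marc_map` appends to a target field that already holds a value. Running the fix list once stands for a single execution; running it twice stands for importer plus exporter.

```python
# Toy model only: hypothetical stand-ins, not Catmandu's real API.

def marc_map(record, tag, target):
    """Append the value of MARC field `tag` to record[target] (assumed behaviour)."""
    for field in record.get("record", []):
        if field[0] == tag:
            # Field layout as shown in the thread: [tag, ind1, ind2, code, value]
            record[target] = record.get(target, "") + field[4]
    return record


def remove_field(record, name):
    """Drop a top-level key from the record if it is present."""
    record.pop(name, None)
    return record


def make_record():
    """One record holding only the 001 control field of the first camel.usmarc entry."""
    return {"record": [["001", None, None, "_", "fol05731351 "]]}


def run(fixes, passes):
    """Apply the whole fix list `passes` times and return the resulting id."""
    record = make_record()
    for _ in range(passes):
        for fix in fixes:
            record = fix(record)
    return record.get("id")


map_only = [lambda r: marc_map(r, "001", "id")]
map_and_remove = map_only + [lambda r: remove_field(r, "record")]

print(repr(run(map_only, passes=1)))        # 'fol05731351 '
print(repr(run(map_only, passes=2)))        # 'fol05731351 fol05731351 '
print(repr(run(map_and_remove, passes=2)))  # 'fol05731351 '
```

In this model the second pass appends the 001 value again unless the first pass has already removed `record`, which matches both the doubled and the clean output in the report.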

$ catmandu -I ./lib convert MARC --type RAW to CSV --fix 'marc_map("001","id")' --fields id < ./t/camel.usmarc
id
"fol05731351 ",
"fol05754809 ",
"fol05843555 ",
"fol05843579 ",
"fol05848297 ",
"fol05865950 ",
"fol05865956 ",
"fol05865967 ",
"fol05872355 ",
"fol05882032 ",

jorol commented 9 years ago

I still have problems with certain combinations of CLI arguments and options:

$ catmandu convert MARC --type XML --fix "marc_map(245,title);remove_field(record)" < marc.xml
{"_id":"991042727010838","record":[["LDR",null,null,"_","00000nas-a2200000z--4500"],["920"," "," ","a","periodical"], ... ]}

Error: the fix is not executed. It works if I specify an output format and attach the fix to it:

$ catmandu convert MARC --type XML to JSON --fix "marc_map(245,title);remove_field(record)" < marc.xml
{"_id":"991042727010838","title":"TEEM (한국전기전자재료학회)"}

The error doesn't occur with other --type values such as USMARC and RAW:

$ catmandu convert MARC --type USMARC --fix "marc_map(245,title);remove_field(record)" < camel.usmarc 
{"title":"ActivePerl with ASP and ADO /Tobias Martinsson.","_id":"fol05731351 "}
...
$ catmandu convert MARC --type RAW --fix "marc_map(245,title);remove_field(record)" < camel.usmarc 
{"_id":"fol05731351 ","title":"ActivePerl with ASP and ADO /Tobias Martinsson."}
...