LibreCat / Catmandu-MARC

Catmandu modules for working with MARC data
https://metacpan.org/release/Catmandu-MARC

closing collection tag is missing #105

Closed davewood closed 3 years ago

davewood commented 3 years ago

I'm converting a huge MARCXML file. The input file has both an opening and a closing collection tag, but the output only has the opening tag.

catmandu convert MARC --type XML to MARC --type XML < test.marc.xml > test.fixed.marc.xml

input

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<record>
...
</record>
</collection>

output

<?xml version="1.0" encoding="UTF-8"?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim">
<marc:record>
...
</marc:record>

jorol commented 3 years ago

I can't reproduce this on my machine with the same versions of the modules. Could you please check the output of this conversion?

$ curl -s 'https://raw.githubusercontent.com/LibreCat/Catmandu-MARC/dev/t/marc.xml' | catmandu convert MARC --type XML to MARC --type XML

You can try to force the collection tags with the option --collection 1:

$ catmandu convert MARC --type XML to MARC --type XML --collection 1 < test.marc.xml > test.fixed.marc.xml

See https://metacpan.org/pod/Catmandu::Exporter::MARC::XML

davewood commented 3 years ago

I just shortened the import file from 142k records to 200 records, and now I do see the closing collection tag.

But with the large file it is still missing, and catmandu doesn't complain or throw an error.

Something with the input file causes catmandu to silently create an invalid output file. What could be the reason for that behaviour?
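
A missing closing tag means the output is not well-formed XML, so it can at least be detected after the conversion. The following is a minimal sketch in Python (standard library only); the function name `check_well_formed` is made up for illustration and is not part of Catmandu. It only tests well-formedness, not validity against the MARC21slim schema.

```python
import io
import xml.etree.ElementTree as ET


def check_well_formed(source):
    """Return (True, None) if source parses as XML, else (False, error text).

    source may be a file name or a binary file object.
    """
    try:
        # iterparse reads the document incrementally; clearing each element
        # after it is complete keeps memory use modest for very large files.
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Illustrative input: a collection whose closing tag is missing.
truncated = io.BytesIO(b"<collection><record></record>")
print(check_well_formed(truncated))
```

Called with a path such as `check_well_formed("test.fixed.marc.xml")`, it would report a parse error for an output file like the one shown above.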

jorol commented 3 years ago

Could you please validate your XML import file?

$ xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd test.marc.xml
$ yaz-marcdump -n -i marcxml test.marc.xml

davewood commented 3 years ago

There was an error in the input file:

xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd test.xml
test.xml:127779: element datafield: Schemas validity error : Element '{http://www.loc.gov/MARC21/slim}datafield': Missing child element(s). Expected is ( {http://www.loc.gov/MARC21/slim}subfield ).
test.xml fails to validate
<datafield tag="245" ind1="0" ind2="0">^M<!-- Feld3500 -->  </datafield>

After removing this line and re-running catmandu convert, the closing tag is there.

Thanks for the help!
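
For anyone hitting the same symptom without xmllint at hand: the specific problem found here, a datafield with no subfield child, can also be located with a short streaming script. This is a rough sketch in Python (standard library only), not something Catmandu provides; the function name is hypothetical, and it assumes the records use the MARC21 slim namespace as in the input above. It does not replace full schema validation.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DATAFIELD = f"{{{MARC_NS}}}datafield"
SUBFIELD = f"{{{MARC_NS}}}subfield"
RECORD = f"{{{MARC_NS}}}record"


def datafields_without_subfields(source):
    """Yield (record number, tag) for each datafield lacking a subfield.

    source may be a file name or a binary file object. Record numbers
    start at 1 and count records in document order.
    """
    records_done = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == DATAFIELD:
            if elem.find(SUBFIELD) is None:
                # The enclosing record has not ended yet, hence the +1.
                yield records_done + 1, elem.get("tag")
        elif elem.tag == RECORD:
            records_done += 1
            elem.clear()  # drop the finished record to limit memory use


# Illustrative data modelled on the faulty line reported above.
sample = io.BytesIO(
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
    b'<record><datafield tag="100" ind1=" " ind2=" ">'
    b'<subfield code="a">x</subfield></datafield></record>'
    b'<record><datafield tag="245" ind1="0" ind2="0">'
    b"\r<!-- Feld3500 -->  </datafield></record>"
    b"</collection>"
)
for record_number, tag in datafields_without_subfields(sample):
    print(f"record {record_number}: datafield {tag} has no subfield")
```

On a real file, pass the path instead of the in-memory sample, for example `datafields_without_subfields("test.marc.xml")`.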