powerriegel closed this issue 6 years ago
I can reproduce your problem, but you are using PICA::Record, an outdated and deprecated module that Catmandu does not support. Please submit bug reports to https://github.com/gbv/PICA-Record. I recommend using PICA::Data (available on CPAN and GitHub) instead. It is actively maintained and developed, and it supports several PICA formats. A simple example for your use case:
#!/usr/bin/env perl
use strict;
use utf8;
use warnings;
use PICA::Writer::Plain;
use PICA::Writer::Plus;
use PICA::Writer::Binary;
my $record = {
    _id => '123',
    # a PICA record is an array of arrays
    # each PICA field array consists of a field tag, an occurrence
    # and a sequence of subfield indicators and subfield values
    record => [
        [ '001U', '', '0', 'utf8' ],
        [ '021A', '', 'a', 'Foo', 'd', 'Bar' ]
    ]
};
my $writer_binary = PICA::Writer::Binary->new('out_binary.pica');
$writer_binary->write($record);
$writer_binary->write($record);
$writer_binary->write($record);
my $writer_plain = PICA::Writer::Plain->new('out_plain.pica');
$writer_plain->write($record);
$writer_plain->write($record);
$writer_plain->write($record);
my $writer_plus = PICA::Writer::Plus->new('out_plus.pica');
$writer_plus->write($record);
$writer_plus->write($record);
$writer_plus->write($record);
print "Pica files written\n";
OK, I've tried your code example. `out_plus.pica` looks like this:
`001U 0utf8021A aFoodBar
001U 0utf8021A aFoodBar
001U 0utf8021A aFoodBar`
So there are no set separators and no field separators. The binary file might be accepted by CBS, but it's not human-readable. Isn't there a way to add those separators in the plus format?
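One way to check whether separators are really missing is to print the file content with control characters escaped, since bytes such as 0x1E and 0x1F are invisible in most terminals and editors. The following is only a minimal sketch: the helper name `show_control_chars` and the sample line are hypothetical and not part of PICA::Data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Replace every ASCII control character except the line feed with a
# visible \xNN escape, so that separators show up in a terminal.
# (Hypothetical helper, not part of any PICA module.)
sub show_control_chars {
    my ($text) = @_;
    $text =~ s/([\x00-\x09\x0B-\x1F])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

# Assumed sample line: each subfield starts with 0x1F, each field ends with 0x1E.
my $line = "001U \x1F0utf8\x1E021A \x1FaFoo\x1FdBar\x1E\n";
print show_control_chars($line);
```

Running the same helper over each line of a written file would show which control bytes the writer actually emitted.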
`out_plus.pica` contains unit separators (0x1F) and record separators (0x1E); you can see them in edit mode or with a hex viewer. The line feed is used as the group separator. The `plus` and `binary` formats are not designed for human readability; use `plain` for that. I will check whether we could implement a generic writer in which users can set their own separators.
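A generic writer of that kind could look roughly like the sketch below. It is not the PICA::Data implementation: the function `serialize_record` and its option names (`subfield_start`, `field_end`, `record_end`) are hypothetical, and the defaults simply follow the separators named in this thread (0x1F before each subfield, 0x1E after each field, a line feed after each record). It does not escape separator characters that occur inside subfield values.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Serialize one record (array-of-arrays layout as in the example script)
# using caller-supplied separators. Function and option names are
# hypothetical; values are written as-is, without any escaping.
sub serialize_record {
    my ($record, %opt) = @_;
    my $sf  = $opt{subfield_start} // "\x1F";
    my $eof = $opt{field_end}      // "\x1E";
    my $eor = $opt{record_end}     // "\n";

    my $out = '';
    for my $field (@{ $record->{record} }) {
        my ($tag, $occ, @subfields) = @$field;
        $out .= $tag;
        $out .= "/$occ" if defined $occ && length $occ;
        $out .= ' ';
        # Subfields come as a flat list of code/value pairs.
        while (@subfields) {
            my ($code, $value) = splice @subfields, 0, 2;
            $out .= $sf . $code . $value;
        }
        $out .= $eof;
    }
    return $out . $eor;
}

my $record = {
    record => [
        [ '001U', '', '0', 'utf8' ],
        [ '021A', '', 'a', 'Foo', 'd', 'Bar' ],
    ],
};

# Human-readable variant: "$" before subfields, one field per line.
print serialize_record(
    $record,
    subfield_start => '$',
    field_end      => "\n",
    record_end     => "\n",
);
```

Keeping the separators as plain options means the same routine can produce both a machine-oriented and a readable serialization without duplicating the field loop.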
Hello, I'm using your Perl module to convert MARC21 files into PICA+ files. We use the normalized format, as it is directly supported by our library system.
Produces the output:
These rectangles between the records are \x1D (record separator) characters. CBS (our library system) has problems if there are two \x1D characters, so we need to remove one of them.
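As a stopgap, the doubled separator can be removed in a post-processing step. The sketch below assumes the two \x1D bytes appear directly next to each other and that collapsing every such run to a single byte is acceptable; the function name `collapse_separators` and the sample data are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collapse any run of consecutive 0x1D bytes into a single 0x1D.
# (Hypothetical helper; assumes the duplicates are adjacent.)
sub collapse_separators {
    my ($data) = @_;
    $data =~ s/\x1D{2,}/\x1D/g;
    return $data;
}

# Made-up sample data with a doubled separator between two records.
my $sample = "record one\x1D\x1Drecord two\x1D";
my $clean  = collapse_separators($sample);

# tr/// without a replacement list only counts the matching bytes.
printf "%d separator(s) before, %d after\n",
    ($sample =~ tr/\x1D//), ($clean =~ tr/\x1D//);
```

For real files, the data should be read and written in binary mode (`binmode`) so that the control bytes pass through unchanged.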