LibreCat / Catmandu-MARC

Catmandu modules for working with MARC data
https://metacpan.org/release/Catmandu-MARC

Anything for CPAN Pull Request Challenge? #17

Closed zoffixznet closed 8 years ago

zoffixznet commented 8 years ago

Hi,

I've been assigned Catmandu::MARC as part of the CPAN Pull Request Challenge for October.

Are there any issues you'd like me to look at in particular?

Let me know, ZZ

phochste commented 8 years ago

Hi

Once Catmandu is installed, you can run this command in the Catmandu-MARC development directory:

$ catmandu -I lib convert MARC to JSON < t/camel.usmarc

This prints one JSON record per line, one for each MARC record in camel.usmarc. What would be cool is a new Exporter that shows some field statistics. To do this, one first needs to create a module Catmandu::Exporter::MARC::FieldList like:

package Catmandu::Exporter::MARC::FieldList;
use Catmandu::Sane;
use Moo;

with 'Catmandu::Exporter', 'Catmandu::Exporter::MARC::Base';

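# Called once for every record that is exported.
sub add {
    my ($self, $data) = @_;

    # Placeholder output; replace with the real logic.
    $self->fh->print("Blablabla\n");
}

# Called once after the last record.
sub commit {
    my ($self) = @_;
    $self->fh->flush;
}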

1;

You can test this module with this command:

$ catmandu -I lib convert MARC to MARC --type FieldList < t/camel.usmarc

Now what we need is a listing of all tags and subfield codes present in the MARC records. As you can see in the JSON output, the 'record' key of a MARC record holds an ARRAY of ARRAYs, one inner array per field.

record =>  [
  ["005",null,null,"_","20000706095105.0"] ,
  ["008",null,null,"_","000315s1999    njua          001 0 eng  "],
  ["010"," "," ","a","   00500678 "],
  ["020"," "," ","a","013020868X"],
  ["260"," "," ","a","Upper Saddle River, NJ :","b","Prentice Hall PTP,","c","c1999."]
  ...
]

Each field above has the syntax:

 [ FIELD_NAME , IND1, IND2 , CODE, VALUE , CODE, VALUE , CODE , VALUE , etc]

For example, the first field in the record above is:

FIELD_NAME: 005  IND1: null  IND2: null  CODE: _  VALUE: 20000706095105.0
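
To make the field layout concrete, here is a minimal sketch in plain Perl that splits one field array into its parts. The helper name unpack_field is hypothetical and not part of Catmandu-MARC; it only assumes the layout described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Catmandu-MARC): unpack one field array
# of the form [ FIELD_NAME, IND1, IND2, CODE, VALUE, CODE, VALUE, ... ].
sub unpack_field {
    my ($field) = @_;
    my ($tag, $ind1, $ind2, @rest) = @$field;

    my @subfields;
    # The remainder is a flat list of alternating codes and values.
    while (@rest) {
        my ($code, $value) = splice(@rest, 0, 2);
        push @subfields, { code => $code, value => $value };
    }
    return { tag => $tag, ind1 => $ind1, ind2 => $ind2, subfields => \@subfields };
}

my $field = unpack_field(
    ["260", " ", " ", "a", "Upper Saddle River, NJ :", "b", "Prentice Hall PTP,", "c", "c1999."]
);

# Print the subfield codes found in the 260 field.
print join(",", map { $_->{code} } @{ $field->{subfields} }), "\n";
```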

What we need is a statistic: a running total, over all records, of how often each FIELD_NAME+CODE combination occurs. E.g. for the record above:

 005_ : 1
 008_ : 1
 010a : 1
 020a : 1
 260a : 1
 260b : 1
 260c : 1
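
As a rough sketch of how these numbers could be derived for a single record, the following plain Perl walks the 'record' array and counts each FIELD_NAME+CODE pair. The %count hash and the loop are illustrative assumptions, not existing Catmandu-MARC code; the data is the sample record from above, with undef standing in for JSON null.

```perl
use strict;
use warnings;

# Sample data copied from the JSON output shown above (undef stands for JSON null).
my $data = {
    record => [
        ["005", undef, undef, "_", "20000706095105.0"],
        ["008", undef, undef, "_", "000315s1999    njua          001 0 eng  "],
        ["010", " ", " ", "a", "   00500678 "],
        ["020", " ", " ", "a", "013020868X"],
        ["260", " ", " ", "a", "Upper Saddle River, NJ :", "b", "Prentice Hall PTP,", "c", "c1999."],
    ],
};

my %count;
for my $field (@{ $data->{record} }) {
    my ($tag, $ind1, $ind2, @subfields) = @$field;
    # Codes sit at the even offsets of the remaining list, values at the odd ones.
    for (my $i = 0; $i < @subfields; $i += 2) {
        $count{ $tag . $subfields[$i] }++;
    }
}

# Print one "FIELD_NAME+CODE : total" line per combination, sorted by key.
printf " %s : %d\n", $_, $count{$_} for sort keys %count;
```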

You'll probably accumulate the statistics from the input $data in the 'add()' method and print the result in 'commit()'.
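
The split between add() and commit() might look roughly like the sketch below. To keep it runnable without Catmandu, it uses a hypothetical stand-in class, FieldListSketch, with a hand-rolled constructor and a plain filehandle; a real exporter would consume the Catmandu roles from the skeleton above and use $self->fh instead. The two input records are made-up toy data.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Catmandu exporter: it only
# mimics the add()/commit() calling pattern with a plain filehandle.
package FieldListSketch;

sub new {
    my ($class, %args) = @_;
    return bless { fh => $args{fh}, count => {} }, $class;
}

# Called once per record: only update the running totals.
sub add {
    my ($self, $data) = @_;
    for my $field (@{ $data->{record} }) {
        my ($tag, $ind1, $ind2, @subfields) = @$field;
        for (my $i = 0; $i < @subfields; $i += 2) {
            $self->{count}{ $tag . $subfields[$i] }++;
        }
    }
}

# Called once at the end: print the totals, sorted by FIELD_NAME+CODE.
sub commit {
    my ($self) = @_;
    my $count = $self->{count};
    for my $key (sort keys %$count) {
        printf { $self->{fh} } "%s : %d\n", $key, $count->{$key};
    }
}

package main;

# Write into an in-memory filehandle so the result can be inspected.
open(my $fh, '>', \my $out) or die "cannot open in-memory filehandle: $!";
my $exporter = FieldListSketch->new(fh => $fh);

# Two toy records (invented values) to show the totals spanning records.
$exporter->add({ record => [ ["020", " ", " ", "a", "first"] ] });
$exporter->add({ record => [ ["020", " ", " ", "a", "second"],
                             ["245", "1", "0", "a", "third"] ] });
$exporter->commit;
close $fh;

print $out;
```

Keeping add() free of output means nothing is printed until every record has been seen, which is what a running total over all records requires.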

Don't work on this more than a few hours :)

Thanks! Patrick