Closed vpeil closed 10 years ago
What if the generator returns an array reference? I would expect this to be interpreted as a list of multiple items.
Well, isn't this the same as for Stores that return a result set from a database? You build a stack of records in memory and return them one by one:
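    sub {
        # parseResults() stands for whatever returns all parsed records as a list
        # (needs "use feature 'state'"; an array ref avoids list-initialising a state array)
        state $stack = [ parseResults() ];
        # shift hands records out in their original order; pop would reverse them
        return shift @$stack;    # undef once the records are used up
    }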
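For result sets that arrive in pages, the same idea can be extended so the buffer is refilled whenever it runs dry. The following is only a minimal sketch in core Perl: `fetch_page` and `make_generator` are hypothetical names standing in for the database or web service call and are not part of Catmandu.

```perl
use strict;
use warnings;

# Hypothetical data source: two pages of records, then an empty page.
my @pages = ([{ id => 1 }, { id => 2 }], [{ id => 3 }], []);
sub fetch_page { return shift(@pages) // [] }

# Build a generator: each call returns the next record, or undef at the end.
sub make_generator {
    my @buffer;      # records fetched but not yet handed out
    my $done = 0;    # set once the source returns an empty page
    return sub {
        until (@buffer || $done) {
            my $page = fetch_page();
            if (@$page) { push @buffer, @$page }
            else        { $done = 1 }
        }
        return shift @buffer;    # undef when nothing is left
    };
}

my $next = make_generator();
my @seen;
while (defined(my $rec = $next->())) {
    push @seen, $rec->{id};
}
print "@seen\n";
```

Because the buffer lives in the closure rather than in a `state` variable, every call to `make_generator` starts with a fresh, empty buffer.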
Or you can use a pull XML parser and use its state to fetch the next record.
Please have a look at Catmandu::XML instead of implementing XML parsing in each new importer. Catmandu::XML uses a pull parser and already supports cutting one XML stream into multiple records:
catmandu convert XML --path record < collection.xml
{"record":"{data of one bibliographic record no.1}"}
{"record":"{data of one bibliographic record no.2}"}
{"record":"{data of one bibliographic record no.3}"}
You can use a Catmandu::Importer::XML that is fed each collection and returns the records one by one. You could also use XML::Struct directly, as Catmandu::Importer::XML is just a thin layer on top of it, but with Catmandu::Importer::XML you get new features of Catmandu::XML, such as optional XSLT processing, for free.
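Thank you, guys. This works fine: I use Catmandu::Importer::XML as suggested by @nichtich, then call ->to_array and use
    sub {
        # parseResults() stands for whatever returns all parsed records as a list
        # (needs "use feature 'state'"; an array ref avoids list-initialising a state array)
        state $stack = [ parseResults() ];
        # shift hands records out in their original order; pop would reverse them
        return shift @$stack;    # undef once the records are used up
    }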
as suggested by @phochste. See for example: https://github.com/LibreCat/Catmandu-Inspire/commit/69385213a673537906775a1df811dcce6bf72c86
I'll update the other importers in the same manner soon.
While updating and creating some importers I ran into a problem. Here's an example: Catmandu::Importer::Inspire returns exactly one item, which is just the whole XML structure:
In this case, $importer->count is 1, but I want it to be 3, of course. The Fixes also get more complicated if the path always starts with "collection". The importers ArXiv, CrossRef, EuropePMC... have the same problem.
What should the sub generator {} look like?
Any suggestions?
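One possible shape, sketched in core Perl so that it runs without Catmandu installed: the generator parses the whole response once and then hands out the nested records one at a time. The package name `My::Importer::Sketch`, the helper `_parse_response`, and the sample data are all hypothetical. A real importer would consume the Catmandu::Importer role, which derives methods such as count from generator.

```perl
use strict;
use warnings;

package My::Importer::Sketch;    # hypothetical name, not a real Catmandu module

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Stand-in for "fetch and parse the whole response": one nested structure.
sub _parse_response {
    return {
        collection => {
            record => [ { id => 'a' }, { id => 'b' }, { id => 'c' } ],
        },
    };
}

# Return a closure that yields one record per call and undef when done.
sub generator {
    my ($self) = @_;
    my $queue;    # filled lazily on the first call
    return sub {
        $queue //= [ @{ $self->_parse_response->{collection}{record} || [] } ];
        return shift @$queue;
    };
}

package main;

my $next  = My::Importer::Sketch->new->generator;
my $count = 0;
$count++ while defined $next->();
print "count: $count\n";
```

Since each record is returned on its own, a Fix no longer has to start its path at "collection".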