DOREMUS-ANR / marc2rdf

Converter from UNIMARC/INTERMARC to RDF using the DOREMUS model
Apache License 2.0

InterMARC with 2 <record>: are they 2 distinct Work? #6

Closed pasqLisena closed 8 years ago

pasqLisena commented 8 years ago

There are some InterMARC files from BNF with nested record tags at different levels (see https://www.dropbox.com/s/czjdndjlhdc72it/39578699.xml?dl=0 and search for 'record').

Some of the code does not handle this case properly. I will report all problems related to it in this issue.

manoach commented 8 years ago

Hi @pasqLisena,

I didn't understand: what's wrong with the retrieval of the bibliographic agency?

pasqLisena commented 8 years ago

I found it by printing the variable agence while parsing the file I linked in the first comment.

what it was

The program correctly found the value 'FRBNF', but twice. The reason is that the value is specified for both the main record and the child one (search the document for the two occurrences of tag="001"). So the second value was appended to the first one, resulting in FRBNFFRBNF, which made the if check at line 81 fail. The application crashed because the URI returned by the builder was null.

You can replicate it: comment out line 77 (break lineLoop;) and print agence after all the loops.
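A minimal sketch of the failure mode described above, not the actual marc2rdf code. The class `AgenceDemo`, its method names, and the list-of-lists input (one inner list of 001 values per record) are hypothetical stand-ins for the real parsing loops; only the variable name agence and the `lineLoop` label come from this thread.

```java
import java.util.List;

final class AgenceDemo {
    // Buggy pattern: every value found, in any record, is appended to one buffer.
    static String appendAll(List<List<String>> records) {
        StringBuilder agence = new StringBuilder();
        for (List<String> record : records) {
            for (String value : record) {
                agence.append(value);
            }
        }
        return agence.toString();
    }

    // Workaround pattern: a labeled break leaves both loops after the first value,
    // so anything in later records is never read.
    static String firstOnly(List<List<String>> records) {
        String agence = "";
        lineLoop:
        for (List<String> record : records) {
            for (String value : record) {
                agence = value;
                break lineLoop;
            }
        }
        return agence;
    }
}
```

With two records that each carry 'FRBNF', `appendAll` yields FRBNFFRBNF while `firstOnly` yields FRBNF.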

my workaround

After the first agence is found, I stop every loop. Caveat: this way the data of the second record is lost.

what we should do

We need to separate the two records in some way. 😄

Moreover, the if at line 81 checks for equality with 'FRBNF': are no other values allowed? (The return null that follows is very dangerous.)
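One possible way to avoid the silent null, sketched under assumptions: the class `AgencyCheck`, its methods, and the allow-list are hypothetical, and 'FRBNF' is the only agency value this thread confirms. The idea is to make the "unknown agency" case explicit, either as an empty Optional or as an exception with a readable message.

```java
import java.util.Optional;
import java.util.Set;

final class AgencyCheck {
    // Hypothetical allow-list; only "FRBNF" is confirmed by the thread.
    static final Set<String> KNOWN_AGENCIES = Set.of("FRBNF");

    // Returns the trimmed agency if it is known, otherwise an empty Optional.
    static Optional<String> validate(String agence) {
        if (agence == null) {
            return Optional.empty();
        }
        String trimmed = agence.trim();
        return KNOWN_AGENCIES.contains(trimmed) ? Optional.of(trimmed) : Optional.empty();
    }

    // Fails loudly instead of returning null to the caller.
    static String require(String agence) {
        return validate(agence).orElseThrow(() ->
                new IllegalArgumentException("Unexpected bibliographic agency: '" + agence + "'"));
    }
}
```

With this shape, a concatenated value such as FRBNFFRBNF would surface as an error naming the bad value rather than as a null dereference further down.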

manoach commented 8 years ago

Oh OK, I understand now and I agree with you :) ... I had not noticed this before. Now we have to check with @pierrechoffe: do we consider these two records as two works? If that's the case, I suggest that we separate these two records and convert them independently...

What do you think?

pierrechoffe commented 8 years ago

@manowb hi, this is a tricky one. First, this record is a bibliographic record, not an authority record. It describes a work by an Italian composer that was copied by a French composer. The same document also contains the copy of another work, which is why there are 2 records. So yes, these two records are definitely two different works. If you want more information, or if you need to know whether this kind of situation happens in various modes, the best option is to ask Frédéric directly; he is the one who knows best.

pasqLisena commented 8 years ago

> these two records are definitely two different works

Separating them and converting them independently seems the best option.

manoach commented 8 years ago

Hi @pasqLisena,

Yes. To do it, I suggest parsing the file before starting the conversion and then converting each record separately. I can do it... Just let me know ^^
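A rough sketch of that pre-parsing step, using only the JDK's DOM API. It is an assumption-laden illustration, not the project's implementation: the class `RecordSplitter` and its methods are hypothetical, and the only things assumed about the XML are that records are `record` elements and that fields carry a `tag` attribute, as mentioned earlier in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RecordSplitter {
    // Returns one standalone copy per <record>, whatever its nesting depth.
    static List<Element> split(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse the XML input", e);
        }
        NodeList all = doc.getElementsByTagName("record");
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            // Deep copy, so the original document is left untouched.
            Element copy = (Element) all.item(i).cloneNode(true);
            stripNested(copy);
            result.add(copy);
        }
        return result;
    }

    // Removes child records from a copy; the list is live, so it shrinks as we remove.
    private static void stripNested(Element record) {
        NodeList nested = record.getElementsByTagName("record");
        while (nested.getLength() > 0) {
            nested.item(0).getParentNode().removeChild(nested.item(0));
        }
    }

    // Text of the first element whose tag attribute matches, if any.
    static Optional<String> field(Element record, String tag) {
        NodeList children = record.getElementsByTagName("*");
        for (int i = 0; i < children.getLength(); i++) {
            Element e = (Element) children.item(i);
            if (tag.equals(e.getAttribute("tag"))) {
                return Optional.of(e.getTextContent().trim());
            }
        }
        return Optional.empty();
    }
}
```

Each element returned by `split` can then be handed to the converter on its own, so a value such as the one under tag="001" is read once per record instead of being accumulated across records.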

rtroncy commented 8 years ago

I just assigned it to you @manowb