DOREMUS-ANR / marc2rdf

Converter from UNIMARC/INTERMARC to RDF using the DOREMUS model
Apache License 2.0

[BnF] Incorrect parsing of dates #47

Closed pasqLisena closed 7 years ago

pasqLisena commented 7 years ago

I noticed some errors in the parsing of dates. They show up in the first results of this query:

SELECT DISTINCT ?source, ?expCreation, ?compositionDate, SAMPLE(?title) as ?title
WHERE {
  ?expression a efrbroo:F22_Self-Contained_Expression ;
         dct:publisher ?source ;
          mus:U70_has_title ?title .
  ?expCreation efrbroo:R17_created ?expression ;
          ecrm:P4_has_time-span / ecrm:P79_beginning_is_qualified_by ?compositionDate .

} ORDER BY ?compositionDate
LIMIT 300

I have split this issue into cases:

Case 1: "21 a"

Creation time-span of a Mozart sonata: http://data.doremus.org/event/3a2122a9-142e-3cdb-800c-90f3440c9a28/time

In this example we found "21 a" (which seems to stand for 21 April) in the MARC-XML file that we have:

<controlfield tag="008">
000121120825yy sn 21 a 17840421 1
<Pos Code="0001">00</Pos>
<Pos Code="0203">01</Pos>
<Pos Code="0405">21</Pos>
<Pos Code="0607">12</Pos>
<Pos Code="0809">08</Pos>
<Pos Code="1011">25</Pos>
<Pos Code="1213" Sens="ne s'applique pas">yy</Pos>
<Pos Code="1820">sn</Pos>
<Pos Code="2831">21 a</Pos>
<Pos Code="3841">1784</Pos>
<Pos Code="4243">04</Pos>
<Pos Code="4445">21</Pos>
<Pos Code="61" Sens="La vedette peut être liée à une notice bibliographique sauf pour l'accès matière RAMEAU">1</Pos>
</controlfield>

but I do not see it in the INTERMARC record on the BnF website:

008 000121161124yy sn 17840421 1

Do we need a new export of the source data?
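To compare the two versions of the record field by field, the `Pos` elements of the 008 control field can be read into a dictionary. This is only a minimal sketch in Python, not part of marc2rdf: the `positions` helper is a hypothetical name, and the record is a trimmed copy of the XML quoted above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the 008 control field quoted in this issue.
RECORD = """<controlfield tag="008">
000121120825yy sn 21 a 17840421 1
<Pos Code="2831">21 a</Pos>
<Pos Code="3841">1784</Pos>
<Pos Code="4243">04</Pos>
<Pos Code="4445">21</Pos>
</controlfield>"""


def positions(xml_text):
    """Map each Pos Code attribute to the text of that Pos element (hypothetical helper)."""
    field = ET.fromstring(xml_text)
    return {pos.get("Code"): (pos.text or "") for pos in field.findall("Pos")}


pos = positions(RECORD)
print(pos["2831"])  # the suspicious value from Case 1
print(pos["3841"], pos["4243"], pos["4445"])  # the date parts 1784, 04, 21
```

Reading the `Pos` children rather than slicing the raw string avoids depending on the exact number of blanks in the field, which may not survive copy and paste.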

Case 2: "194-"

http://data.doremus.org/event/64f3f07e-7563-385a-a409-540953aca30a/time

I think it means an unknown date between 1940 and 1949.

This example from PP falls into the same case: http://data.doremus.org/event/b8cce087-eac8-3cdc-9ae2-e986749dfe8f/time

Case 3: "18-8"

Creation time of "Esquisses d'avant-guerre": http://data.doremus.org/event/33a1c95a-079a-35f6-b9d5-e92c36d52a92/time

It seems that the XML is missing the year (the full date is 18-08-1939):

<controlfield tag="008">
980115980115yy uu 18-8 1
<Pos Code="0001">98</Pos>
<Pos Code="0203">01</Pos>
<Pos Code="0405">15</Pos>
<Pos Code="0607">98</Pos>
<Pos Code="0809">01</Pos>
<Pos Code="1011">15</Pos>
<Pos Code="1213" Sens="ne s'applique pas">yy</Pos>
<Pos Code="1820">uu</Pos>
<Pos Code="2831">18-8</Pos>
<Pos Code="61" Sens="La vedette peut être liée à une notice bibliographique sauf pour l'accès matière RAMEAU">1</Pos>
</controlfield>

The online INTERMARC record shows the same problem. How can we solve it?

pasqLisena commented 7 years ago

Case 2: "194-"

It has been confirmed that this is an unknown date between 1940 and 1949.
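The confirmed reading of "194-" can be sketched as a small function that turns a four-character year pattern into a range of years. This is an illustration in Python, not the marc2rdf implementation. The `year_range` name is hypothetical, and it assumes that "-" only ever replaces trailing digits.

```python
import re


def year_range(value):
    """Return (first_year, last_year) for a 4-character year pattern, else None.

    Assumption: '-' stands for an unknown trailing digit, so '194-' covers 1940 to 1949.
    """
    match = re.fullmatch(r"(\d{1,4})(-{0,3})", value)
    if match is None or len(value) != 4:
        return None
    known, unknown = match.groups()
    first = int(known + "0" * len(unknown))  # unknown digits at their minimum
    last = int(known + "9" * len(unknown))   # unknown digits at their maximum
    return first, last


print(year_range("194-"))  # (1940, 1949)
print(year_range("1784"))  # (1784, 1784)
print(year_range("21 a"))  # None, the value from Case 1
print(year_range("18-8"))  # None, the value from Case 3
```

Returning None for the values from Cases 1 and 3 keeps them out of the time-span instead of guessing at a year.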

pasqLisena commented 7 years ago

A data correction in the sources has been requested for these MARC files:

Case 1: "21 a"

Case 3: "18-8"

Their resolution no longer concerns marc2rdf.