EUDAT-B2SHARE / b2share

B2SHARE software for the EUDAT CDI services.
https://b2share.eudat.eu
GNU General Public License v2.0

Empty OAI-PMH record metadata exported by b2share v.1.6.2 for marcxml #812

Closed emanueldima closed 8 years ago

emanueldima commented 8 years ago

This report comes from Heinrich Widmann (B2FIND):

Try e.g. https://b2share.eudat.eu/oai2d?verb=ListRecords&metadataPrefix=marcxml. It seems to work in a browser, but when harvesting with the OAI iterator in Python I get 'HTTP 503' error messages; see also the attached log file.

For set=Linguistics the GetRecord request runs into HTTP errors and finally fails with IndexError: list index out of range. E.g. https://b2share.eudat.eu/oai2d?verb=ListRecords&metadataPrefix=marcxml&set=Linguistics also returns empty records when submitted in a browser ...


Version:    2.0
Run mode:       Harvesting
Start loop over processes and related requests in the job list:     2015-12-09 10:47:41
|- <Process> started : <Time>
 |- Joblist: <Filename of request list>
   |# <No or Request> : <Request description>          
    |- <Status>   |@ <Time>     |

|- Harvesting started : 2015-12-09 10:47:41
 |- Joblist:    harvest_list
   |# 1    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'Linguistics'] 
    |- Started    |@ 10:47:41   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 2    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'EUON'] 
    |- Started    |@ 10:47:46   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 3    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'Eudat'] 
    |- Started    |@ 10:47:54   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 1 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 4    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'DRIHM'] 
    |- Started    |@ 10:48:03   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 5    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'BBMRI'] 
    |- Started    |@ 10:48:10   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 6    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'NRM'] 
    |- Started    |@ 10:48:18   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 7    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'GBIF'] 
    |- Started    |@ 10:48:25   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    [ERROR] Traceback (most recent call last):
  File "/home/k204019/Projects/EUDAT/EUDAT-B2FIND/md-ingestion/B2FIND.py", line 521, in harvest
    record = sickle.GetRecord(**{'metadataPrefix':req['mdprefix'],'identifier':record.identifier})
  File "/usr/local/lib/python2.7/dist-packages/sickle/app.py", line 164, in GetRecord
    record = self.iterator(self, params).next()
  File "/usr/local/lib/python2.7/dist-packages/sickle/iterator.py", line 146, in next
    mapped = self.mapper(item)
  File "/usr/local/lib/python2.7/dist-packages/sickle/models.py", line 140, in __init__
    ).getchildren()[0], strip_ns=self._strip_ns)
IndexError: list index out of range

   |# 8    : ['b2share', 'http://b2share.eudat.eu/oai2d', 'ListIdentifiers', 'marcxml', 'RDA'] 
    |- Started    |@ 10:48:33   |
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 1 seconds...
HTTP 503! Retrying after 2 seconds...
HTTP 503! Retrying after 2 seconds...
    |- Finished   |@ 10:49:08   |
    | Provided | Harvested | Failed | Deleted |
    |       12 |        12 |      0 |      0 |

End :       2015-12-09 10:49:08
Attached files: harvest_b2share.err, oai_b2find.xsd

llehtine commented 8 years ago

https://b2share.eudat.eu/oai2d?verb=ListRecords&metadataPrefix=marcxml&set=Linguistics also shows "Unknown metadata format" errors for some entries. And when loading the page, this is what ends up in the invenio log files:

2015-12-11 13:18:27,832 ERROR:  [in /var/www/.virtualenvs/b2share/lib/python2.7/site-packages
/invenio-2.0.7.dev20150901-py2.7.egg/invenio/ext/logging/wrappers.py:310]
Traceback (most recent call last):
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/legacy/__init__.py", line 124, in __call__
    response = self.app.full_dispatch_request()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1477, in 
full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_restful/__init__.py", line 258, 
in error_router
    return original_handler(e)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/wrappers.py", line 125, in handle_user_exception
    return super(Flask, self).handle_user_exception(e)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1381, in 
handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1475, in 
full_dispatch_request
    rv = self.dispatch_request()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1461, in 
dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/records/views.py", line 155, in decorated
    return f(recid, *args, **kwargs)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 657, in 
decorated_view
    return current_app.login_manager.unauthorized()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 180, in 
unauthorized
    return self.unauthorized_callback()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/login/__init__.py", line 161, in do_login_first
    return login(referer=request.url), 401
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/decorators.py", line 203, in decorator
    return f(*args, **argd)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/accounts/views/accounts.py", line 75, in login
    action, arguments = mail_cookie_check_authorize_action(action)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 165, in mail_cookie_check_authorize_action
    (kind, params) = mail_cookie_check_common(cookie)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 106, in mail_cookie_check_common
    obj = AccMAILCOOKIE.get(cookie, delete=delete)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/models.py", line 109, in get
    cookie_id = int(cookie[16:-16], 16)
ValueError: invalid literal for int() with base 16: ''
2015-12-11 13:18:27,846 ERROR: Exception on /record/200/reviews/add [GET] [in /var/www
/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py:1423]
Traceback (most recent call last):
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/legacy/__init__.py", line 124, in __call__
    response = self.app.full_dispatch_request()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1477, in 
full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_restful/__init__.py", line 258, 
in error_router
    return original_handler(e)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/wrappers.py", line 125, in handle_user_exception
    return super(Flask, self).handle_user_exception(e)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1381, in 
handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1475, in 
full_dispatch_request
    rv = self.dispatch_request()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1461, in 
dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/records/views.py", line 155, in decorated
    return f(recid, *args, **kwargs)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 657, in 
decorated_view
    return current_app.login_manager.unauthorized()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 180, in 
unauthorized
    return self.unauthorized_callback()
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/login/__init__.py", line 161, in do_login_first
    return login(referer=request.url), 401
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/decorators.py", line 203, in decorator
    return f(*args, **argd)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/accounts/views/accounts.py", line 75, in login
    action, arguments = mail_cookie_check_authorize_action(action)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 165, in mail_cookie_check_authorize_action
    (kind, params) = mail_cookie_check_common(cookie)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 106, in mail_cookie_check_common
    obj = AccMAILCOOKIE.get(cookie, delete=delete)
  File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/models.py", line 109, in get
    cookie_id = int(cookie[16:-16], 16)
ValueError: invalid literal for int() with base 16: ''

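A side note on the ValueError above: the last frame converts a slice of the cookie string to an integer, and that slice is empty whenever the string has 32 characters or fewer. The sketch below only reproduces that one expression in Python 3; `parse_cookie_id` is a hypothetical name, and why the cookie was empty on `/record/200/reviews/add` is not established in this thread.

```python
def parse_cookie_id(cookie):
    """Mirror the expression from the last traceback frame (hypothetical helper)."""
    # Drop 16 characters on each side, read what remains as hexadecimal.
    return int(cookie[16:-16], 16)


# A string with a hex payload between two 16-character paddings parses fine.
print(parse_cookie_id("0" * 16 + "ff" + "0" * 16))

# An empty (or too short) cookie leaves '' after slicing, and int('', 16) raises.
try:
    parse_cookie_id("")
except ValueError as exc:
    print("ValueError:", exc)
```
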
nharraud commented 8 years ago

The following code reproduces the issue:

# -*- coding: utf-8 -*-
# Python 2 script, matching the python2.7 environment in the logs above.
from xml.etree import ElementTree as ET
from xml.dom import minidom
from sickle import Sickle

sickle = Sickle('https://b2share.eudat.eu/oai2d')
records = sickle.ListRecords(metadataPrefix='marcxml', set='Linguistics')
# records = sickle.ListRecords(metadataPrefix='oai_dc', set='Linguistics')

# pretty-print the raw OAI-PMH response of the first page
raw = records.oai_response.raw.encode('utf-8')
pretty_raw = minidom.parseString(raw).toprettyxml().encode('utf-8')
# check that the xml is well-formed (this does not validate against a schema)
ET.fromstring(pretty_raw)
print pretty_raw

print('='*100)
# fetch the first record from the iterator
item = records.next() # FAILS HERE
pretty_item = minidom.parseString(item.raw.encode('utf-8')) \
    .toprettyxml().encode('utf-8')
print pretty_item

records.next() works with oai_dc but not with marcxml. Strangely enough, the marcxml itself is missing for some of the results, which might be why sickle fails to iterate.
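The missing payload can be checked without Sickle. The sketch below is Python 3 and uses only the standard library; it lists the identifiers of records whose `<metadata>` element has no child, which is the situation where indexing the first child raises the IndexError seen in the traceback. The function name and the sample document are made up for illustration; only the OAI-PMH and MARC21 namespaces come from this thread.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 namespace in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def records_without_metadata(oai_xml):
    """Return identifiers of records whose <metadata> element is empty."""
    root = ET.fromstring(oai_xml)
    empty = []
    for record in root.iter(OAI_NS + "record"):
        metadata = record.find(OAI_NS + "metadata")
        # len(element) is the number of child elements.
        if metadata is not None and len(metadata) == 0:
            identifier = record.findtext(
                OAI_NS + "header/" + OAI_NS + "identifier")
            empty.append(identifier)
    return empty


# Illustrative response: record 1 has an empty payload, record 2 does not.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata/>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata><record xmlns="http://www.loc.gov/MARC21/slim"/></metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(records_without_metadata(SAMPLE))
```

Feeding it the raw text of a real ListRecords page should give the list of records that need attention, assuming the server returns the standard OAI-PMH envelope.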

Record 1 (https://b2share.eudat.eu/record/1) is one of those records. When I request its marcxml in the browser it works. No idea why it doesn't with OAI-PMH.

I searched for issues related to OAI-PMH in Invenio and found this one: https://github.com/inveniosoftware/invenio/issues/2962

@llehtine could you please check the CFG_OAI_METADATA_FORMATS configuration parameter?

llehtine commented 8 years ago

The parameter is the default one, which is defined in config.py:

CFG_OAI_METADATA_FORMATS = {
    'oai_dc': ('XOAIDC',
               'http://www.openarchives.org/OAI/1.1/dc.xsd',
               'http://purl.org/dc/elements/1.1/'),
    'marcxml': ('XOAIMARC',
                'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd',
                'http://www.loc.gov/MARC21/slim'),
}

emanueldima commented 8 years ago

Ok, my conclusion is that some records do not show the marcxml metadata in OAI-PMH, and that means the OAI-PMH xml document is invalid according to its own schema. I get the following XML validation errors:

XML validation started.
Referenced entity at "http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd".
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[14]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[23]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[32]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[41]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[50]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[59]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[68]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[77]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[86]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[95]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[104]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[113]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[122]
Referenced entity at "http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd".
cvc-pattern-valid: Value 'C' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [262]
cvc-attribute.3: The value 'C' of attribute 'ind1' on element
'marc:datafield' is not valid with respect to its type,
'indicatorDataType'. [262]
cvc-pattern-valid: Value 'O' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [262]
cvc-attribute.3: The value 'O' of attribute 'ind2' on element
'marc:datafield' is not valid with respect to its type,
'indicatorDataType'. [262]
cvc-pattern-valid: Value 'C' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [387]
cvc-attribute.3: The value 'C' of attribute 'ind1' on element
'marc:datafield' is not valid with respect to its type,
'indicatorDataType'. [387]
cvc-pattern-valid: Value 'O' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [387]
Too many errors, stopping further checking.
XML validation finished.
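The MARC21slim errors in this output are a separate matter from the empty `<metadata>` elements: the validator rejects uppercase indicator values. A quick Python 3 check of the pattern quoted in the output shows why 'C' and 'O' fail while lowercase letters, digits and a blank pass. The helper name is hypothetical; the pattern string is copied from the validator message.

```python
import re

# Pattern quoted by the validator for MARC21slim's indicatorDataType.
INDICATOR_PATTERN = re.compile(r"[\da-z ]{1}")


def indicator_is_valid(value):
    """True if the whole value matches the indicator pattern."""
    return INDICATOR_PATTERN.fullmatch(value) is not None


for value in ("C", "O", "c", "7", " "):
    print(repr(value), indicator_is_valid(value))
```
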

I suppose that for these records the OAI-PMH marcxml must somehow be regenerated, maybe using the invenio OAI command line tools. Lassi, did you use the oairepositoryupdater tool? Can you try to regenerate these documents?

nharraud commented 8 years ago

Thanks @llehtine.

I tested locally with B2Share 1.6.2 and the OAI server works correctly with marcxml. Could you please check that oairepositoryupdater is running in bibsched?

nharraud commented 8 years ago

@emanueldima same solution found at the same time ^^

emanueldima commented 8 years ago

But will it work?... suspense...

llehtine commented 8 years ago

The oairepositoryupdater is running in bibsched at 5-minute intervals. Should I try to run it with some special parameters?

nharraud commented 8 years ago

Thanks again @llehtine. I just checked record https://b2share.eudat.eu/record/1, which is one of those not returning a marcxml (https://b2share.eudat.eu/record/1/export/xm?ln=en). It has the field 909, which is added once oairepositoryupdater and the bibupload task it creates have run.

<datafield tag="909" ind1="C" ind2="O">
  <subfield code="o">oai:b2share.eudat.eu:5</subfield>
  <subfield code="p">GLOBAL_SET</subfield>
  <subfield code="p">Linguistics</subfield>
</datafield>

So it looks like this was not the issue.

However, I saw one consistent difference between documents returning a marcxml and those that do not. The documents returning a marcxml have Linguistics as the first p subfield and GLOBAL_SET as the second one. I don't know yet if this is relevant. I have to investigate more.
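To compare that ordering across many records, the `p` subfields of field 909 can be read out in document order. This is a small Python 3 sketch using the standard library; `set_spec_order` is a hypothetical helper, and the snippet is the 909 field quoted above (without a namespace, as in the export shown).

```python
import xml.etree.ElementTree as ET


def set_spec_order(datafield_xml):
    """Return the text of all code="p" subfields, in document order."""
    field = ET.fromstring(datafield_xml)
    return [sf.text for sf in field.findall("subfield[@code='p']")]


SNIPPET = """<datafield tag="909" ind1="C" ind2="O">
  <subfield code="o">oai:b2share.eudat.eu:5</subfield>
  <subfield code="p">GLOBAL_SET</subfield>
  <subfield code="p">Linguistics</subfield>
</datafield>"""

print(set_spec_order(SNIPPET))  # ['GLOBAL_SET', 'Linguistics']
```
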

nharraud commented 8 years ago

@llehtine saw that the bibfmt table had no xm format for the failing records. Running bibreformat -uadmin -oxm seems to have fixed the issue. Now all records output a marcxml.

@tiborsimko told me to be careful with this as marcxml is a master format and normally should not need a bibreformat. I tested a few records' marcxml to make sure that they are still the same and I didn't find any difference. The previous marcxml, the new ones and the oai ones are all the same.
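For anyone who wants to repeat that comparison on more records, one possible approach is to canonicalize both versions before comparing, so that indentation and attribute order do not count as differences. This is a Python 3.8+ sketch with the standard library, not the procedure used above; `same_xml` and the sample strings are illustrative.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(before, after):
    """Compare two XML strings, ignoring whitespace layout and attribute order."""
    # strip_text=True drops leading/trailing whitespace in text and tails.
    return (canonicalize(before, strip_text=True)
            == canonicalize(after, strip_text=True))


BEFORE = ('<datafield tag="909" ind1="C" ind2="O">'
          '<subfield code="p">Linguistics</subfield></datafield>')
# Same content, pretty-printed and with attributes in another order.
AFTER = """<datafield ind2="O" ind1="C" tag="909">
  <subfield code="p">Linguistics</subfield>
</datafield>"""
# A real change: different subfield value.
CHANGED = BEFORE.replace("Linguistics", "GLOBAL_SET")

print(same_xml(BEFORE, AFTER))
print(same_xml(BEFORE, CHANGED))
```

Element order still matters in this comparison, so a reordering of subfields would be reported as a difference.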

But if we find some discrepancy it might come from having run this command.

@emanueldima can you ask the B2FIND team to try again? My test script is now passing, so it should work.

hwidmann commented 8 years ago

I gave it another try and now it works fine, i.e. all 286 available XML records could be harvested again and the B2FIND repository could be updated, as you can see at http://b2find.eudat.eu/dataset?groups=b2share. From my (B2FIND :-; ) side the issue can be closed. Thanks a lot!