Closed emanueldima closed 8 years ago
https://b2share.eudat.eu/oai2d?verb=ListRecords&metadataPrefix=marcxml&set=Linguistics also shows "Unknown metadata format" errors for some entries. When the page is loaded, the following appears in the Invenio log files:
2015-12-11 13:18:27,832 ERROR: [in /var/www/.virtualenvs/b2share/lib/python2.7/site-packages
/invenio-2.0.7.dev20150901-py2.7.egg/invenio/ext/logging/wrappers.py:310]
Traceback (most recent call last):
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/legacy/__init__.py", line 124, in __call__
response = self.app.full_dispatch_request()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1477, in
full_dispatch_request
rv = self.handle_user_exception(e)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_restful/__init__.py", line 258,
in error_router
return original_handler(e)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/wrappers.py", line 125, in handle_user_exception
return super(Flask, self).handle_user_exception(e)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1381, in
handle_user_exception
reraise(exc_type, exc_value, tb)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1475, in
full_dispatch_request
rv = self.dispatch_request()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1461, in
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/records/views.py", line 155, in decorated
return f(recid, *args, **kwargs)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 657, in
decorated_view
return current_app.login_manager.unauthorized()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 180, in
unauthorized
return self.unauthorized_callback()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/login/__init__.py", line 161, in do_login_first
return login(referer=request.url), 401
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/decorators.py", line 203, in decorator
return f(*args, **argd)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/accounts/views/accounts.py", line 75, in login
action, arguments = mail_cookie_check_authorize_action(action)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 165, in mail_cookie_check_authorize_action
(kind, params) = mail_cookie_check_common(cookie)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 106, in mail_cookie_check_common
obj = AccMAILCOOKIE.get(cookie, delete=delete)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/models.py", line 109, in get
cookie_id = int(cookie[16:-16], 16)
ValueError: invalid literal for int() with base 16: ''
2015-12-11 13:18:27,846 ERROR: Exception on /record/200/reviews/add [GET] [in /var/www
/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py:1423]
Traceback (most recent call last):
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/legacy/__init__.py", line 124, in __call__
response = self.app.full_dispatch_request()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1477, in
full_dispatch_request
rv = self.handle_user_exception(e)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_restful/__init__.py", line 258,
in error_router
return original_handler(e)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/wrappers.py", line 125, in handle_user_exception
return super(Flask, self).handle_user_exception(e)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1381, in
handle_user_exception
reraise(exc_type, exc_value, tb)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1475, in
full_dispatch_request
rv = self.dispatch_request()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask/app.py", line 1461, in
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/records/views.py", line 155, in decorated
return f(recid, *args, **kwargs)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 657, in
decorated_view
return current_app.login_manager.unauthorized()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/flask_login.py", line 180, in
unauthorized
return self.unauthorized_callback()
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/ext/login/__init__.py", line 161, in do_login_first
return login(referer=request.url), 401
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/base/decorators.py", line 203, in decorator
return f(*args, **argd)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/accounts/views/accounts.py", line 75, in login
action, arguments = mail_cookie_check_authorize_action(action)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 165, in mail_cookie_check_authorize_action
(kind, params) = mail_cookie_check_common(cookie)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/mailcookie.py", line 106, in mail_cookie_check_common
obj = AccMAILCOOKIE.get(cookie, delete=delete)
File "/var/www/.virtualenvs/b2share/lib/python2.7/site-packages/invenio-2.0.7.dev20150901-
py2.7.egg/invenio/modules/access/models.py", line 109, in get
cookie_id = int(cookie[16:-16], 16)
ValueError: invalid literal for int() with base 16: ''
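The last frame of both tracebacks points at the slicing step: `cookie[16:-16]` is an empty string whenever the cookie has 32 characters or fewer, and `int('', 16)` then raises `ValueError`. Here is a minimal sketch of that step in isolation. `parse_cookie_id` is a hypothetical helper written for illustration, not Invenio code, and returning `None` for a malformed cookie is only an assumption about what a caller might want.

```python
def parse_cookie_id(cookie):
    """Return the integer id embedded in a mail cookie, or None if malformed.

    Hypothetical helper mirroring the `int(cookie[16:-16], 16)` expression
    from the traceback; it is not part of Invenio.
    """
    hex_part = cookie[16:-16]  # empty when len(cookie) <= 32
    try:
        return int(hex_part, 16)
    except ValueError:
        return None


# An empty cookie reproduces the situation seen in the log.
print(parse_cookie_id(""))
# A made-up cookie with two hex digits between 16-character paddings.
print(parse_cookie_id("a" * 16 + "1f" + "b" * 16))
```

With the unguarded expression from the traceback, the first call would raise the same `ValueError` as in the log.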
The following code reproduces the issue:
# -*- coding: utf-8 -*-
from xml.etree import ElementTree as ET
from xml.dom import minidom

from sickle import Sickle

sickle = Sickle('https://b2share.eudat.eu/oai2d')
records = sickle.ListRecords(metadataPrefix='marcxml', set='Linguistics')
# records = sickle.ListRecords(metadataPrefix='oai_dc', set='Linguistics')

# pretty-print the raw response of the first ListRecords request
raw = records.oai_response.raw.encode('utf-8')
pretty_raw = minidom.parseString(raw).toprettyxml().encode('utf-8')
# check that the XML is well-formed
ET.fromstring(pretty_raw)
print(pretty_raw)
print('=' * 100)

item = records.next()  # FAILS HERE
pretty_item = minidom.parseString(item.raw.encode('utf-8')) \
    .toprettyxml().encode('utf-8')
print(pretty_item)
The records.next() call works with oai_dc but not with marcxml. Strangely enough, the marcxml itself is missing for some of the results, which might be why sickle fails to iterate.
Record 1 (https://b2share.eudat.eu/record/1) is one of those records. When I request its marcxml in the browser it works. No idea why it doesn't with OAI-PMH.
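To see which records are affected without going through sickle, the raw ListRecords response can be scanned for `<metadata>` elements that have no child. This is a rough sketch under assumptions: the `SAMPLE` document and its `oai:example.org` identifiers are made up for illustration and are not real B2SHARE output, and `records_without_metadata` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Made-up sample response; the identifiers are placeholders.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata/>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <record xmlns="http://www.loc.gov/MARC21/slim"/>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def records_without_metadata(xml_text):
    """Return identifiers of records whose <metadata> has no child element."""
    root = ET.fromstring(xml_text)
    missing = []
    for record in root.iter(OAI_NS + "record"):
        metadata = record.find(OAI_NS + "metadata")
        # Records with no <metadata> element at all are skipped here.
        if metadata is not None and len(metadata) == 0:
            identifier = record.findtext(
                OAI_NS + "header/" + OAI_NS + "identifier")
            missing.append(identifier)
    return missing


print(records_without_metadata(SAMPLE))
```

Against the live endpoint, the same function could be fed `records.oai_response.raw` from the reproduction script.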
I searched for issues related to OAI-PMH in Invenio and found this one: https://github.com/inveniosoftware/invenio/issues/2962
@llehtine could you please check the CFG_OAI_METADATA_FORMATS configuration parameter?
The parameter is the default one, which is defined in config.py:
CFG_OAI_METADATA_FORMATS = {
    'oai_dc': ('XOAIDC',
               'http://www.openarchives.org/OAI/1.1/dc.xsd',
               'http://purl.org/dc/elements/1.1/'),
    'marcxml': ('XOAIMARC',
                'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd',
                'http://www.loc.gov/MARC21/slim'),
}
Ok, my conclusion is that some records do not show the marcxml metadata in OAI-PMH, and that means the OAI-PMH xml document is invalid according to its own schema. I get the following XML validation errors:
XML validation started.
Referenced entity at "http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd".
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[14]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[23]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[32]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[41]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[50]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[59]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[68]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[77]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[86]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[95]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[104]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[113]
cvc-complex-type.2.4.b: The content of element 'metadata' is not complete.
One of '{WC[##other:"http://www.openarchives.org/OAI/2.0/"]}' is expected.
[122]
Referenced entity at "http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd".
cvc-pattern-valid: Value 'C' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [262]
cvc-attribute.3: The value 'C' of attribute 'ind1' on element
'marc:datafield' is not valid with respect to its type,
'indicatorDataType'. [262]
cvc-pattern-valid: Value 'O' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [262]
cvc-attribute.3: The value 'O' of attribute 'ind2' on element
'marc:datafield' is not valid with respect to its type,
'indicatorDataType'. [262]
cvc-pattern-valid: Value 'C' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [387]
cvc-attribute.3: The value 'C' of attribute 'ind1' on element
'marc:datafield' is not valid with respect to its type,
'indicatorDataType'. [387]
cvc-pattern-valid: Value 'O' is not facet-valid with respect to pattern
'[\da-z ]{1}' for type 'indicatorDataType'. [387]
Too many errors, stopping further checking.
XML validation finished.
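Apart from the empty `metadata` elements, the MARC21slim part of the output above flags the indicator values 'C' and 'O'. The pattern quoted by the validator accepts only a single digit, lowercase letter, or space, so uppercase indicators fail. A small sketch of that check with Python's `re` module follows. `is_valid_indicator` is a hypothetical helper, and this covers only the one pattern from the validator message, not the whole schema.

```python
import re

# Pattern quoted in the validator output for 'indicatorDataType'.
INDICATOR = re.compile(r"[\da-z ]{1}")


def is_valid_indicator(value):
    """True if value is exactly one digit, lowercase letter, or space."""
    return INDICATOR.fullmatch(value) is not None


for value in ("C", "O", "c", "0", " "):
    print(repr(value), is_valid_indicator(value))
```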
I suppose that for these records the OAI-PMH marcxml must somehow be regenerated, maybe using the Invenio OAI command line tools. Lassi, did you use the oairepositoryupdater tool? Can you try to regenerate these documents?
Thanks @llehtine.
I tested locally with B2Share 1.6.2 and the OAI server works correctly with marcxml.
Could you please check that oairepositoryupdater is running in bibsched?
@emanueldima same solution found at the same time ^^
But will it work?... suspense...
The oairepositoryupdater is running in bibsched at 5m intervals. Should I try to run it with some special parameters?
Thanks again @llehtine
I just checked record https://b2share.eudat.eu/record/1, which is one of those not returning a marcxml (https://b2share.eudat.eu/record/1/export/xm?ln=en). It has the field 909, which is added once oairepositoryupdater and the bibupload task it creates have run.
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:b2share.eudat.eu:5</subfield>
<subfield code="p">GLOBAL_SET</subfield>
<subfield code="p">Linguistics</subfield>
</datafield>
So it looks like this was not the issue.
However, I saw one consistent difference between documents returning a marcxml and those that do not. The documents returning a marcxml have Linguistics as the first p subfield and GLOBAL_SET as the second one. I don't know yet whether this is relevant. I have to investigate more.
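For comparing the subfield order across many records, something like the following sketch could help. It parses the 909 C O datafield quoted above and lists the `p` subfields in document order. `set_order` is a hypothetical helper, and the snippet assumes the datafield has no XML namespace, which may not hold for a full MARCXML export.

```python
import xml.etree.ElementTree as ET

# The 909 C O datafield from record 1, as quoted in this thread.
DATAFIELD = """<datafield tag="909" ind1="C" ind2="O">
  <subfield code="o">oai:b2share.eudat.eu:5</subfield>
  <subfield code="p">GLOBAL_SET</subfield>
  <subfield code="p">Linguistics</subfield>
</datafield>"""


def set_order(datafield_xml):
    """Return the values of the `p` subfields in document order."""
    field = ET.fromstring(datafield_xml)
    return [sf.text for sf in field.findall("subfield")
            if sf.get("code") == "p"]


print(set_order(DATAFIELD))
```

For the failing record this gives GLOBAL_SET first and Linguistics second, the reverse of the order seen in the working records.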
@llehtine saw that the bibfmt table had no xm format for the failing records. Running bibreformat -uadmin -oxm seems to have fixed the issue. Now all records output a marcxml.
@tiborsimko told me to be careful with this, as marcxml is a master format and normally should not need a bibreformat. I tested a few records' marcxml to make sure that they are still the same and I didn't find any difference. The previous marcxml, the new ones and the OAI ones are all the same.
But if we find some discrepancy it might come from having run this command.
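If someone wants to repeat that comparison on more records, one option is to compare canonical forms rather than raw strings, so that whitespace and attribute order do not produce false differences. This is only a sketch, not how the check above was done: it assumes Python 3.8 or later for `xml.etree.ElementTree.canonicalize`, and the three sample documents are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def same_marcxml(xml_a, xml_b):
    """Compare two XML documents, ignoring whitespace and attribute order."""
    return (canonicalize(xml_a, strip_text=True)
            == canonicalize(xml_b, strip_text=True))


# Made-up samples: AFTER differs from BEFORE only in layout and attribute
# order, CHANGED differs in the subfield content.
BEFORE = ('<record><datafield tag="909" ind1="C" ind2="O">'
          '<subfield code="p">Linguistics</subfield></datafield></record>')
AFTER = """<record>
  <datafield ind2="O" ind1="C" tag="909">
    <subfield code="p">Linguistics</subfield>
  </datafield>
</record>"""
CHANGED = BEFORE.replace("Linguistics", "GLOBAL_SET")

print(same_marcxml(BEFORE, AFTER))
print(same_marcxml(BEFORE, CHANGED))
```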
@emanueldima can you ask the B2Find team to try again? My test script is now passing, so it should work.
I gave it another try and now it works fine, i.e. all 286 available XML records could be harvested again and the B2FIND repository could be updated, as you can see at http://b2find.eudat.eu/dataset?groups=b2share . From my (B2FIND :-; ) side the issue can be closed. Thanks a lot!
This report comes from Heinrich Widmann (B2FIND):
Try e.g. https://b2share.eudat.eu/oai2d?verb=ListRecords&metadataPrefix=marcxml. It seems to work in the browser, but when harvesting via the OAI iterator in Python I get 'HTTP 503' error messages; see also the attached log file.
For set=Linguistics the getRecord request runs into HTTP errors and finally into IndexError: list index out of range. E.g. https://b2share.eudat.eu/oai2d?verb=ListRecords&metadataPrefix=marcxml&set=Linguistics, when submitted in a browser, also leads to empty records ...