bloomonkey / oai-harvest

Python package for harvesting records from OAI-PMH provider(s).

BadVerbError: Value of the verb argument ... #41

Closed ksbbf closed 5 months ago

ksbbf commented 6 months ago

I am hitting exactly the same BadVerbError failure as in #19:

  File "/Users/kilian/Library/Python/3.11/lib/python/site-packages/oaipmh/common.py", line 121, in __call__
    return bound_self.handleVerb(self._verb, kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kilian/Library/Python/3.11/lib/python/site-packages/oaipmh/client.py", line 74, in handleVerb
    kw, self.makeRequestErrorHandling(verb=verb, **kw))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kilian/Library/Python/3.11/lib/python/site-packages/oaipmh/client.py", line 308, in makeRequestErrorHandling
    raise getattr(error, code[0].upper() + code[1:] + 'Error')(msg)
oaipmh.error.BadVerbError: Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated.

This is seven years after #19. I am on a Mac, and oai-harvest is installed in a Python 3.11 environment.

jeanbaptisteb commented 6 months ago

@ksbbf I'm not the library maintainer, but when trying to replicate the issue #19, everything works perfectly fine for me. I'm using Python 3.10.4, but that probably doesn't make a difference.

Maybe you should mention the specific request that you used, because the error message alone may not be enough to identify and replicate the issue.

ksbbf commented 6 months ago

I am trying to harvest several OAI-PMH endpoints served by the Goobi viewer by intranda, because I know they have an OAI-PMH interface. So I simply called

oai-harvest https://gei-digital.gei.de/viewer/oai

These were the error messages:

INFO     Harvesting from https://gei-digital.gei.de/viewer/oai
ERROR    Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated.
Traceback (most recent call last):
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaiharvest/harvest.py", line 303, in main
    completed = harvester.harvest(baseUrl,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaiharvest/harvest.py", line 135, in harvest
    for header, metadata, about in self._listRecords(
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaiharvest/harvest.py", line 83, in _listRecords
    client.identify()
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/common.py", line 126, in method
    return obj(self, **kw)
           ^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/common.py", line 121, in __call__
    return bound_self.handleVerb(self._verb, kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/client.py", line 74, in handleVerb
    kw, self.makeRequestErrorHandling(verb=verb, **kw))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/client.py", line 308, in makeRequestErrorHandling
    raise getattr(error, code[0].upper() + code[1:] + 'Error')(msg)
oaipmh.error.BadVerbError: Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated.

Looking at them again today, they look a little different from #19. I will take a closer look tomorrow.

jeanbaptisteb commented 6 months ago

@ksbbf When trying oai-harvest https://gei-digital.gei.de/viewer/oai/ (note the slash at the end of the URL), I get a different error message: oaipmh.error.DatestampError. So it's a problem related to dates.

Looking at https://gei-digital.gei.de/viewer/oai?verb=Identify, the "earliestDatestamp" field is empty, although the OAI-PMH specification makes it mandatory, so it presumably should not be empty. The problem seems to come from that, at least in part. I'd suggest reporting the problem to the repository owners so they can fix it. Their e-mail address is available in the "Admin email" field.
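
To make the check above repeatable, here is a minimal sketch using only the Python standard library. It parses an Identify response and reports whether "earliestDatestamp" is missing or empty. The function name `earliest_datestamp` and the `SAMPLE` document are made up for illustration; only the OAI-PMH 2.0 namespace and the element names come from the protocol.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed-down Identify response with an empty earliestDatestamp.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <earliestDatestamp></earliestDatestamp>
  </Identify>
</OAI-PMH>"""


def earliest_datestamp(identify_xml: Union[str, bytes]) -> Optional[str]:
    """Return the earliestDatestamp text, or None if it is missing or empty."""
    root = ET.fromstring(identify_xml)
    node = root.find("oai:Identify/oai:earliestDatestamp", OAI_NS)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


if __name__ == "__main__":
    # None here means the repository would likely trip a strict harvester.
    print(earliest_datestamp(SAMPLE))
    # To check a live endpoint (network access required), something like:
    #   from urllib.request import urlopen
    #   body = urlopen("https://gei-digital.gei.de/viewer/oai?verb=Identify").read()
    #   print(earliest_datestamp(body))
```

If the function returns None for a live endpoint, that is worth including in the report to the repository owners.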

Alternatively, you can use another harvester that doesn't stop when encountering this kind of problem. I stumbled upon this script for instance: https://github.com/vphill/pyoaiharvester .

ksbbf commented 6 months ago

Sorry, could anybody provide me with a successful example? I tried several data providers from https://www.openarchives.org/Register/BrowseSites but I only get error messages.

One sample! Thanks

jeanbaptisteb commented 6 months ago

@ksbbf Taking an example from your link, the command line oai-harvest http://iberoamericasocial.com/ojs/index.php/index/oai works perfectly fine for me. What kind of error message do you get?

On my side, here's the output I get and that you should get too:

INFO     Harvesting from http://iberoamericasocial.com/ojs/index.php/index/oai
TEST
--##2023-04-24T06:30:34Z  [repeated a few times]

and then:

DEBUG    Writing to file D:\Documents\export_oai\oai%3Aojs.iberoamericasocial.com%3Aarticle%2F40.oai_dc.xml
DEBUG    Writing to file D:\Documents\export_oai\\oai%3Aojs.iberoamericasocial.com%3Aarticle%2F43.oai_dc.xml
DEBUG    Writing to file D:\Documents\export_oai\\oai%3Aojs.iberoamericasocial.com%3Aarticle%2F44.oai_dc.xml

etc. In the example above, the XML files are written to the folder D:\Documents\export_oai\.

ksbbf commented 6 months ago

Thanks @jeanbaptisteb

Oh dear, I get that again:

INFO     Harvesting from http://iberoamericasocial.com/ojs/index.php/index/oai
ERROR    'lxml.etree.XPathElementEvaluator' object has no attribute 'evaluate'
Traceback (most recent call last):
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaiharvest/harvest.py", line 303, in main
    completed = harvester.harvest(baseUrl,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaiharvest/harvest.py", line 135, in harvest
    for header, metadata, about in self._listRecords(
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaiharvest/harvest.py", line 83, in _listRecords
    client.identify()
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/common.py", line 126, in method
    return obj(self, **kw)
           ^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/common.py", line 121, in __call__
    return bound_self.handleVerb(self._verb, kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/client.py", line 73, in handleVerb
    return getattr(self, method_name)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myname/Library/Python/3.11/lib/python/site-packages/oaipmh/client.py", line 132, in Identify_impl
    identify_node = evaluator.evaluate(
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'lxml.etree.XPathElementEvaluator' object has no attribute 'evaluate'

Hm. No idea.

> Alternatively, you can use another harvester that doesn't stop when encountering this kind of problem. I stumbled upon this script for instance: https://github.com/vphill/pyoaiharvester

I can play around with it. That's a good starting point.

jeanbaptisteb commented 6 months ago

@ksbbf The Python package lxml is the culprit here. I use lxml 4.8.0, but I get the same error as you if I update it to its latest version (5.2.2). As a solution, I'd suggest downgrading it to 4.8.0 with the command pip install lxml==4.8.0, if you use pip to manage your packages. If you're using something other than pip (e.g. anaconda), refer to its documentation to see how to downgrade packages.
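
Before downgrading, it can help to confirm which interpreter and which lxml version are actually in use. This is a minimal sketch under one assumption: that any lxml 5.x release triggers the AttributeError. This thread only confirms that 4.8.0 works and 5.2.2 fails. The helper names `installed_lxml_version` and `likely_affected` are made up for illustration.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_lxml_version() -> Optional[str]:
    """Return the lxml version visible to this interpreter, or None if absent."""
    try:
        return metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return None


def likely_affected(version: str) -> bool:
    """Assumption: lxml major version 5 or later breaks the evaluate() call."""
    return int(version.split(".")[0]) >= 5


if __name__ == "__main__":
    # sys.executable shows which Python installation is running this check.
    print("interpreter:", sys.executable)
    version = installed_lxml_version()
    if version is None:
        print("lxml is not installed for this interpreter")
    elif likely_affected(version):
        print(f"lxml {version}: probably affected, consider lxml==4.8.0")
    else:
        print(f"lxml {version}: probably fine")
```

Run it with the same interpreter that runs oai-harvest, since each Python installation has its own site-packages.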

ksbbf commented 6 months ago

hui, ok!

I have now reached the point where I get that timestamp error from gei and can harvest from iberoamericasocial.

Now it gets a little tricky: on my Mac I have several Python installations, maybe from Homebrew.

In Python 3.10 I could downgrade lxml, in Python 3.11 I could not.

So, /Users/myname/Library/Python/3.10/bin/oai-harvest works as expected.

jeanbaptisteb commented 6 months ago

@ksbbf I tried with Python 3.11 and also ran into a problem when downgrading to lxml 4.8. Presumably the combination of Python 3.11 and lxml < 5.0.0 requires libxml2, which seems to be a nightmare to install on Windows, which is what I am using.

You might be a bit luckier than me if you're on Mac: try installing libxml2 on it, and then retry downgrading lxml.

If it still does not work for you after that, welcome to dependency hell... Joking aside, if installing libxml2 doesn't solve your issue, I'd simply stick to Python 3.10 if I were you. Maybe use a virtual environment in case you need to upgrade lxml for other projects later.
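
For the virtual-environment suggestion, here is a minimal sketch using Python's standard `venv` module. The directory name `oai-env` and the helper names are made up for illustration, and it assumes the package is installable from PyPI as `oai-harvest`. Run it with the Python 3.10 interpreter, since the downgrade did not work on 3.11 in this thread.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir) -> Path:
    """Path to the interpreter inside a venv (layout differs on Windows)."""
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def pip_install_command(env_dir, *specs):
    """Build the pip command that installs the given requirements into the venv."""
    return [str(env_python(env_dir)), "-m", "pip", "install", *specs]


def create_env(env_dir) -> None:
    """Create an isolated environment with pip available."""
    venv.EnvBuilder(with_pip=True).create(env_dir)


def setup(env_dir="oai-env") -> None:
    """Create the environment and pin lxml to the version reported as working."""
    create_env(env_dir)
    subprocess.run(
        pip_install_command(env_dir, "lxml==4.8.0", "oai-harvest"),
        check=True,
    )
```

Calling `setup()` creates the environment and installs both packages into it, so other projects can keep a newer lxml in their own environments.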

I think that at some point the developers of oai-harvest should consider updating the dependencies and the code that uses lxml to address these problems, because this whole discussion suggests that oai-harvest is becoming more and more difficult to use at all.

ksbbf commented 5 months ago

@jeanbaptisteb Thanks for your help and for accompanying me part of the way.

Meanwhile we discovered that everybody in our project is using https://github.com/vphill/pyoaiharvester. So we will learn how to handle the obstacles and debug within that script.