codeforIATI / iatikit

🐨 A toolkit for using IATI data
https://iatikit.readthedocs.io
MIT License

Non XML file causing parser error #48

Closed johnadamsDFID closed 5 years ago

johnadamsDFID commented 5 years ago

Parsing the full IATI dataset today, I get the following error:

`File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError`

`File "iatikitcache/registry/data/rikolto/rikolto-international2017org.xml", line 1`

`XMLSyntaxError: Document is empty, line 1, column 1`

This is because the file location is returning an HTML page for this new publisher's data (https://iatiregistry.org/publisher/rikolto).

Is there a way of handling this better (e.g. a try/except model)?
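A minimal sketch of the kind of handling asked about here, under stated assumptions: each parse is wrapped in a try/except so that a file which is not well-formed XML is skipped and recorded instead of aborting the whole run. It uses the standard library's `xml.etree.ElementTree` rather than lxml (with lxml, the exception to catch would be `lxml.etree.XMLSyntaxError`). The helper name `parse_xml_files` and its return shape are hypothetical, and this is not how iatikit itself handles the problem.

```python
import xml.etree.ElementTree as ET


def parse_xml_files(paths):
    """Parse each path, skipping files that are not well-formed XML.

    Hypothetical helper for illustration; not part of iatikit.
    Returns (parsed, skipped): a dict of path -> root element, and a
    list of (path, error message) for files that failed to parse.
    """
    parsed = {}
    skipped = []
    for path in paths:
        try:
            parsed[path] = ET.parse(path).getroot()
        except ET.ParseError as error:
            # Raised for empty files and for most HTML pages served
            # in place of XML (unclosed tags, mismatched elements).
            skipped.append((path, str(error)))
    return parsed, skipped
```

Recording the skipped paths, rather than silently ignoring them, makes it possible to report which publishers' files were unusable after the run.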

andylolz commented 5 years ago

Hi @johnadamsDFID,

Thanks for this!

I ran into exactly the same thing, and thought I fixed it at version 2.2.2. Are you able to check which version you are running? You should be able to run the following:

import iatikit

print(iatikit.__version__)

If it’s earlier than v2.2.2, could you try running pip install --upgrade iatikit? I’ve updated the installation instructions to show this (although I have previously experienced problems running pip install --upgrade inside Jupyter).
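The "earlier than v2.2.2" check can also be done in code. The sketch below assumes purely numeric dotted version strings such as `2.2.4`; the helper names `version_tuple` and `needs_upgrade` are hypothetical and not part of iatikit. Comparing tuples of integers avoids the pitfall of plain string comparison, where `"2.10.0"` would sort before `"2.2.2"`.

```python
def version_tuple(version):
    """Turn a dotted version string such as '2.2.4' into (2, 2, 4)."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed, minimum="2.2.2"):
    """Return True if `installed` is older than `minimum`.

    Assumes numeric dotted versions only; pre-release suffixes
    (e.g. '2.3.0rc1') are not handled by this sketch.
    """
    return version_tuple(installed) < version_tuple(minimum)
```

With iatikit installed, this could be called as `needs_upgrade(iatikit.__version__)`.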

If the problem persists when using v2.2.2 or later, would you mind providing steps to reproduce? I.e. just a few commands I can run locally to hit the same error. Apologies in advance if the steps are obvious!

Thanks again,

johnadamsDFID commented 5 years ago

@andylolz Thanks, using 2.2.4 solved the problem. Thanks for the upgrade instructions. I didn't realise Jupyter didn't reload from the source each time.
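On the Jupyter point: a running kernel keeps the already-imported module in memory, so an upgraded package is not picked up until the kernel is restarted or the module is reloaded. Restarting the kernel is the more reliable route. As a rough sketch of the alternative, the standard library's `importlib.reload` re-executes a module from disk; the wrapper name `reload_module` is hypothetical.

```python
import importlib


def reload_module(name):
    """Import `name` if needed, then re-execute it from disk.

    Caveats: importlib.reload only reloads the named module, not its
    submodules, and objects created before the reload still refer to
    the old definitions.
    """
    module = importlib.import_module(name)
    return importlib.reload(module)
```

For a package with many submodules, such as one upgraded via pip, a kernel restart avoids ending up with a mix of old and new code.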

andylolz commented 5 years ago

Excellent! Thanks @johnadamsDFID. Please keep the bug reports / feature requests coming in – it’s super useful.

I think once transaction support is in (#15) I’ll write a discuss post to advertise iatikit a bit more widely. Changed my mind and posted it! But I’ll add transaction support soon.