IATI / D-Portal

http://d-portal.org/

Query about missing activity file in d-portal #600

Closed andreaszenasidi closed 3 years ago

andreaszenasidi commented 3 years ago

The Center for Church-Based Development is missing an activity from d-portal. The activity file was published on the 20th of January, 2021 and it's in the registry: https://iatiregistry.org/dataset/dmru-all

There is a critical error in their file, so it is not pulled into the Datastore. It would be great to understand why d-portal is not pulling the file.

notshi commented 3 years ago

Hi @andreaszenasidi looks like it's taking a long time to get the data from their server so it times out.

curl: (28) Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds
Warning: Transient problem: timeout Will retry in 10 seconds. 4 retries left.

Here is the full log of the errors from last night's attempt at getting the data from them. http://d-portal.org/ctrack.html#view=dash_sluglog&slug=dmru-all
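
For anyone reading the curl message above: it reports a low-speed abort, where the transfer is given up because too few bytes arrived over a time window. Below is a minimal Python sketch of that rule, assuming the 1000 bytes/sec and 30 second values shown in the log. The function name `is_too_slow` and the sample format are hypothetical, and this is a simplified illustration, not curl's internal logic or d-portal's download script.

```python
def is_too_slow(samples, limit_bps=1000, window_s=30):
    """Return True if fewer than limit_bps * window_s bytes arrived
    during the most recent window_s seconds.

    samples: list of (seconds_since_start, total_bytes_received),
    ordered by time. This format is an assumption made for the sketch.
    """
    if not samples:
        return False
    now, total_now = samples[-1]
    if now < window_s:
        return False  # not enough history to judge yet
    # Find the byte count at (or just before) the start of the window.
    start_bytes = 0
    for t, total in samples:
        if t <= now - window_s:
            start_bytes = total
        else:
            break
    return (total_now - start_bytes) < limit_bps * window_s
```

A server that spends a long time building the file before sending anything would trip this check even if the eventual transfer is fast, which is consistent with the timeout seen here.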

andreaszenasidi commented 3 years ago

@notshi thank you for looking into this issue. I will notify the publisher.

ChrisWohlert commented 3 years ago

Hello, I am one of the developers who create the activity file for "Center for Church-Based Development". We had an error similar to the one you describe, notshi, but solved it by increasing the timeout of our database call. To solve this issue, I would like to know whether you load the file from the IATI Registry, or just use the URL in IATI to retrieve the file. If you use the URL, then we are hosting the file, and I could try increasing the timeout further. However, if you get the file from the IATI Registry, I won't be able to do this, and decreasing the size of the file is the only possible solution.

Could you clarify whether increasing the timeout might work? I don't know how else we might increase the speed at which you retrieve the file.

notshi commented 3 years ago

Hi @ChrisWohlert thanks for getting in touch.

We retrieve the file by using curl to download from the url in IATI. In this case, the url is https://pdb.dmru.org/iati/reportDMRU

We recommend adding a cache for your file. This way, when we go to retrieve a file from your server, there shouldn't be any problems like long waiting times.

An easy way of doing this is using CloudFlare which sits on top of your site and serves up cached files on your behalf.
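
If a CDN is not an option, the same idea can be applied inside the application that produces the file. The sketch below is an assumption-laden illustration, not CloudFlare's behaviour and not the publisher's actual code: the class name `CachedReport`, the `generate` callback, and the one-hour `max_age_s` default are all hypothetical. It keeps the last generated report in memory and only rebuilds it once it is older than the chosen age, so most requests skip the slow database call.

```python
import time


class CachedReport:
    """Serve a previously generated report until it becomes stale.

    generate: hypothetical callable that runs the slow query and
    returns the report as bytes.
    """

    def __init__(self, generate, max_age_s=3600, clock=time.monotonic):
        self._generate = generate
        self._max_age_s = max_age_s
        self._clock = clock  # injectable so the cache can be tested
        self._body = None
        self._built_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._built_at is None
            or now - self._built_at >= self._max_age_s
        )
        if stale:
            # Only this path pays the cost of the slow generation.
            self._body = self._generate()
            self._built_at = now
        return self._body
```

Note that the first request after the cache expires still waits for the full generation, so refreshing the cache on a schedule rather than on demand may suit a nightly downloader better.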

notshi commented 3 years ago

Hi @ChrisWohlert just checking in on how things are going.

@andreaszenasidi it looks like the link to https://iatiregistry.org/dataset/dmru-all is redirecting to https://iatiregistry.org/dataset/cku-all and the publisher page http://iatiregistry.org/dataset/dmru-be is being redirected to https://iatiregistry.org/dataset/cku-be

Both of these are showing an "Error 404 Not Found: Dataset not found" page.

andreaszenasidi commented 3 years ago

@notshi there is an open issue about this on the Registry github. It's considered high priority and Derilinx will try to get to it as soon as possible.

notshi commented 3 years ago

Thanks for the update, @andreaszenasidi!

notshi commented 3 years ago

Hi @andreaszenasidi it seems the links are no longer being redirected and the logs at http://d-portal.org/ctrack.html#view=dash_sluglog&slug=dmru-all look healthy.

Tue Mar  9 01:05:17 UTC 2021
Downloading dmru-all.xml from https://pdb.dmru.org/iati/reportDMRU

IATI XML last successfully downloaded on
Tue Mar  9 01:05:22 UTC 2021

Are the numbers on http://d-portal.org/ctrack.html?publisher=DK-CVR-12006004#view=main what you were expecting?

andreaszenasidi commented 3 years ago

@notshi These are the correct numbers that we were expecting. Many thanks for your help with this issue.