PublicaMundi / ckanext-publicamundi

PublicaMundi main CKAN extension
http://publicamundi.eu

Raster_Identify Failed to download: invalid url? #234

Open howff opened 6 years ago

howff commented 6 years ago

The URL reported in the error message seems to be missing the host and path parts when I add a raster resource to a dataset:

Downloading resource from http://nppdnbdaysdr.17319131254.tif to: /var/local/ckan/default/tmp//rasterstorer//Cov_.raster

Is that the reason why celery sticks? Maybe there is some magic configuration file to edit?

Full message:

[2017-11-20 15:03:33,593: INFO/PoolWorker-1] rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12]: [Raster_Identify]Downloading resource a32a6046-ce1f-42a4-a417-118376bb32e3...
[2017-11-20 15:03:33,593: INFO/PoolWorker-1] rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12]: [Raster_DownloadResource] Downloading resource a32a6046-ce1f-42a4-a417-118376bb32e3 from http://nppdnbdaysdr.17319131254.tif to: /var/local/ckan/default/tmp//rasterstorer/a32a6046-ce1f-42a4-a417-118376bb32e3/Cov_a32a6046_ce1f_42a4_a417_118376bb32e3.raster
[2017-11-20 15:03:33,596: ERROR/PoolWorker-1] rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12]: [Raster_Identify] Failed to download: Failed to download http://nppdnbdaysdr.17319131254.tif: <urlopen error [Errno -2] Name or service not known>
[2017-11-20 15:03:33,610: ERROR/MainProcess] Task rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12] raised exception: CannotDownload('Failed to download http://nppdnbdaysdr.17319131254.tif: <urlopen error [Errno -2] Name or service not known>',)
Traceback (most recent call last):
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/execute/trace.py", line 47, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/app/task/__init__.py", line 247, in __call__
    return self.run(*args, **kwargs)
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/app/__init__.py", line 175, in run
    return fun(*args, **kwargs)
  File "/var/local/ckan/default/pyenv/src/ckanext-publicamundi/ckanext/publicamundi/storers/raster/tasks.py", line 28, in rasterstorer_identify
    rasterstorer_identify.retry(exc=ex, countdown=60)
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/app/task/__init__.py", line 535, in retry
    self.name, options["task_id"], args, kwargs))
CannotDownload: Failed to download http://nppdnbdaysdr.17319131254.tif: <urlopen error [Errno -2] Name or service not known>
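
The traceback shows the task calling retry(exc=ex, countdown=60) after the download fails, so a URL whose host can never resolve may simply be rescheduled again and again. Below is a minimal sketch, not the extension's code, of how a permanent name-resolution failure could be told apart from a transient one before retrying. The helper name is_permanent_download_error is hypothetical, and the sketch uses Python 3 standard-library names, whereas the deployment in the log runs Python 2.7.

```python
import socket
import urllib.error


def is_permanent_download_error(exc):
    """Return True when retrying the download is unlikely to help.

    Hypothetical helper (not part of ckanext-publicamundi): only a
    "name not known" DNS failure wrapped in URLError counts as permanent.
    """
    if not isinstance(exc, urllib.error.URLError):
        return False
    reason = exc.reason
    return isinstance(reason, socket.gaierror) and reason.errno == socket.EAI_NONAME


# Build the same kind of error the log reports (name or service not known).
dns_failure = urllib.error.URLError(
    socket.gaierror(socket.EAI_NONAME, "Name or service not known")
)
# A timeout, by contrast, may well succeed on a later attempt.
timeout = urllib.error.URLError(socket.timeout("timed out"))

print(is_permanent_download_error(dns_failure))  # True
print(is_permanent_download_error(timeout))      # False
```

Whether the task should give up or keep retrying in the permanent case is a design decision for the plugin; the sketch only shows how the two cases could be distinguished.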

kalxas commented 6 years ago

The URL does not seem to be a valid one.

howff commented 6 years ago

That's right, but it was generated by the raster importer, so either there is a bug in the raster importer or something needs to be configured somewhere. Any ideas?

(I created a dataset in the CKAN web interface and attached a GeoTIFF resource.)

kalxas commented 6 years ago

@drmalex07 any ideas?

drmalex07 commented 6 years ago

Well, I think you should ping the rasdaman team, which was the only one involved with the raster-storer plugin.

vladmerti commented 6 years ago

Cross-posting from the rasdaman-dev mailing list in case I'm missing something:

Hi Andrew,

It's been a while since I looked over this code.

The error occurs in https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/tasks.py#L11. The method exposes a celery task which prepares a resource for ingestion in rasdaman. In your case the URL points to a nonexistent resource, so the rasterstorer cannot import it.

You could track where the URL is coming from. It appears in the task context, and is passed on to a utility class for download. On https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/tasks.py#L18 you already have a dump of the context, so a first step would be to check whether the URL is OK in the context or is already broken when it gets there. If the URL is correct in the context but it still fails to download, you can have a look at https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/raster_plugin_util.py#L49 (but my intuition is that the URL is already pointing to nothing in the context).
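
As a rough illustration of the check suggested above, the URL taken from the context dump can be split into its parts to see whether a real host and path are present. This is only a sketch: the helper names and the suffix list are assumptions made for illustration, not code from the plugin, and it uses Python 3's urllib.parse.

```python
from urllib.parse import urlparse

# Assumed list of file suffixes, for illustration only.
RASTER_SUFFIXES = (".tif", ".tiff", ".png", ".jpg", ".jpeg", ".zip")


def describe_url(url):
    """Split a URL into the parts worth inspecting in the context dump."""
    parts = urlparse(url)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


def looks_like_bare_filename(url):
    """Heuristic: the 'host' ends in a file suffix and there is no path."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    return parts.path in ("", "/") and host.endswith(RASTER_SUFFIXES)


bad = "http://nppdnbdaysdr.17319131254.tif"
print(describe_url(bad))
print(looks_like_bare_filename(bad))  # True
print(looks_like_bare_filename("http://example.org/data/scene.tif"))  # False
```

For the URL in the log, the whole file name ends up in the host position and the path is empty, which matches the observation that the host and path parts are missing.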

The celery task itself is created whenever a new resource with one of the geotiff, png, jpeg, zip or raster formats is added to ckan (https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/plugin.py#L60).
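
The trigger described above can be pictured as a simple format test on the new resource. The following is a hypothetical sketch, not the code at plugin.py#L60: the function name, the plain-dict resource, and the case-insensitive comparison are all assumptions.

```python
# Formats named in this thread as triggering the raster task.
RASTER_FORMATS = {"geotiff", "png", "jpeg", "zip", "raster"}


def should_queue_identify(resource):
    """Hypothetical check: does this resource dict qualify for ingestion?

    Assumes the format is stored under the "format" key and that the
    comparison ignores case and surrounding whitespace.
    """
    fmt = (resource.get("format") or "").strip().lower()
    return fmt in RASTER_FORMATS


print(should_queue_identify({"format": "GeoTIFF"}))  # True
print(should_queue_identify({"format": "csv"}))      # False
print(should_queue_identify({}))                     # False
```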

HTH, Vlad

howff commented 6 years ago

Vlad,

Thank you ever so much for explaining some of the inner workings; it really helps me to understand the architecture and to try to track down what is going on.

(Reply to Vlad Merticariu's comment of 29 November 2017, 11:20, above.)