NRGI / resourcedata.org

CKAN

When the dataset doesn't have a valid URL, it cannot be serialized to n3 and ttl formats #244

Closed adinuca closed 6 years ago

adinuca commented 6 years ago

Why

Many errors are being reported because the Turtle and Notation3 formats cannot be generated for datasets whose URL is not valid.

What

Notes

The URLs for the two formats are available in the source code of the dataset page (e.g. https://www.resourcedata.org/dataset/33a07bf8-35f4-45be-a951-b61aed8287ac).

Examples: https://www.resourcedata.org/dataset/33a07bf8-35f4-45be-a951-b61aed8287ac https://www.resourcedata.org/dataset/33a07bf8-35f4-45be-a951-b61aed8287ac.ttl https://www.resourcedata.org/dataset/33a07bf8-35f4-45be-a951-b61aed8287ac.n3

https://www.resourcedata.org/dataset/f4d3130b-4557-47fb-b609-6b0080b05025 https://www.resourcedata.org/dataset/f4d3130b-4557-47fb-b609-6b0080b05025.ttl https://www.resourcedata.org/dataset/f4d3130b-4557-47fb-b609-6b0080b05025.n3

https://www.resourcedata.org/dataset/7bbcb65a-653c-42ea-acb0-45943630bbef https://www.resourcedata.org/dataset/7bbcb65a-653c-42ea-acb0-45943630bbef.ttl https://www.resourcedata.org/dataset/7bbcb65a-653c-42ea-acb0-45943630bbef.n3

https://www.resourcedata.org/dataset/28350801-8f55-4155-81ca-874b94b0809d https://www.resourcedata.org/dataset/28350801-8f55-4155-81ca-874b94b0809d.ttl https://www.resourcedata.org/dataset/28350801-8f55-4155-81ca-874b94b0809d.n3

adinuca commented 6 years ago

Hi @EricSoroos, could you please take a look at this issue?

Logs can be found here.

EricSoroos commented 6 years ago

Looking into this a bit -- the error arises because ckanext-dcat (and its dependency rdflib) strictly expects the URL field to hold a valid URL, and doesn't trap the error or skip the field when it's invalid. We definitely have metadata that's not a valid URL, so the combination of that metadata not conforming to the field definition and the strict definition of the formats causes the error.
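As an illustration of the kind of check involved, here is a minimal sketch of a URL validity test that could run before attempting Turtle or N3 serialization. The function name `looks_like_valid_url` and the character set `INVALID_URI_CHARS` are assumptions made for this example; they are not taken from ckanext-dcat or rdflib, and the real check may be stricter or looser.

```python
from urllib.parse import urlparse

# Assumption: characters that typically cannot appear unescaped in a URI
# written out as Turtle/N3. Illustrative only, not copied from any library.
INVALID_URI_CHARS = set('<>" {}|\\^`')


def looks_like_valid_url(value):
    """Return True if value is an absolute http(s) URL with no forbidden characters."""
    if not isinstance(value, str) or not value:
        return False
    if any(ch in INVALID_URI_CHARS for ch in value):
        return False
    try:
        parsed = urlparse(value)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A free-text value in the URL field (for example a sentence describing where the data came from) fails this check because it contains spaces and has no scheme, which is the kind of metadata described above.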

3 options to fix:

EricSoroos commented 6 years ago

sample possible error response:

[Screenshot: screen shot 2018-07-10 at 11 53 50 am]

adinuca commented 6 years ago

I think the response can be shorter. Something like: "Format not supported due to invalid URL".

It would be good if you could also catch the exception and log a message that tells you exactly what the issue is, instead of the long stack-trace.
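The suggestion above could look roughly like the following sketch: catch the serialization failure, write one short log line instead of a stack trace, and return a brief message with a non-2xx status. Everything here is hypothetical: `render_rdf` and its `serialize` argument stand in for whatever the site actually calls, the 404 status is an assumed choice, and catching a broad `Exception` is a simplification because the exact exception type raised is not stated in this thread.

```python
import logging

log = logging.getLogger(__name__)

# Short message proposed in the comment above.
INVALID_URL_MESSAGE = "Format not supported due to invalid URL"


def render_rdf(dataset, fmt, serialize):
    """Return (status_code, body) for a request such as /dataset/<id>.ttl.

    `serialize` is a hypothetical callable (dataset, fmt) -> str that raises
    when the dataset cannot be written in the requested format.
    """
    try:
        return 200, serialize(dataset, fmt)
    except Exception as exc:  # assumption: the exact exception type is unknown
        # One-line log entry rather than a full stack trace.
        log.warning(
            "Cannot serialize dataset %s as %s: %s", dataset.get("id"), fmt, exc
        )
        # Assumption: a 404 keeps crawlers from retaining the page.
        return 404, INVALID_URL_MESSAGE
```

A narrower `except` clause would be preferable once the specific exception is known, so that unrelated failures still surface in the logs.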

EricSoroos commented 6 years ago

That error message is 90% URL. That response is essentially catching the exception and returning something useful to the browser that will explain the situation and prevent a crawler from retaining it.

I don't think we need to log it, since we know exactly what's causing it and can find all cases of this with a SQL query.
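A query of that kind might look like the sketch below. The schema is an assumption made for illustration (a `package` table with `id` and `url` columns), the matching rules are a rough heuristic rather than a full URL validation, and the example uses SQLite only so that it is self-contained; the production database and its column names may differ.

```python
import sqlite3

# Assumption: datasets live in a `package` table with `id` and `url` columns.
# Flags non-empty URLs that lack an http(s) prefix or contain a space.
FIND_SUSPECT_URLS = """
    SELECT id, url
    FROM package
    WHERE url IS NOT NULL
      AND url <> ''
      AND (
            (url NOT LIKE 'http://%' AND url NOT LIKE 'https://%')
            OR url LIKE '% %'
          )
    ORDER BY id
"""


def find_suspect_urls(conn):
    """Return (id, url) rows whose url does not look like a valid URL."""
    return conn.execute(FIND_SUSPECT_URLS).fetchall()


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package (id TEXT PRIMARY KEY, url TEXT)")
    conn.executemany(
        "INSERT INTO package VALUES (?, ?)",
        [
            ("a", "https://example.org/data"),
            ("b", "see ministry website"),
            ("c", ""),
            ("d", None),
            ("e", "http://example.org/a b"),
        ],
    )
    return conn
```

Calling `find_suspect_urls(build_demo_db())` returns the rows with ids `b` and `e`; empty and missing URLs are deliberately left out, on the assumption that they do not trigger the serialization error.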

adinuca commented 6 years ago

Ok @EricSoroos, my main reason for the above message was to not have so many error logs that don't help. I agree we can just ignore the exception and return a proper response to the user.

adinuca commented 6 years ago

Hi @EricSoroos, do you have any update on this? There have been a few emails regarding errors generated by these URLs.

adinuca commented 6 years ago

Hi @EricSoroos, any update on this?

adinuca commented 6 years ago

Hi @deirdrelee, do you know when this will get done? As with #249, it is hard to spot real problems in the logs because of the error entries this issue generates.

EricSoroos commented 6 years ago

I've pushed a fix for this to staging.

adinuca commented 6 years ago

Thank you!

adinuca commented 6 years ago

This has been fixed and deployed to production by @EricSoroos. Thank you!