dydra / support


Consider adding the namespaces of vocabs from LOV catalogue #39

Open gatemezing opened 8 years ago

gatemezing commented 8 years ago

It would be great to sync with the LOV catalogue to add the default prefixes in Dydra. Details of the prefixes are at http://lov.okfn.org/dataset/lov or http://lov.okfn.org/dataset/lov/context.
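The context document is a JSON-LD context mapping prefixes to namespace IRIs. As a rough sketch (the sample data below is illustrative, not the live document), extracting bindings from it could look like:

```python
import json

def context_to_bindings(jsonld_context_doc):
    """Extract prefix -> namespace bindings from a JSON-LD context document."""
    doc = json.loads(jsonld_context_doc)
    context = doc.get("@context", doc)  # accept bare or wrapped contexts
    return {
        prefix: iri
        for prefix, iri in context.items()
        # keep only simple string mappings that look like namespace IRIs
        if isinstance(iri, str) and iri.endswith(("/", "#"))
    }

# illustrative sample; the real document lives at
# http://lov.okfn.org/dataset/lov/context
sample = ('{"@context": {"foaf": "http://xmlns.com/foaf/0.1/",'
          ' "dcterms": "http://purl.org/dc/terms/"}}')
bindings = context_to_bindings(sample)
```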

lisp commented 8 years ago

when we look at the catalog - some five hundred namespaces - our first question is, how significant are they? our second is whether it is well advised to depend on an external resource for this.

in regard to the first question, on one hand, a facility exists for each user to establish default bindings for their requests at either the account level or the repository level. this gives them the means to set the significance for themselves.

on the other hand, the vocabularies are by no means uniformly popular, which may be taken to indicate against their blanket adoption.

we looked at their occurrence in our datasets. in terms of observed iri, it would certainly make sense for us to add a default binding for schema.org - that is the one which appears in hundreds of repositories. from there, however, the prevalence falls off sharply. of those namespaces which are not already accommodated with default bindings, only roughly ten percent are used more than once. at which demand level does the usage diminish to the point where there is no utility?
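the measurement described above - in how many repositories each namespace occurs - can be sketched roughly like this (the namespace-splitting heuristic and the sample data are assumptions, not our actual pipeline):

```python
from collections import defaultdict

def namespace_of(iri):
    """Heuristic: the namespace is everything up to and including
    the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]

def repositories_per_namespace(repos):
    """Map each namespace to the number of repositories in which it occurs.
    `repos` maps a repository name to its set of IRI terms."""
    seen_in = defaultdict(set)
    for repo, iris in repos.items():
        for iri in iris:
            seen_in[namespace_of(iri)].add(repo)
    return {ns: len(names) for ns, names in seen_in.items()}

# illustrative sample data
repos = {
    "r1": {"http://schema.org/Person", "http://example.org/v#x"},
    "r2": {"http://schema.org/Place"},
}
prevalence = repositories_per_namespace(repos)
# schema.org occurs in both repositories, the example vocabulary in one
```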

gatemezing commented 8 years ago

How significant are they? Well, I would say the "significance" could be based on two factors: 1. how much the vocabulary is reused by other vocabularies in the catalogue; 2. how many datasets in the LOD cloud use the vocabulary. This query http://goo.gl/WG8zh0 shows a top 20 based on those criteria. In terms of vocabulary reuse, apart from owl, rdfs, and rdf, you have dcterms, foaf, schema.org, and skos in the top 10.
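As a sketch, ranking on those two factors could look like this (the equal weighting of the factors and the sample counts are illustrative assumptions, not LOV's actual scoring):

```python
def rank_vocabularies(stats, top_n=20):
    """Rank vocabularies by combined significance.
    `stats` maps a prefix to a (vocab_reuse_count, dataset_count) pair;
    summing the two counts is an assumed, simplistic weighting."""
    score = lambda item: item[1][0] + item[1][1]
    ranked = sorted(stats.items(), key=score, reverse=True)
    return [prefix for prefix, _ in ranked][:top_n]

# illustrative counts, not LOV figures
stats = {"dcterms": (400, 300), "foaf": (380, 250), "obscure": (1, 2)}
top = rank_vocabularies(stats)
```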

Another criterion could be the use of the vocabularies under W3C namespaces (see http://lov.okfn.org/dataset/lov/vocabs?tag=W3C%20Rec), for two reasons: 1. most of those vocabularies are W3C Recommendations; 2. the URIs are sustainable and can stay live for 20+ years.

Of course, you are raising good points.

lisp commented 8 years ago

if you look in the configuration for your repository, you will find that most of your top twenty are already pre-set. the issue is, if self-configuration does not satisfy, then either there is a cut-off or the set is constructed on-demand from a "trusted" source. in the first case, someone sets a utility cutoff. will you ever be happy? in the second, a fair amount of complexity is invested for marginal use. does that make sense?

lisp commented 8 years ago

of those in the w3c realm, several are not now default: csvw, dcat, org, prov, sd, skosxl. on one hand, this is historic, in that we set the default set quite some time ago and have not been curating it. on the other, while we could add those, that would not respond to the substance of the issue: how to ensure that the default set reflects those most useful to the most users.

in order to answer this, one must keep in mind that there are currently close to three thousand repositories on the site. those datasets contain roughly four hundred million iri terms. of that, only roughly one hundred million are drawn from a namespace known to the okfn. that means the majority would in any case require that the user configure their own default prefix settings.

then, on the other hand, of those from the w3c realm which are not default, the dataset frequency is

this is relative to the average frequency of 23 datasets, for those namespaces in use.

so the issue reduces to

if we can settle on answers for the first two questions, we can certainly make a repository available to you to facilitate that last task and then use its content to augment a static set.

gatemezing commented 8 years ago

I see. Maybe one could start by using autocompletion that fetches prefixes from the LOV catalogue to help users during the settings. That would be at least some help.
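A minimal sketch of such autocompletion over a binding set (the class name and sample data here are illustrative):

```python
import bisect

class PrefixCompleter:
    """Suggest prefix completions from a known set of bindings."""

    def __init__(self, bindings):
        self._prefixes = sorted(bindings)  # sorted for binary search
        self._bindings = bindings

    def complete(self, partial, limit=5):
        """Return up to `limit` (prefix, namespace) pairs whose prefix
        starts with `partial`."""
        start = bisect.bisect_left(self._prefixes, partial)
        out = []
        for prefix in self._prefixes[start:]:
            if not prefix.startswith(partial):
                break
            out.append((prefix, self._bindings[prefix]))
            if len(out) == limit:
                break
        return out

completer = PrefixCompleter({"dcterms": "http://purl.org/dc/terms/",
                             "dcat": "http://www.w3.org/ns/dcat#",
                             "foaf": "http://xmlns.com/foaf/0.1/"})
suggestions = completer.complete("dc")
```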

Now regarding the 3 questions above:

lisp commented 8 years ago

i see, there is now a "ghislain/lovdydra" repository.

if you define a query which yields the "canonical" binding set for those namespaces which are "in demand", we can track changes to the repository content and use them to trigger the query, with the results fed to a house-keeping task which augments the built-in set.
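a rough sketch of the house-keeping merge step (the conflict policy - built-in bindings win - is an assumption, not a committed design):

```python
def augment_defaults(built_in, canonical):
    """Merge the canonical bindings produced by the tracking query into
    the built-in set. On conflict the built-in binding wins; that
    precedence is an assumed policy."""
    merged = dict(canonical)
    merged.update(built_in)
    return merged

built_in = {"foaf": "http://xmlns.com/foaf/0.1/"}
canonical = {"foaf": "http://example.org/conflicting/",  # ignored on conflict
             "dcat": "http://www.w3.org/ns/dcat#"}
defaults = augment_defaults(built_in, canonical)
```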

gatemezing commented 8 years ago

Yeah.. but guess what? I am having an issue with the import: "Import failed: term is shorter than minimum length on line 612829, column 2". I am trying to upload the file at http://lov.okfn.org/lov.nq.gz (after downloading it to my local machine). Any idea? Normally I don't have that issue with Fuseki or Sesame.

lisp commented 8 years ago

try decompressing it. you will also need to clean it: it includes numerous iri which do not conform to n-quads syntax constraints.

rapper --count -i nquads lov.nq 

indicates a couple dozen - some have trailing spaces and some are unqualified email addresses. rapper-lov-output.txt
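a rough sketch of such a cleaning pass (the scheme check is an approximation of the n-quads iriref production, not a full validator):

```python
import re

IRIREF = re.compile(r"<([^<>]*)>")
# rough check: an absolute IRI begins with a scheme; an unqualified email
# address such as <bob@example.org> has no scheme and fails this test
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def clean_line(line):
    """Trim stray whitespace inside <...> IRI refs; return None when a
    line contains an IRI that cannot be salvaged."""
    bad = False

    def fix(match):
        nonlocal bad
        iri = match.group(1).strip()  # drop leading/trailing spaces
        if not HAS_SCHEME.match(iri) or " " in iri:
            bad = True
        return f"<{iri}>"

    cleaned = IRIREF.sub(fix, line)
    return None if bad else cleaned

# illustrative lines mirroring the two failure modes rapper reported
lines = [
    '<http://example.org/s > <http://example.org/p> "x" .',  # trailing space
    '<bob@example.org> <http://example.org/p> "y" .',        # no scheme
]
cleaned = [c for c in (clean_line(l) for l in lines) if c is not None]
```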

gatemezing commented 8 years ago

Thanks. I'll have a look and hope I can upload the file. What is the limit size of a file I can upload with my settings?

lisp commented 8 years ago

a million statements at once. rapper's count result was well within that limit.
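if a document ever did exceed the limit, splitting it into conforming uploads is straightforward; a sketch (the default limit mirrors the cap stated above):

```python
from itertools import islice

def batches(statements, limit=1_000_000):
    """Yield successive chunks of at most `limit` statements, so that
    each upload stays within the per-import cap."""
    it = iter(statements)
    while chunk := list(islice(it, limit)):
        yield chunk

# small illustrative run: 25 statements, cap of 10 per upload
parts = list(batches(range(25), limit=10))
```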

gatemezing commented 8 years ago

OK. But I have been waiting more than an hour to import a Turtle file with 198,960 triples. Is that normal?

lisp commented 8 years ago

it looks like the failure from last week was still blocking. i have cleared that.

gatemezing commented 8 years ago

Ah really? I thought I had a clean version this time, since rapper was not complaining. So, what should I do? Fix it on my side again and try to reload? How can I tell from the import tool that something is going wrong with the import? TIA

lisp commented 8 years ago

it appeared to me that the error message on your repository page was the one from the import attempt last week, which had not cleared properly. when the import tool rejects a document, the alert which appears on the repository page is the tool's error message. if you have a version which rapper accepts, it should import without a problem.

gatemezing commented 8 years ago

That's weird! I was trying to upload yesterday, just before sending my comment. I'll re-import and see what happens. Thanks

gatemezing commented 8 years ago

The problem was with Google Chrome. I was using Chrome version 48.0.2564.97 (64-bit) on my Mac. When I switched to Firefox, the import was faster (less than 2 min).

lisp commented 8 years ago

this practice also needs to keep alternative sources for mappings in mind, for example prefix.cc:
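prefix.cc offers per-prefix json lookups (e.g. GET http://prefix.cc/foaf.file.json returning a one-entry prefix-to-namespace object); a sketch of consulting it as a fallback (the exact response shape and the lov-first precedence are assumptions):

```python
import json

def parse_prefixcc_response(body):
    """Parse a prefix.cc JSON lookup response, assumed to be a small
    prefix -> namespace object."""
    return json.loads(body)

def merged_lookup(prefix, lov, prefixcc):
    """Consult the LOV-derived bindings first, then fall back to
    prefix.cc; this precedence is an assumed policy."""
    return lov.get(prefix) or prefixcc.get(prefix)

# illustrative response body, mirroring a prefix.cc lookup for "foaf"
prefixcc = parse_prefixcc_response('{"foaf": "http://xmlns.com/foaf/0.1/"}')
ns = merged_lookup("foaf", lov={}, prefixcc=prefixcc)
```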

gatemezing commented 8 years ago

:+1: Yes, prefix.cc contains more than vocabularies, and we try to keep LOV in sync with that service. You've reminded me to send the new file to Richard :)