gatemezing opened 8 years ago
when we look at the catalog - some five hundred namespaces - our first question is how significant they are. our second is whether it is well advised to depend on an external resource for this.
in regard to the first question, on one hand, a facility exists for each user to establish default bindings for their requests at either the account level or the repository level. this gives them the means to set the significance for themselves.
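for illustration - a minimal sketch, assuming foaf has been bound as a default; the prefix and namespace are examples only:

```sparql
# assumes foaf -> <http://xmlns.com/foaf/0.1/> has been established
# as a default binding at the account or repository level, so the
# query can omit the PREFIX declaration entirely
SELECT ?name
WHERE { ?person foaf:name ?name }
```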
on the other hand, the vocabularies are by no means uniformly popular, which may be taken as an argument against adopting them wholesale.
we looked at their occurrence in our datasets. in terms of observed iris, it would certainly make sense for us to add a default binding for schema.org - that is the one which appears in hundreds of repositories. beyond that, however, prevalence falls off quickly. of those namespaces which do not already have default bindings, only roughly ten percent are used more than once. at what level of demand does usage become too thin to justify a default binding?
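for what it is worth, a sketch of the shape such a survey could take - not the exact analysis we ran, just the idea of truncating each iri at its last '/' or '#' and counting:

```sparql
# rough namespace-frequency sketch: count object-position iris,
# truncating each at the last '/' or '#' to approximate its
# namespace; subjects and predicates are ignored for brevity
SELECT ?ns (COUNT(*) AS ?uses)
WHERE {
  ?s ?p ?o .
  FILTER(isIRI(?o))
  BIND(REPLACE(STR(?o), "[^/#]*$", "") AS ?ns)
}
GROUP BY ?ns
ORDER BY DESC(?uses)
```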
How significant are they? - Well, I would say the "significance" could be based on two factors: 1- how much the vocabulary is reused by other vocabularies in the catalogue, and 2- how many datasets in the LOD cloud use the vocabulary. This query http://goo.gl/WG8zh0 shows a top 20 based on the above criteria. In terms of vocabulary reuse, apart from owl, rdfs, and rdf, you have dcterms, foaf, schema.org, and skos in the top 10.
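For reference, a query in this spirit might look like the following sketch; it assumes the VOAF/VANN annotations LOV publishes for each vocabulary and is meant for the LOV SPARQL endpoint, not as a reconstruction of the exact shortened link above:

```sparql
# sketch of a top-20 ranking by vocabulary and dataset reuse,
# assuming LOV's VOAF/VANN annotations per vocabulary
PREFIX voaf: <http://purl.org/vocommons/voaf#>
PREFIX vann: <http://purl.org/vocab/vann/>
SELECT ?vocab ?prefix ?reusedByVocabs ?reusedByDatasets
WHERE {
  ?vocab a voaf:Vocabulary ;
         vann:preferredNamespacePrefix ?prefix .
  OPTIONAL { ?vocab voaf:reusedByVocabularies ?reusedByVocabs }
  OPTIONAL { ?vocab voaf:reusedByDatasets ?reusedByDatasets }
}
ORDER BY DESC(?reusedByVocabs) DESC(?reusedByDatasets)
LIMIT 20
```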
Another criterion could be use of the vocabularies under W3C namespaces (see http://lov.okfn.org/dataset/lov/vocabs?tag=W3C%20Rec), for two reasons: 1- most of these vocabularies are W3C Recommendations, and 2- the URIs are sustainable and can be expected to live for 20+ years.
Of course, you are raising good points.
if you look in the configuration for your repository, you will find that most of your top twenty are already pre-set. the issue is, if self-configuration does not suffice, then either there is a cut-off or the set is constructed on demand from a "trusted" source. in the first case, someone sets a utility cut-off - will you ever be happy with it? in the second, a fair amount of complexity is invested for marginal use - does that make sense?
of those in the w3c realm, several are not currently defaults: csvw, dcat, org, prov, sd, skosxl. on one hand, this is historic, in that we set the default set quite some time ago and have not been curating it. on the other, while we could add those, doing so would not address the substance of the issue: how to ensure that the default set reflects the namespaces most useful to the most users.
in order to answer this, keep in mind that there are currently close to three thousand repositories on the site. those datasets contain roughly four hundred million iri terms, of which only roughly one hundred million are drawn from a namespace known to the okfn catalog. that means the majority would in any case require that the user configure their own default prefix bindings.
then, on the other hand, for those from the w3c realm which are not default, the dataset frequencies are:
this is relative to an average frequency of 23 datasets for those namespaces which are in use at all.
so the issue reduces to:
if we can settle on answers for the first two questions, we can certainly make a repository available to you to facilitate that last task and then use its content to augment a static set.
I see. Maybe one could start with autocompletion that fetches prefixes from the LOV catalogue, to help users while configuring their settings. That would be at least some help.
Now regarding the 3 questions above:
i see, there is now a "ghislain/lovdydra" repository.
if you define a query which yields the "canonical" binding set for those namespaces which are "in demand", we can track changes to the repository content, use them to trigger the query, and feed the results to a housekeeping task which augments the built-in set.
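as a starting point, something along these lines - a sketch which assumes lov's voaf/vann annotations; the threshold of 10 reusing datasets is an arbitrary placeholder for whatever "in demand" turns out to mean:

```sparql
# sketch of a "canonical bindings for in-demand namespaces" query,
# assuming LOV's VOAF/VANN annotations; the demand threshold is an
# arbitrary placeholder
PREFIX voaf: <http://purl.org/vocommons/voaf#>
PREFIX vann: <http://purl.org/vocab/vann/>
SELECT ?prefix ?namespace
WHERE {
  ?vocab a voaf:Vocabulary ;
         vann:preferredNamespacePrefix ?prefix ;
         vann:preferredNamespaceUri ?namespace ;
         voaf:reusedByDatasets ?demand .
  FILTER(?demand >= 10)
}
```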
Yeah.. but guess what? I am having an issue with the import: "Import failed term is shorter than minimum length on line 612829, column 2". I am trying to upload the file at http://lov.okfn.org/lov.nq.gz (after downloading it to my local machine). Any idea? Normally I don't have that issue with Fuseki or Sesame.
try decompressing it first. you will also need to clean it: it includes numerous iris which do not conform to n-quads syntax constraints.
```
rapper --count -i nquads lov.nq
```
indicates a couple dozen offenders - some have trailing spaces and some are unqualified email addresses. see rapper-lov-output.txt
Thanks. I'll have a look and hope I can upload the file. What is the size limit for a file I can upload with my settings?
a million statements at once. rapper's count result was well within that limit.
OK. But I have been waiting for more than an hour to import a Turtle file with 198960 triples. Is that normal?
it looks like the failure from last week was still blocking. i have cleared that.
Ah, really? I thought I had a clean version this time, since rapper was not complaining. So, what should I do? Fix it on my side again and try to reload? How can I tell from the import tool that something is going wrong? TIA
it appeared to me that the error message on your repository page was the one from the import attempt last week, which had not cleared properly. when the import tool rejects a document, the alert which appears on the repository page is the tool's error message. if you have a version which rapper accepts, it should import without a problem.
That's weird! I was trying to upload yesterday just before sending my comment. I'll re-import and see what happens. Thanks
The problem was with Google Chrome. I was using Chrome version 48.0.2564.97 (64-bit) on my Mac. I switched to Firefox and the import was faster (less than 2 minutes).
this practice would also need to keep alternative sources for mappings in mind - for example, prefix.cc:
:+1: Yes, prefix.cc contains more than just vocabularies. And we try to keep LOV in sync with that service. You reminded me to send the new file to Richard :)
It would be great to sync with the LOV catalogue to add the default prefixes in Dydra. Details of the prefixes are at http://lov.okfn.org/dataset/lov or http://lov.okfn.org/dataset/lov/context.