Letractively / mmisw

Automatically exported from code.google.com/p/mmisw

Please set up periodic "harvest" for updated version of GCOOS ontology #178

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What capability do you want added or improved?
Felimon at GCOOS requests a periodic "harvest" of the updated version of
GCOOS ontology for the MMI repository.  Harvest from: 
http://gcoos.rsmas.miami.edu/dp/srv_gcoos_generateOWL.php

Where do you want this capability to be accessible?
to be automatic

What sort of input/command mechanism do you want?

What is the desired output (content, format, location)?

Other details of your desired capability?

What version of the product are you using?

Please provide any additional information below (particular ontology/ies,
text contents of vocabulary (voc2rdf), operating system, browser/version
(Firefox, Safari, Chrome, IE, etc.), screenshot, etc.)

Original issue reported on code.google.com by steph_wa...@consolidated.net on 15 Sep 2009 at 2:27

GoogleCodeExporter commented 9 years ago
Thanks Stephanie for entering this request.

My initial comment is that automatic harvesting hasn't been addressed yet, and I
think it will take significant time to implement (especially given the many
other features that still need work for a stable system). But it is something
being considered (I'm thinking about the CF and GCMD vocabs in particular), so
perhaps it is not too far in the future either.

Here are some questions/comments:

1) How often is the GCOOS vocabulary changing? An initial approach would be to
implement some convenient mechanism to create a new version of the GCOOS vocab
at ORR by just giving the source URL (like the one above). Then Felimon can just
notify us that a new version has been posted so one of us can run that mechanism
(hopefully a task that takes just a couple of minutes).

2) I see the following xml:base in the ontology obtained from the given URL:
   http://gcoos.rsmas.miami.edu/dp/data/Parameters.owl

Question: should the corresponding ontology in the MMI ORR keep this original
xml:base (in other words, should the MMI ORR "re-host" the ontology), or can the
registered ontology be given an 'http://mmisw.org/ont/gcoos'-based URI?
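
To check which xml:base a harvested file declares before deciding whether to keep it, a minimal sketch along these lines may help. It assumes the document is RDF/XML with xml:base set on the root element; the function name `read_xml_base` and the trimmed inline sample are hypothetical and not part of the ORR.

```python
import xml.etree.ElementTree as ET

# The "xml:" prefix is bound to this namespace by the XML specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def read_xml_base(rdf_xml_text):
    """Return the xml:base declared on the root element, or None if absent."""
    root = ET.fromstring(rdf_xml_text)
    return root.get("{%s}base" % XML_NS)


# Hypothetical, heavily trimmed stand-in for the harvested ontology.
sample = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xml:base="http://gcoos.rsmas.miami.edu/dp/data/Parameters.owl">
</rdf:RDF>"""

print(read_xml_base(sample))
```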

(I'm including Felimon and John in the Cc of this issue, in case they can add
their comments.)

Thanks.  --carlos

Original comment by caru...@gmail.com on 15 Sep 2009 at 2:55

GoogleCodeExporter commented 9 years ago
Thanks for keeping me in the loop.

True, automatic harvesting will introduce other issues (e.g. mapping of new
inputs, dealing with broken links, etc.), but this needs to be addressed if we
want the registry to be "current". There are several ways to deal with this, and
they can be categorized as either: (1) Passive - data contributors push their
data onto the repository themselves and update what needs to be updated as edits
are made to their collection, or (2) Active - the project provides a utility to
establish an event-driven service that pushes the data automatically as a change
is made, with reminders emailed for other manual operations (if necessary).

Option (1) will very likely not work given the prevailing working environment
and available resources; i.e. very few people have time to even read, much less
edit, the collection. This answers the first query: How often is the vocabulary
changing? The answer is seldom to date, and I cannot predict a frequency. If a
user or data provider chances upon the collection and discovers a problem (no
definition, wrong definition, insufficient definition), the norm is to email
whoever manages the collection. In some cases, the reports are circulated among
regional members and comments are received and consolidated -- then an edit is
made to the collection.

The ideal scenario is to submit the collection to the group for an annual
review, with amendments made once a year. Simple, but most regional associations
(my guess) have yet to get into this rhythm, as the need to keep their
vocabulary current and 'sufficient' seems not well understood and/or
appreciated.

I suspect that once an application of this ontology/repository works, this
'need' will surface -- the utility of this ontology has not reached a level at
which groups will give importance to their collections.

Simply put, I think providing data providers with ready tools that do most of
the job for them (i.e. automated harvesting) will be most attractive and can be
sustainable in the long term.

With regard to the query on the xml:base, I suggest that MMI ORR keep the base
as provided by the data provider.

Original comment by felimon....@gmail.com on 15 Sep 2009 at 11:19

GoogleCodeExporter commented 9 years ago
I concur with Felimon that we need to provide the facility -- the need will
become obvious as people start really using the system. I think we'll want two
modes for option 2: event-based and polled by schedule. Event-based will kick
off based on some explicit notification from the target system, perhaps a
URL-formed command that says 'update this ontology' (with the one at the base
location which has been previously defined).

Polled will be needed for people who can't, or aren't inclined to, send the
repository a message. In this mode content at the source will be compared to
content internally, in some form TBD. Changes drive a new update. Polling cycles
will need to be set separately for different ontologies; someday the frequency
can be modified automatically to reflect recent update rates, but probably
should never go below about daily.
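
The polled mode described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the comparison "in some form TBD" is taken here to be a SHA-256 digest of the raw bytes, "never go below about daily" is read as a floor on polling frequency (an interval of at most one day), and the class `PolledOntology` and the injected `fetch` callable are hypothetical names, not ORR code.

```python
import hashlib
from datetime import datetime, timedelta

# Assumed reading of "never go below about daily": poll at least once a day.
MAX_INTERVAL = timedelta(days=1)


class PolledOntology:
    """Tracks one source URL and reports when its content has changed."""

    def __init__(self, source_url, interval=MAX_INTERVAL):
        self.source_url = source_url
        # Each ontology gets its own cycle, capped at one day between polls.
        self.interval = min(interval, MAX_INTERVAL)
        self.last_digest = None
        self.last_checked = None

    def is_due(self, now):
        return self.last_checked is None or now - self.last_checked >= self.interval

    def check(self, fetch, now):
        """Fetch the source if a poll is due; return True if content changed."""
        if not self.is_due(now):
            return False
        digest = hashlib.sha256(fetch(self.source_url)).hexdigest()
        self.last_checked = now
        changed = digest != self.last_digest
        self.last_digest = digest
        return changed


# Usage with an in-memory stand-in for the HTTP fetch (hypothetical content).
source = {"body": b"<rdf:RDF>version 1</rdf:RDF>"}
gcoos = PolledOntology("http://gcoos.rsmas.miami.edu/dp/srv_gcoos_generateOWL.php")
if gcoos.check(lambda url: source["body"], datetime(2009, 9, 16)):
    print("content changed: register a new version")
```

Note that a byte-level digest treats any difference as a change, including a generation timestamp embedded in the output, so a comparison of the parsed statements may turn out to be the better fit.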

Regarding xml:base, I think it depends on what the provider is trying to
achieve. (Some just inherit a base but don't really care about keeping it.) But
we certainly need to be prepared to keep the original base, as an option. (That
is essentially equivalent to indexing an ontology.)

The obvious disadvantage to this "keep the base" approach is that it won't help
the ontology term URIs be resolved, because the base isn't in MMI's domain
namespace. Our preference would therefore be for people to use our base, except
when there is a clear reason not to. (Any clear reason will do, we don't want to
pass judgment or anything.)
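
To make the trade-off concrete, here is a small sketch of what moving term URIs onto an MMI base could look like. The target base path `gcoos/parameters`, the term name `salinity`, and the helper `rebase_term` are all hypothetical; only the original xml:base and the `http://mmisw.org/ont/gcoos` prefix come from this thread.

```python
ORIGINAL_BASE = "http://gcoos.rsmas.miami.edu/dp/data/Parameters.owl"
# Hypothetical path under the 'http://mmisw.org/ont/gcoos' prefix.
MMI_BASE = "http://mmisw.org/ont/gcoos/parameters"


def rebase_term(term_uri, old_base, new_base):
    """Swap the namespace part of a term URI, keeping its local name."""
    if not term_uri.startswith(old_base):
        return term_uri  # term from another vocabulary: leave untouched
    local_name = term_uri[len(old_base):].lstrip("#/")
    return new_base.rstrip("/") + "/" + local_name


# Hypothetical term; with the original base it resolves only at the provider.
original = ORIGINAL_BASE + "#salinity"
print(rebase_term(original, ORIGINAL_BASE, MMI_BASE))
```

With the original base kept, the term URI stays in the provider's domain and only the provider can resolve it; with the rebased form, the repository host can.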

Original comment by grayb...@marinemetadata.org on 16 Sep 2009 at 5:18

GoogleCodeExporter commented 9 years ago
Direct registration of ontologies (and their new versions) was implemented.
(This work was done especially in the context of the OOI Semantic Prototype.)

Please see:

http://ci.oceanobservatories.org/spaces/display/CIDev/Direct+registration+of+RDF+contents
http://code.google.com/p/mmisw/source/browse/#svn/trunk/mmiorr-client-demo

This client demonstrates how to programmatically perform registration (and
retrieval) operations against the ORR. See the README file there.

Although this does not provide, per se, any automated mechanism for updated
versions, it certainly makes it feasible and relatively straightforward.

Your comments are very welcome (again) now. I think I'd like to close this
issue (as fixed) given the direct registration capability mentioned above (and
with the corresponding clarification); but, since this issue is essentially
about allowing "periodic, automated version updates", I'd like to have your
input. Perhaps we could still close this particular entry and create others for
the more concrete features mentioned in this thread. Also, please keep in mind
that we are currently prioritizing entries for a first beta release
("milestone-Beta1"). Please also suggest whether this entry (if not closed), or
which of the derived entries, should be marked with that label.

Thanks and regards. --carlos

Original comment by caru...@gmail.com on 5 Apr 2010 at 1:01

GoogleCodeExporter commented 9 years ago
I think you can consider this issue fixed. I agree that automating it can be
handled (if needed) via other means. Thanks for addressing the issue. --- Nonong

Original comment by felimon....@gmail.com on 5 Apr 2010 at 1:58

GoogleCodeExporter commented 9 years ago
Thanks.

Original comment by caru...@gmail.com on 5 Apr 2010 at 4:57