ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.
Apache License 2.0

Implement advanced features in the Community Ontology Repository (COR) #12

Closed lewismc closed 5 years ago

lewismc commented 5 years ago

Mentors

Lewis McGibbney, SemTech Committee
Carlos Rueda, SemTech Committee

Information for Students

Please see general ESIP guidelines...

Project Ideas

Idea Title

Implement advanced features in the Community Ontology Repository (COR)

Abstract

COR enables data publishers to publish structured data that can be interlinked and thereby become more useful. As of early 2019, COR is mostly used as a repository for ontologies and other vocabularies. Little documentation exists on additional uses, so users do not know what else they can do with COR or how it could be useful to them. There is much more the ESIP Semantic Technologies Committee (STC) could be doing with COR to promote and enable more widespread use of vocabularies and ontologies in knowledge-based systems. This project will make progress on the following features

Technical Details

COR has a number of features that are currently not exposed, have not been investigated or leveraged, and are hence unused. COR utilizes AllegroGraph, a modern, high-performance, persistent graph database. As an example of advanced functionality, AllegroGraph provides a broad array of mechanisms to query and access knowledge in an RDF datastore:

RDFS++ Reasoning - Dynamic Materialization

Description logics or OWL-DL reasoners are good at handling complex ontologies. They tend to be complete (give all the possible answers to a query) but can be totally unpredictable with respect to execution time when the number of triples increases beyond millions. AllegroGraph offers a very fast and practical RDFS++ reasoner.

AllegroGraph supports all the RDF and RDFS predicates and some from full OWL. The supported predicates are rdf:type, rdfs:subClassOf, rdfs:range, rdfs:domain and rdfs:subPropertyOf, together with owl:sameAs, owl:inverseOf, owl:TransitiveProperty, owl:hasValue, owl:someValuesFrom, owl:allValuesFrom, owl:oneOf, owl:equivalentClass, owl:Restriction, owl:onProperty and owl:intersectionOf.

AllegroGraph's RDFS++ engine dynamically maintains the ontological entailments required for reasoning: it has no explicit materialization phase. Materialization is the pre-computation and storage of inferred triples so that future queries run more efficiently. The central problem with materialization is its maintenance: changes to the triple-store's ontology or facts usually change the set of inferred triples. In static materialization, any change in the store requires complete re-processing before new queries can run. AllegroGraph's Dynamic Materialization simplifies store maintenance and reduces the time required between data changes and querying.
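The maintenance problem described above can be illustrated with a small sketch. It is a toy model over plain Python tuples, not a description of how AllegroGraph works internally; the `ex:` terms and all function names are made up for the illustration. It compares a precomputed (static) snapshot of inferred `rdf:type` triples with inference performed at query time, after the ontology has changed.

```python
# Toy illustration only: this is NOT how AllegroGraph is implemented.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def superclasses(triples, cls):
    """Return cls plus every class reachable from it through rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def materialize(triples):
    """Static materialization: precompute and store every inferred rdf:type triple."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == TYPE:
            for sup in superclasses(triples, o):
                inferred.add((s, TYPE, sup))
    return inferred


def instances_dynamic(triples, cls):
    """Query-time inference: entailments are derived on demand, nothing is stored."""
    return {s for s, p, o in triples
            if p == TYPE and cls in superclasses(triples, o)}


store = {
    ("ex:River", SUBCLASS, "ex:WaterBody"),
    ("ex:Nile", TYPE, "ex:River"),
}
snapshot = materialize(store)

# The ontology changes after the snapshot was taken.
store.add(("ex:WaterBody", SUBCLASS, "ex:Feature"))

# The snapshot knows nothing about ex:Feature until it is rebuilt,
# while the query-time approach sees the change immediately.
stale = {s for s, p, o in snapshot if p == TYPE and o == "ex:Feature"}
fresh = instances_dynamic(store, "ex:Feature")
print("from stale snapshot:", stale)
print("at query time:", fresh)
```

The trade-off is the usual one: the snapshot answers simple lookups quickly but must be recomputed after each change, whereas query-time inference pays a cost on every query.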

OWL2 RL Materialized Reasoner

AllegroGraph's OWL2 RL materializer uses a set of inference rules to generate new triples and adds them to the database. OWL 2 RL is the subset of OWL 2 that is designed to support rule based reasoners. OWL 2 RL contains a large number of rules for generating triples and some rules for verifying that the triple store is consistent with respect to the OWL 2 RL ontology. The OWL2 RL materializer is best when OWL 2 RL inference is required or the store is relatively static.
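As a rough sketch of what rule-based materialization means, the snippet below repeatedly applies two rules in the style of OWL 2 RL (transitive properties and inverse properties) until no new triple appears, and keeps the results in the store. The rule selection, the function name `materialize_rl` and the `ex:` data are assumptions chosen for illustration; a real OWL 2 RL materializer implements many more rules, including consistency checks.

```python
# Toy forward-chaining materializer; rule selection and data are illustrative.
def materialize_rl(triples):
    """Apply two OWL 2 RL style rules until no new triple is produced."""
    store = set(triples)
    while True:
        new = set()
        transitive = {s for s, p, o in store
                      if p == "rdf:type" and o == "owl:TransitiveProperty"}
        inverses = {(s, o) for s, p, o in store if p == "owl:inverseOf"}
        for s, p, o in store:
            # Transitivity: (x p y), (y p z) => (x p z)
            if p in transitive:
                for s2, p2, o2 in store:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
            # Inverse properties, applied in both directions.
            for p1, p2 in inverses:
                if p == p1:
                    new.add((o, p2, s))
                if p == p2:
                    new.add((o, p1, s))
        if new <= store:
            return store  # fixpoint reached: nothing left to infer
        store |= new


facts = {
    ("ex:partOf", "rdf:type", "owl:TransitiveProperty"),
    ("ex:hasPart", "owl:inverseOf", "ex:partOf"),
    ("ex:Delta", "ex:partOf", "ex:Nile"),
    ("ex:Nile", "ex:partOf", "ex:Africa"),
}
closure = materialize_rl(facts)
for triple in sorted(closure - facts):
    print(triple)
```

Because every inferred triple is written back into the store, later queries need no reasoning step, which is why this approach suits stores that change rarely.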

SPARQL Queries on Named Graphs

SPARQL, the W3C standard RDF query language, returns results in RDF, XML and other formats. AllegroGraph's SPARQL implementation, one of the W3C's "interoperable implementations", includes a query optimizer and has full support for named graphs. It can be used with RDFS++ reasoning turned on (i.e., querying over both real and inferred triples). SPARQL can be used with every available AllegroGraph interface mentioned in the previous section.
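For readers new to named graphs, here is a minimal sketch of a query restricted to one graph, wrapped in a standard SPARQL 1.1 Protocol GET request. The endpoint URL and graph IRI are placeholders rather than real COR addresses, and `build_sparql_request` is a helper invented for this example. The request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and graph IRI, used only for illustration.
ENDPOINT = "https://example.org/sparql"
GRAPH = "https://example.org/ont/vocab1"

# The GRAPH clause limits matching to triples stored in the named graph.
QUERY = f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {{
  GRAPH <{GRAPH}> {{
    ?concept skos:prefLabel ?label .
  }}
}}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query using HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method())
print(request.full_url)
```

Against a live endpoint, the request could be passed to `urllib.request.urlopen` and the JSON result bindings read from the response body.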

Prolog

AllegroGraph's RDF Prolog provides concise, powerful, industry-standard, domain-specific reasoning to build high-level concepts (that require complex rules or numerical processing) on top of RDF data. AllegroGraph Prolog is an option because many use cases are difficult (or very cumbersome) to model with only RDF/RDFS and OWL. Prolog can also be used on top of the RDFS++ reasoner as a rule based system.

Low-level APIs

Allow fast, 'close-to-the-metal' access to triples by subject, predicate, and object.
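To make "access to triples by subject, predicate, and object" concrete, here is a toy in-memory index that answers triple patterns in which any position may be a wildcard. The class and method names are invented for this sketch and do not correspond to the AllegroGraph client API.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory store indexed by subject, predicate and object."""

    def __init__(self):
        self._all = set()
        self._by_s = defaultdict(set)
        self._by_p = defaultdict(set)
        self._by_o = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self._all.add(triple)
        self._by_s[s].add(triple)
        self._by_p[p].add(triple)
        self._by_o[o].add(triple)

    def get_triples(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = [
            index.get(key, set())
            for index, key in ((self._by_s, s), (self._by_p, p), (self._by_o, o))
            if key is not None
        ]
        if not candidates:
            return set(self._all)
        # Intersect the per-position lookups instead of scanning every triple.
        return set.intersection(*candidates)


index = TripleIndex()
index.add("ex:Nile", "rdf:type", "ex:River")
index.add("ex:Amazon", "rdf:type", "ex:River")
index.add("ex:Nile", "ex:flowsInto", "ex:Mediterranean")

print(index.get_triples(p="rdf:type", o="ex:River"))
print(index.get_triples(s="ex:Nile"))
```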

This project will advance COR such that the above features are available in the following forms

Helpful Experience

Students should be interested in semantic technologies and should be familiar with the Resource Description Framework (RDF), Linked Data and the semantic web/technologies ecosystem. This will really help you to get to grips with the project early on. Students should also be prepared to code in Python, Lisp and possibly Scala. Proficiency is not expected, but you will be asked to touch these languages during the course of the summer. Don't worry: if you are not particularly confident, your mentors will assist you.

First steps

Please indicate your interest below. It will also be advantageous for you to introduce yourself to the ESIP Semantic Technologies Committee on our community mailing list.

ayush-anand-13 commented 5 years ago

Sir, I'd like to join the organization for this project. I'd love to know how to start contributing in a meaningful manner.

lewismc commented 5 years ago

@maykillmore thanks for your response. Can you please email me at lewis.mcgibbney [at] gmail [dot] com and we can start discussions. Thank you

lewismc commented 5 years ago

@carueda I would like to be able to utilize spatial and temporal extensions to SPARQL e.g. stSPARQL. These extensions are not currently available in AllegroGraph so I am thinking we should implement a new datastore for ORR which supports these extensions. Specifically, Strabon does provide both spatial and temporal extensions to SPARQL so I think we could support it. Let's discuss when we next tag up. Thanks

graybeal commented 5 years ago

I’d like to sound a note of caution here. I expect implementing a new data store is non-trivial, and it should not be taken lightly. (Triply so if you are implying replacing AllegroGraph.) Is it clear the same goals cannot be achieved with AllegroGraph? Have you talked to them?



lewismc commented 5 years ago

@carueda if we have a point of contact (PoC) at AllegroGraph then it would be good to get in touch with them.

carueda commented 5 years ago

@lewismc PoC provided offline.

As we recently discussed, besides of course exploring support from AllegroGraph itself for this extra functionality, it would also be convenient to revisit investigating alternatives, in particular open-source ones. Jena SDB/TDB and Virtuoso were among the ones I explored a long time ago: https://github.com/mmisw/mmiorr/tree/master/org.mmisw.ont/src/org/mmisw/ont/triplestore

Use a different triple store service?

Although certainly not trivial, integration of a different triple store service should not be too complicated either. The system is designed to rely on an interface (TripleStoreService) for all triple-store related operations (along with exposing the SPARQL endpoint of the underlying service at a convenient location). So it would basically be a matter of providing a new implementation of that trait.

Regarding the current AllegroGraph-based implementation, it is worth mentioning that it is based solely on the REST interface provided by the AllegroGraph server, that is, it does not use any AllegroGraph client library per se. This strategy has proved very beneficial, not only in terms of less coupling at the dependency level, but especially in terms of easier handling of possible compatibility issues with new versions of the service. A similar strategy can be considered for an alternative triple store server.
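A rough sketch of that design, written in Python rather than the system's own language: an abstract interface plus one implementation that speaks only HTTP. The method names `load_graph` and `unload_graph` are assumptions and are not taken from the actual TripleStoreService trait. The HTTP calls follow the W3C SPARQL 1.1 Graph Store HTTP Protocol as a generic stand-in, not the AllegroGraph REST API, and the URLs are placeholders. The transport is injected so the example runs without a server.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlencode
from urllib.request import Request


class TripleStoreService(ABC):
    """Hypothetical Python mirror of the interface; method names are assumed."""

    @abstractmethod
    def load_graph(self, graph_iri: str, turtle: str) -> None: ...

    @abstractmethod
    def unload_graph(self, graph_iri: str) -> None: ...


class GraphStoreHttpService(TripleStoreService):
    """Talks to a server over plain HTTP, with no vendor client library."""

    def __init__(self, graph_store_url, send):
        self._url = graph_store_url
        self._send = send  # callable taking a urllib Request; injected so it can be faked

    def _request(self, method, graph_iri, body=None, headers=None):
        url = self._url + "?" + urlencode({"graph": graph_iri})
        return Request(url, data=body, headers=headers or {}, method=method)

    def load_graph(self, graph_iri, turtle):
        # PUT replaces the content of the named graph.
        self._send(self._request("PUT", graph_iri, turtle.encode("utf-8"),
                                 {"Content-Type": "text/turtle"}))

    def unload_graph(self, graph_iri):
        # DELETE removes the named graph.
        self._send(self._request("DELETE", graph_iri))


# A recording fake stands in for the network in this sketch.
sent = []
service = GraphStoreHttpService("https://example.org/rdf-graph-store", sent.append)
service.load_graph("https://example.org/ont/vocab1", "<urn:a> <urn:b> <urn:c> .")
service.unload_graph("https://example.org/ont/vocab1")
print([(r.get_method(), r.full_url) for r in sent])
```

Callers depend only on the abstract class, so supporting another store would mean adding one more subclass. In real use, `urllib.request.urlopen` could be passed as `send`.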

lewismc commented 5 years ago

@maykillmore have you had any further thoughts about the project?

ayush-anand-13 commented 5 years ago

@lewismc Sir, I'm ready to get started and would like further instructions on how to proceed. I've gathered a basic understanding of semantic technologies and am currently reading the AllegroGraph documentation. What else should I do?

lewismc commented 5 years ago

@maykillmore until the GSoC project gets started, please start working on these issues. If you need more guidance or want to get started with the GUI, then please see these issues.

kushagragpt99 commented 5 years ago

@lewismc Sir, I am Kushagra Gupta, a sophomore at Indian Institute of Technology Kanpur pursuing a B.S. in Mathematics and Scientific Computing. I am interested in the fields of Machine Learning and Data Analytics and have developed my skills accordingly. I am comfortable with C++, MATLAB, Go, Python and Python-based machine learning and scientific computing libraries like TensorFlow, NumPy, Matplotlib and Keras. I have worked on several ML and Data Science projects in the past, including building autoencoders, image segmentation using U-Nets, fast matrix factorization etc. A link to the reports of some of them can be found at https://drive.google.com/drive/folders/1CJiw3RuXt0yCgAJ1aJcy4NDU8SXFanfU?usp=sharing. I request you to guide me through the next steps. I would be highly obliged.

kunakl07 commented 5 years ago

Sir, is this project taken? I would like to work on this project.

lewismc commented 5 years ago

@kushagragpt99 thank you for your interest in the project

Sir, is this project taken?

No, not yet; no decisions have been made. If you are interested, you will be asked to provide a proposal. In the meantime, please feel free to start coding on some open issues at either of the following: https://github.com/mmisw/orr-ont/ and https://github.com/mmisw/orr-portal

meharbhatia commented 5 years ago

Sir, I am interested and would like to join the organization for this project. I'd love to know how to start contributing in a meaningful manner.

lewismc commented 5 years ago

@meharbhatia thanks for registering your interest. Have you also taken a look at https://github.com/ESIPFed/gsoc/issues/18 ?

lewismc commented 5 years ago

@meharbhatia ping

@kushagragpt99 @kunakl07 @maykillmore how are you guys getting on here? Have any of you started looking at the referenced code with the goal of successfully having a local deployment? Thanks

lewismc commented 5 years ago

@meharbhatia @kushagragpt99 @kunakl07 @maykillmore PING

kunakl07 commented 5 years ago

Hi @lewismc, I have been working with AllegroGraph in a Jupyter Notebook. Here, we take restaurants from the eastern part of San Francisco and find their longitude and latitude, creating n-dimensional capabilities, and when we click a part of the map, it shows all the restaurants that are in that particular region.

[Image: bandicam 2019-03-22 23-03-50-128]

Sorry, sir, for replying late. What would be the next steps that I should take?

lewismc commented 5 years ago

@kunakl07 that's pretty cool. What does the source data look like? How did you get it into AllegroGraph? Is it some RDF serialization?

What would be the next steps that I should take?

I would strongly encourage you to start looking at issues in the ORR Ont Backend System

esip-lab commented 5 years ago

Hi all - a friendly reminder that there is ONE WEEK LEFT to submit your proposals for this project! Best of luck and we're excited to see what is submitted!

lewismc commented 5 years ago

Hi @maykillmore, @kushagragpt99, @kunakl07 and @meharbhatia, no-one has shared a proposal here or with me directly yet. The final date for this is tomorrow. If you would like to be considered then you MUST submit a proposal by tomorrow; this is mandated by Google. Thank you

lewismc commented 5 years ago

Thank you all for your proposals and your interest in ESIP. Unfortunately this project was not selected this year.