General Starting Introduction To The Project

Mec-iS commented 7 years ago

Since the first students have expressed interest in the project, here are some more insights into where things stand at this very early stage.

The objective of this project is to create a demo Web API implementing the HYDRA draft, which is an RDF-based framework. The entities defined in the spec describe the structure and usage of a generic Web API, so that a HYDRA-enabled ("intelligent" or "smart") client can connect to the API's entrypoint and automatically find out where and how to get the data it needs.

In this scenario the layers involved are: A. a HYDRA server that can serve data and metadata to a client (this layer can be split into a traditional lower-level server relying on a graph database plus a "HYDRA middleware"); B. a client that can "understand" HYDRA metadata and connect to HYDRA-enabled services, and possibly "learn and remember" from past interactions with other services to build its own set of concepts for use in its domain.

The general objective is to let different HYDRA-enabled clients exchange data with each other. These clients can run on any kind of machine, but the focus of this automation is IoT (connected) devices (industrial, consumer or research). Usage scenario:

1. The IoT client X needs to know at which TIME the OPERATION Y was performed by the DEVICE Z. Its only starting knowledge is the API entrypoint's URI.
2. The client X fetches the metadata from the entrypoint and finds out that, to get TIME FOR Y ON Z, it needs to request the endpoint http://entrypoint/gettime with method GET, passing Y and Z as parameters.
3. The client makes the request to pull the data.
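The scenario above could be sketched roughly as follows. This is only an illustration under assumptions: the metadata dictionary is hypothetical (the `hydra:` terms exist in the Hydra vocabulary, while the `ex:` properties, the variable names and the placement of `hydra:search` on the entrypoint are made up), and the template expansion handles only the simple `{?a,b}` query form rather than full URI Templates. No network call is made; the function only works out which request the client would send.

```python
from urllib.parse import quote

# Hypothetical metadata, shaped like what a client might find at the entrypoint.
# The "ex:" properties and the variable names are invented for this sketch.
ENTRYPOINT_METADATA = {
    "@id": "http://entrypoint/",
    "hydra:search": {
        "@type": "hydra:IriTemplate",
        "hydra:template": "http://entrypoint/gettime{?operation,device}",
        "hydra:mapping": [
            {"hydra:variable": "operation", "hydra:property": "ex:operation",
             "hydra:required": True},
            {"hydra:variable": "device", "hydra:property": "ex:device",
             "hydra:required": True},
        ],
    },
}


def expand_query_template(template, values):
    """Expand a template of the form 'base{?a,b}' (the only form handled here)."""
    base, _, rest = template.partition("{?")
    names = rest.rstrip("}").split(",") if rest else []
    pairs = [
        f"{name}={quote(str(values[name]), safe='')}"
        for name in names
        if name in values
    ]
    return base + ("?" + "&".join(pairs) if pairs else "")


def build_request(metadata, values):
    """Return the (method, url) pair the client should use, based on the metadata."""
    search = metadata["hydra:search"]
    missing = [
        mapping["hydra:variable"]
        for mapping in search["hydra:mapping"]
        if mapping.get("hydra:required") and mapping["hydra:variable"] not in values
    ]
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    # Dereferencing an IRI built from a template is a plain GET.
    return "GET", expand_query_template(search["hydra:template"], values)


method, url = build_request(ENTRYPOINT_METADATA, {"operation": "Y", "device": "Z"})
print(method, url)
```

The point of the sketch is that the client hard-codes nothing except the entrypoint: the endpoint and its parameters are read from the metadata.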

Different concepts and classes are involved. An RDF domain has to be defined for the metadata exchange to work.

To make the demo interesting, I suggested we leverage space exploration and astronomy. I can suggest possible operations to request from the API. The resources are well connected to popular repositories, so we can reach a great amount of knowledge without storing too much. The graph can be based on these vocabularies: https://github.com/chronos-pramantha/RDFvocab/blob/master/ld%2Bjson/Spacecraft.json

A very good starting design for the server is https://github.com/antoniogarrote/levanzo

A Python implementation for the ~~server~~ client: https://github.com/pchampin/hydra-py

Please list questions and comments below.

Some resources:

a blog post about documenting a Web API and programming a client

PS. The stack to be used has yet to be decided. We should lean towards a full-Python implementation, except for the lower layer, where a graph database is required; we are free to experiment, so anything can be suggested in the beginning. My first impression is that I would avoid triple stores and try a graph database, or prototype something with Apache TinkerPop or Spark GraphX, to keep the flexibility to switch between solutions. At first I would prefer not to worry about stability and scalability, and just try to reach a first working tool that can then be iterated on.
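One possible way to keep that flexibility, sketched here under assumptions, is to have the upper layers depend on a small storage interface rather than on a specific database. `GraphStore` and `InMemoryStore` are hypothetical names invented for this example, not part of any existing library; a real backend (graph database, TinkerPop, etc.) would sit behind the same two methods.

```python
from typing import Iterable, Optional, Protocol, Set, Tuple

Triple = Tuple[str, str, str]


class GraphStore(Protocol):
    """Hypothetical minimal interface the server/middleware would code against."""

    def add(self, subject: str, predicate: str, obj: str) -> None: ...

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterable[Triple]: ...


class InMemoryStore:
    """Throwaway backend for prototyping; None acts as a wildcard in match()."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        pattern = (subject, predicate, obj)
        return [
            triple
            for triple in sorted(self._triples)
            if all(p is None or p == value for p, value in zip(pattern, triple))
        ]


store: GraphStore = InMemoryStore()
store.add("ex:DeviceZ", "ex:performed", "ex:OperationY")
store.add("ex:OperationY", "ex:time", "2017-02-01T10:00:00Z")
print(store.match(subject="ex:OperationY"))
```

Swapping the backend later would then mean writing another class with the same `add` and `match` methods, leaving the HYDRA layer untouched.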

UPDATE: For better insight, one of the proposed designs is described in #3

UPDATE: There are different possible designs that I am proposing. I would like to discuss with all of you, students and mentors, which one is the most interesting and viable:

1. (Astronomy-based) The one you have written about is an idea coming from the quite recent developments in planetary science about exoplanets. If you have a list of star systems with some observed planets, you can use your REST API to create the observed star and the observed planets orbiting that star. This implementation uses the Astronomy vocabulary.
2. (Engineering-based) The one described in #3 is instead the first idea I had; it is about designing simulated spacecraft spare parts (CubeSat COTS components) and serving these parts through a REST API. In this case users could create their own parts and put them together (with physical constraints applied) to build their own spacecraft. This implementation uses the Spacecraft and SubSystems vocabulary.
3. (NLP-based) An idea about a semantic engine that can translate human questions about the Solar System into queries for no.1 above and reply consistently; e.g. "How much bigger is Jupiter compared to Earth?" from the user, with the server/client able to reply "Earth has a mass of 5.97 × 10^24 kg. Jupiter has a mass of 1.8986 × 10^27 kg".
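As a small worked example of what a "consistent" reply to the Jupiter question could include, the two masses quoted above can be turned into a ratio. This assumes "bigger" is read as "more massive", which is only one possible interpretation; the function name is made up for the illustration.

```python
# Masses as quoted in the example reply above.
EARTH_MASS_KG = 5.97e24
JUPITER_MASS_KG = 1.8986e27


def mass_ratio(larger_kg: float, smaller_kg: float) -> float:
    """How many times more massive the first body is than the second."""
    return larger_kg / smaller_kg


ratio = mass_ratio(JUPITER_MASS_KG, EARTH_MASS_KG)
print(f"Jupiter is about {ratio:.0f} times as massive as Earth.")
```

With these figures the ratio comes out at roughly 318.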

UPDATE: Gitter chat available here

UPDATE: Check also this architectural proposal

chrizandr commented 7 years ago

If I may, I would recommend using rdflib. There is also Apache Jena, which is good, but it does not provide a Python API. Rdflib has a learning curve, but all in all it is much more powerful than any other tool I have used. As for the stack, would it be good to use Django? It is pretty robust and would help people focus on the actual aim of the project rather than on the intricacies of the stack itself. Please let me know if my suggestions were useful. :)

kkoci commented 7 years ago

There is a cool graph database engine called OrientDB, which also has a Python client: https://github.com/orientechnologies/orientdb

pchampin commented 7 years ago

A few comments about https://github.com/pchampin/hydra-py

- @Mec-iS it is a client implementation, not a server implementation
- @chrizandr it is based on rdflib

chrizandr commented 7 years ago

@Mec-iS suggested that I post this here for better feedback.

I have been going through the Hydra proposal for the last couple of days and have been trying to understand what Hydra and Hydrus are.

Here is all that I have understood:

Hydra is an XML namespace that uses RDF and provides tags to define Linked Data in a Graph Database. Hydra also allows us to define API documentation along with other types of data, which lets us expose server APIs so that clients can exchange data with servers without relying on hyperlinks alone. The main aim of Hydra is to have a semantic understanding of what the content of each hyperlink is, allowing clients to dereference suitable hyperlinks and ignore unnecessary ones. Hydra also allows HTTP operations directly between the client and server.

Hydrus is a Python-based web app that is used to demonstrate the capabilities of Hydra using a space exploration example. The general work flow of the app (from my understanding) would be:

Users give textual input in their natural language.
Hydrus processes this input and matches them to relevant classes and operations using NLP and Machine Learning.
Hydrus gets the API for the relevant operation from the server.
Asks the server to perform the operations on the relevant classes and directly give the output to the user.

This is similar to the example given on GitHub, where a user requests the distance between Mars and Earth.

Please let me know if what I have understood is correct or not.

If possible, I would also like to know what more I can do to work/start working on this project for GSoC.

pchampin commented 7 years ago

@chrizandr, you wrote

> Hydra is an XML namespace

That's not strictly correct, as Hydra has no (direct) relation to XML. Hydra is a vocabulary, i.e. a set of IRIs (in RDF/Linked Data, we use IRIs to identify anything of interest: classes, attributes and relations, instances, datatypes...).
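To illustrate what "a set of IRIs" means in practice, here is a minimal sketch that expands compact names such as `hydra:Class` into full IRIs. The namespace IRIs are the published ones for Hydra and RDF; the `expand` helper and the `PREFIXES` table are just names chosen for this example.

```python
# Namespace IRIs of the Hydra core vocabulary and of RDF itself.
PREFIXES = {
    "hydra": "http://www.w3.org/ns/hydra/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(compact_iri: str) -> str:
    """Turn 'prefix:local' into a full IRI; leave anything else unchanged."""
    prefix, sep, local = compact_iri.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return compact_iri


for term in ("hydra:Class", "hydra:Operation", "rdf:type"):
    print(term, "->", expand(term))
```

Nothing here depends on XML: the same IRIs can be written in JSON-LD, Turtle or any other RDF syntax.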

> allowing clients to dereference suitable hyperlinks

More generally, Hydra describes the available HTTP operations and what they mean/do. Dereferencing a hyperlink is just one particular HTTP operation (GET), although indeed it is the most common.
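A rough sketch of what "describing the available HTTP operations" can look like from the client side. The class description below is hypothetical (the `ex:` identifiers and titles are invented), but the `hydra:` property names used in it (`supportedOperation`, `method`, `title`, `expects`, `returns`) are terms of the Hydra vocabulary.

```python
# Hypothetical fragment of an API documentation, describing one class.
SPACECRAFT_CLASS = {
    "@id": "ex:Spacecraft",
    "@type": "hydra:Class",
    "hydra:supportedOperation": [
        {"@type": "hydra:Operation", "hydra:method": "GET",
         "hydra:title": "Retrieve a spacecraft",
         "hydra:returns": "ex:Spacecraft"},
        {"@type": "hydra:Operation", "hydra:method": "PUT",
         "hydra:title": "Replace a spacecraft",
         "hydra:expects": "ex:Spacecraft", "hydra:returns": "ex:Spacecraft"},
        {"@type": "hydra:Operation", "hydra:method": "DELETE",
         "hydra:title": "Delete a spacecraft"},
    ],
}


def describe_operations(class_description):
    """Map each advertised HTTP method to its human-readable title."""
    operations = class_description.get("hydra:supportedOperation", [])
    return {op["hydra:method"]: op.get("hydra:title", "") for op in operations}


def is_allowed(class_description, method):
    """True if the description advertises the given HTTP method."""
    return method.upper() in describe_operations(class_description)


print(describe_operations(SPACECRAFT_CLASS))
print(is_allowed(SPACECRAFT_CLASS, "post"))
```

GET is only one entry among the advertised operations; a client can check the others the same way before attempting them.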

But overall, I think you got it right :)

Mec-iS commented 7 years ago

@chrizandr

> The general work flow of the app (from my understanding) would be:
>
> Users give textual input in their natural language.
>
> Hydrus processes this input and matches them to relevant classes and operations using NLP and Machine Learning.
>
> Hydrus gets the API for the relevant operation from the server.
>
> Asks the server to perform the operations on the relevant classes and directly give the output to the user.

There are different possible designs that I am proposing. I would like to discuss with all of you, students and mentors, which one is the most interesting and viable:

1. (Astronomy-based) The one you have written about is an idea coming from the quite recent developments in planetary science about exoplanets. If you have a list of star systems with some observed planets, you can use your REST API to create the observed star and the observed planets orbiting that star. This implementation uses the Astronomy vocabulary.
2. (Engineering-based) The one described in #3 is instead the first idea I had; it is about designing simulated spacecraft spare parts (CubeSat COTS components) and serving these parts through a REST API. In this case users could create their own parts and put them together (with physical constraints applied) to build their own spacecraft. This implementation uses the Spacecraft and SubSystems vocabulary.

I would like to have your opinion about which one (or both?) would best express and test Hydra's features. I would be happy to see both working, but if I had to choose I would say no.2, because its domain is much more defined and limited, which is a good thing for a test.
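To give an idea of what "physical constraints applied" could mean for no.2, here is a toy sketch. Everything in it is an assumption made for illustration: the `Part` fields, the two checks (a mass budget and a non-negative power balance) and the numbers are invented, and real constraints would come from the Spacecraft and SubSystems vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Part:
    name: str
    mass_kg: float
    power_w: float  # positive = generated, negative = consumed


def check_assembly(parts: List[Part], max_mass_kg: float) -> List[str]:
    """Return a list of violated constraints; an empty list means the build is valid."""
    problems = []
    total_mass = sum(part.mass_kg for part in parts)
    if total_mass > max_mass_kg:
        problems.append(
            f"total mass {total_mass:.2f} kg exceeds budget {max_mass_kg:.2f} kg"
        )
    net_power = sum(part.power_w for part in parts)
    if net_power < 0:
        problems.append(f"power balance is negative ({net_power:.2f} W)")
    return problems


parts = [Part("solar panel", 0.2, 2.0), Part("radio", 0.1, -1.5)]
print(check_assembly(parts, max_mass_kg=1.0))
print(check_assembly(parts + [Part("payload", 1.0, -1.0)], max_mass_kg=1.0))
```

A server for this design could run such checks when a client tries to create or update a spacecraft, and reject the request when the list is not empty.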

Mec-iS commented 7 years ago

@chrizandr

> If possible, I would also like to know what more I can do to work/start working on this project for GSOC

Keep studying the spec and asking questions (here, or by writing to the public HYDRA list public-hydra@w3.org and introducing yourself). Start studying how Levanzo implemented a server in Clojure to serve a HYDRA API, and take as much inspiration as possible to develop a similar server in Python. Then you can fork the repository and start coding. Your PR will be reviewed and commented on, and finally merged.

HTTP-APIs / hydrus

General Starting Introduction To The Project #2