Closed by Mec-iS 6 years ago
If I may, I would recommend using rdflib. There is also Apache Jena, which is good, but it does not provide a Python API. rdflib has a learning curve, but all in all it is much more powerful than any other tool I have used. As for the stack, would it be good to use Django? It is pretty robust and would help people focus on the actual aim of the project rather than on the intricacies of the stack itself. Please let me know if my suggestions were useful. :)
There is a cool graph database engine called OrientDB: https://github.com/orientechnologies/orientdb They also have a Python client.
A few comments about https://github.com/pchampin/hydra-py
- @Mec-iS it is a client implementation, not a server implementation
- @chrizandr it is based on rdflib
@Mec-iS suggested that I post this here for better feedback.
I have been going through the Hydra Proposal for the last couple of days and have been trying to understand what both Hydrus and Hydra are.
Here is all that I have understood:
Hydra is an XML namespace that uses RDF and provides tags to define Linked Data in a Graph Database. Hydra also allows us to define API documentation along with other types of data, which would allow us to expose server APIs so that clients can exchange data with servers without having to rely only on hyperlinks. The main aim of Hydra is to have a semantic understanding of what the content of each hyperlink is, allowing clients to dereference suitable hyperlinks and ignore unnecessary ones. Hydra also allows HTTP operations directly between the client and server.
Hydrus is a Python-based web app that is used to demonstrate the capabilities of Hydra using a space exploration example. The general workflow of the app (from my understanding) would be:
This is similar to the example given on GitHub, where a user requests the distance between Mars and Earth.
Please let me know if what I have understood is correct or not.
If possible, I would also like to know what more I can do to work/start working on this project for GSoC.
@chrizandr, you wrote
Hydra is an XML namespace
That's not strictly correct, as Hydra has no (direct) relation to XML. Hydra is a vocabulary, i.e. a set of IRIs (in RDF/Linked Data, we use IRIs to identify anything of interest: classes, attributes and relations, instances, datatypes...).
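To make the "set of IRIs" point concrete, here is a minimal sketch in plain Python (no RDF library) that expands compact terms into full IRIs. The namespace IRI is the one published for the Hydra Core Vocabulary; the `expand` helper and the `PREFIXES` table are hypothetical names used only for illustration.

```python
# Namespace IRI of the Hydra Core Vocabulary.
HYDRA = "http://www.w3.org/ns/hydra/core#"

# Illustrative prefix table; `expand` is a hypothetical helper,
# not part of any library.
PREFIXES = {"hydra": HYDRA}


def expand(term):
    """Expand a compact term such as 'hydra:Class' into a full IRI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term  # already a full IRI, or an unknown prefix


for term in ("hydra:Class", "hydra:Operation", "hydra:supportedOperation"):
    print(term, "->", expand(term))
```

Each Hydra term is just a name under that namespace, which is why it can be written in JSON-LD, Turtle, or any other RDF serialization, not only XML.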
allowing clients to dereference suitable hyperlinks
More generally, Hydra describes the available HTTP operations and what they mean/do. Dereferencing a hyperlink is just one particular HTTP operation (GET), although indeed it is the most common.
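As a rough illustration of "describing the available HTTP operations", the sketch below holds a hand-written JSON-LD-style description as a Python dict and lists the operations it advertises. The `Planet` class, its IRI, and the `describe_operations` helper are made up for this example; only the `hydra:` terms come from the vocabulary.

```python
import json

# Hand-written, hypothetical class description in JSON-LD style.
# Only the hydra:* terms come from the Hydra vocabulary; the rest is invented.
PLANET_CLASS = json.loads("""
{
  "@context": {"hydra": "http://www.w3.org/ns/hydra/core#"},
  "@id": "http://example.org/vocab#Planet",
  "@type": "hydra:Class",
  "hydra:supportedOperation": [
    {"@type": "hydra:Operation", "hydra:method": "GET",
     "hydra:title": "Retrieve a planet"},
    {"@type": "hydra:Operation", "hydra:method": "PUT",
     "hydra:title": "Replace a planet"}
  ]
}
""")


def describe_operations(cls):
    """Return (HTTP method, title) pairs advertised by a class description."""
    operations = cls.get("hydra:supportedOperation", [])
    return [(op["hydra:method"], op["hydra:title"]) for op in operations]


for method, title in describe_operations(PLANET_CLASS):
    print(method, "-", title)
```

Here GET is only one of the advertised operations; a client reading this description also learns that PUT is available and what it is meant to do.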
But overall, I think you got it right :)
@chrizandr
The general workflow of the app (from my understanding) would be:
- Users give textual input in their natural language.
- Hydrus processes this input and matches it to relevant classes and operations using NLP and Machine Learning.
- Hydrus gets the API for the relevant operation from the server.
- Asks the server to perform the operations on the relevant classes and directly give the output to the user.
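The matching step in the list above could be prototyped in a very crude way before any real NLP or machine learning is involved. The sketch below swaps in plain keyword overlap as a stand-in; the `OPERATIONS` table, its keywords, and `match_operation` are all hypothetical, and no server is contacted.

```python
# Hypothetical operations a server might advertise, with hand-picked keywords.
OPERATIONS = [
    {"cls": "Planet", "method": "GET",
     "keywords": {"planet", "mars", "earth", "distance"}},
    {"cls": "Spacecraft", "method": "GET",
     "keywords": {"spacecraft", "probe", "launch"}},
]


def match_operation(text, operations=OPERATIONS):
    """Pick the operation whose keywords overlap most with the input words.

    Keyword overlap is only a placeholder for the NLP/ML step.
    Returns None when nothing overlaps at all.
    """
    words = set(text.lower().replace("?", " ").split())
    best = max(operations, key=lambda op: len(words & op["keywords"]))
    return best if words & best["keywords"] else None


match = match_operation("What is the distance between Mars and Earth?")
print(match["cls"], match["method"])
```

Whatever replaces the keyword table later, the output of this step stays the same: a class and an operation that the client can then request from the server.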
There are different possible designs that I am proposing. I would like to discuss with you all students and mentors which one is the most interesting and viable:
I would like to have your opinion about which one (or both?) would be best to fully express and test Hydra features. I would be happy to see both working, but if I need to choose I would say no. 2, because its domain is much more defined and limited, which is a good thing for a test.
@chrizandr
If possible, I would also like to know what more I can do to work/start working on this project for GSOC
Keep studying the spec and asking questions (here, or by writing to the public HYDRA list, public-hydra@w3.org, and introducing yourself). Start studying how Levanzo implemented a server in Clojure to serve a HYDRA API, and take as much inspiration as possible to develop a similar server in Python. Then you can fork the repository and start coding. Your PR will be reviewed, commented on, and finally merged.
As the first students have expressed interest in the project, I am writing down some more insights about where things stand at this very early stage.
The objective for this project is to create a demo Web API implementing the HYDRA draft, which is an RDF-based framework. The entities defined in the spec are meant to describe the structure and usage of a generic Web API, so that a HYDRA-enabled ("intelligent" or "smart") client can connect to the API's entrypoint and automatically find out where and how to find the data it needs.
In this scenario the layers involved are:
A. a HYDRA server that can serve data and metadata to a client (this layer can be split into a traditional lower-level server relying on a graph database plus a "HYDRA middleware");
B. a client that can "understand" HYDRA metadata and connect to HYDRA-enabled services, and possibly "learn and remember" from past interactions with other services, storing its own set of concepts to use in its domain.
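A toy sketch of the two layers, under heavy assumptions: the "server" is just an in-memory dict of JSON-LD-style documents, and the client follows links from the entrypoint and caches what it has seen as a stand-in for "learn and remember". All IRIs, the `FAKE_SERVER` table, `fetch`, and the `Client` class are invented for illustration; only `hydra:Collection` and `hydra:member` come from the vocabulary.

```python
# Layer A (server side), faked in memory: IRI -> JSON-LD-style document.
# All IRIs and documents here are invented for illustration.
FAKE_SERVER = {
    "http://example.org/api/": {
        "@id": "http://example.org/api/",
        "@type": "EntryPoint",
        "planets": "http://example.org/api/planets/",
    },
    "http://example.org/api/planets/": {
        "@id": "http://example.org/api/planets/",
        "@type": "hydra:Collection",
        "hydra:member": [{"@id": "http://example.org/api/planets/mars"}],
    },
}


def fetch(iri):
    """Stand-in for an HTTP GET against the server."""
    return FAKE_SERVER[iri]


class Client:
    """Layer B: follows links from the entrypoint and remembers what it saw."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.seen = {}  # IRI -> document, a crude form of "memory"

    def get(self, iri):
        if iri not in self.seen:
            self.seen[iri] = self.fetch(iri)
        return self.seen[iri]

    def discover(self, entrypoint):
        """Fetch the entrypoint, then every resource it links to."""
        entry = self.get(entrypoint)
        links = [value for key, value in entry.items()
                 if not key.startswith("@") and isinstance(value, str)]
        return [self.get(link) for link in links]


client = Client(fetch)
for doc in client.discover("http://example.org/api/"):
    print(doc["@id"], doc["@type"])
```

In a real setup `fetch` would be an HTTP request and the documents would come from the graph database behind the HYDRA middleware; the client code would not need to know the URL layout in advance.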
The objective is generally to let different HYDRA-enabled clients exchange data with each other. These clients can run on any kind of machine, but the focus for this automation is IoT (connected) devices (industrial, consumer, or research). Usage scenario:
Different concepts and classes are involved. An RDF domain has to be defined for the metadata exchange to work.
To make the demo interesting, I suggested we leverage space exploration and astronomy, so the graph can be based on these vocabularies: https://github.com/chronos-pramantha/RDFvocab/blob/master/ld%2Bjson/Spacecraft.json I can suggest possible operations to be requested from the API. Resources are well connected to popular repositories, so we can reach a great amount of knowledge without storing too much.
A very good starting design for the server is https://github.com/antoniogarrote/levanzo
A Python implementation for the client: https://github.com/pchampin/hydra-py
Please list questions and comments below.
Some resources:
PS. The stack to be used has yet to be decided. We should aim for a full-Python implementation, except for the lower layer where a graph database is required. We are free to experiment, so we can suggest anything in the beginning. At first impression I would avoid triple stores and try to use a graph database, or try to prototype something with Apache TinkerPop or Spark GraphX, to keep the flexibility to switch to different solutions. At first I would prefer not to worry about stability and scalability but just try to reach a first working tool, so that things can be iterated on.
UPDATE: For better insight, one of the proposed designs is described at #3
UPDATE: There are different possible designs that I am proposing. I would like to discuss with you all students and mentors which one is the most interesting and viable:
UPDATE: Gitter chat available here
UPDATE: Check also this architectural proposal