bio4j / exporter

GSoC 2014 project
GNU Affero General Public License v3.0

Gremlin user defined steps #2

Closed: pablopareja closed this issue 10 years ago

pablopareja commented 10 years ago

Using Gremlin user-defined steps would be strongly advisable in order to encapsulate sub-queries that could be reused across different queries. Beyond that, we should draw up a list of such possible pre-defined steps, so that we end up with a sort of preliminary bio4j-gremlin-method-library.

andr3nun3s commented 10 years ago

Sounds good because it allows us to work on a higher abstraction level.

I would appreciate a list of pre-defined steps for me to follow and implement.

pablopareja commented 10 years ago

OK, I will put that list together for you. In the meantime, please have a look and start running some tests with Gremlin and all these questions.

pablopareja commented 10 years ago

Hi André, did you start making the first tests with Gremlin?

andr3nun3s commented 10 years ago

Hi, I'm sorry but I've been busy with school work. I have my degree's final project to deliver on the 14th so I'll be a bit busy until then.

I've installed Gremlin and tried some examples. Is there anything specific I should test?

Thanks

pablopareja commented 10 years ago

I guess you have already had a look at it, but just in case you haven't, here is the domain model for Bio4j:

https://raw.githubusercontent.com/bio4j/bio4j/master/docs/resources/images/Bio4jDomainModelWithCardinality.jpg

The cardinality of the relationships is included.

A preliminary list of user defined steps for proteins could be:

andr3nun3s commented 10 years ago

Could you give me a practical example?

Thanks for your patience.

andr3nun3s commented 10 years ago

Please note that I have little biology background and don't know how most of bio4j works. Documentation is very scarce and most links in the examples section are broken, so I don't know how to start tackling this project.

All links broken: https://github.com/bio4j/bio4j/blob/master/docs/auxiliary-relationships.md

This is helpful: https://github.com/bio4j/bio4j/blob/master/docs/node-indexing.md

Let's take Protein Organism, for instance: we can query for a specific scientific name index or NCBI taxonomy id index.

How should the output look? Do we print all the information about each protein that matches the query, in the requested format (GEXF/GraphML/GraphSON)?

pablopareja commented 10 years ago

OK so let me try to answer all your questions:

  1. It's true that many links in the examples section don't work. However, the only one that's directly related to what you have to do does work: https://github.com/bio4j/bio4j/blob/master/docs/bio4j-gremlin-cheat-sheet.md

     Keep in mind that, in principle, you won't be interacting with the database through the Java API but rather with Gremlin queries.

  2. Regarding the user-defined steps for proteins that I mentioned in the previous message: they would all have a protein accession (or a list of protein accessions) as their starting point.
  3. When you ask how the output should look, what exactly do you mean? If you are referring to the user-defined steps, I think you may not have understood what they're about. There's no XML output file to be exported at this point, since steps are rather a way of encapsulating commonly used Gremlin sub-queries into just one keyword.

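
To make the "one keyword for a sub-query" idea concrete, here is a minimal sketch in plain Java. It does not use the Gremlin or Bio4j APIs: the toy graph, the edge labels, the vertex ids and the step name `proteinTaxon` are all hypothetical stand-ins chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class NamedSteps {
    // Toy adjacency data standing in for the graph: "source|label" -> target ids.
    static final Map<String, List<String>> EDGES = new HashMap<>();
    // Registry of named steps: each name hides a reusable sub-query.
    static final Map<String, Function<List<String>, List<String>>> STEPS = new HashMap<>();

    static void edge(String from, String label, String to) {
        EDGES.computeIfAbsent(from + "|" + label, k -> new ArrayList<>()).add(to);
    }

    // Follows outgoing edges with the given label from every input id.
    static List<String> out(List<String> ids, String label) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            result.addAll(EDGES.getOrDefault(id + "|" + label, new ArrayList<>()));
        }
        return result;
    }

    static {
        // Hypothetical data and labels, not taken from the Bio4j domain model.
        edge("protein-1", "PROTEIN_TO_ORGANISM", "organism-1");
        edge("organism-1", "ORGANISM_TO_TAXON", "taxon-1");
        // One keyword for a two-hop sub-query that starts from protein ids.
        STEPS.put("proteinTaxon",
                ids -> out(out(ids, "PROTEIN_TO_ORGANISM"), "ORGANISM_TO_TAXON"));
    }

    // Looks up a step by name and applies it to the starting ids.
    static List<String> run(String stepName, List<String> start) {
        return STEPS.get(stepName).apply(start);
    }

    public static void main(String[] args) {
        System.out.println(run("proteinTaxon", Arrays.asList("protein-1")));
    }
}
```

The point of the sketch is only that callers write `proteinTaxon` instead of repeating the two hops; nothing is exported or printed by the step itself.
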
andr3nun3s commented 10 years ago

OK, thanks. Could you then give me an example implementation of one of the user-defined steps you've listed? I can pick it up from there and implement the remaining ones.

laughedelic commented 10 years ago

@andre-nunes what's up here?

andr3nun3s commented 10 years ago

I don't know how to define Gremlin steps in Java, care to demonstrate?

laughedelic commented 10 years ago

I've skimmed through this thread and I see that @pablopareja has already answered some of your questions. Don't hesitate to ask more if you don't understand the answers or anything else.

I don't think I can explain how to create a Gremlin step better than the guide in the Gremlin docs does, but I'll be glad to answer your questions if something there is unclear.

andr3nun3s commented 10 years ago

Well, the whole guide is geared towards Groovy, so I don't know how to do it in Java.

Here's a user-defined step:

Gremlin.defineStep('co',[Vertex,Pipe], 
 {String label -> _().as('x').out(label).in(label).except('x')}) 

In Java all I could find is this method:

Gremlin.defineStep(String arg0, List<Class> arg1, Closure arg2)

How do I represent [Vertex, Pipe] as a List<Class>?

laughedelic commented 10 years ago

Look, there is something relevant in the Gremlin docs: Using Gremlin through Java (+ this). There may be more, so I recommend searching the Gremlin docs carefully (and just googling particular Gremlin/Java things).

In general, Groovy mostly just adds some syntactic sugar on top of Java, so google for it and try to translate Groovy constructs into Java (and also try using Groovy in the Gremlin REPL, why not? :wink:)

Regarding your last question: as the type says, it's just a list of classes, so I guess it will be something like Arrays.asList(Vertex.class, Pipe.class) in Java.
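
As a hedged illustration of building a `List<Class>` in Java, the sketch below uses `java.util.Arrays.asList`. The nested `Vertex` and `Pipe` interfaces are empty stand-ins declared only so the example compiles without the Blueprints and Pipes libraries; in the real project they would be the library types.

```java
import java.util.Arrays;
import java.util.List;

class StepSignature {
    // Stand-ins (assumption) for the Blueprints Vertex and Pipes Pipe types.
    interface Vertex {}
    interface Pipe {}

    // Builds the raw List<Class> shape that the defineStep signature quoted above expects.
    @SuppressWarnings("rawtypes")
    static List<Class> stepClasses() {
        // The explicit <Class> type argument stops the compiler from inferring
        // a wildcard element type that is not assignable to List<Class>.
        return Arrays.<Class>asList(Vertex.class, Pipe.class);
    }

    public static void main(String[] args) {
        System.out.println(stepClasses());
    }
}
```

This only answers the `[Vertex, Pipe]` part of the question; what to pass as the `Closure` argument from Java is a separate problem.
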

andr3nun3s commented 10 years ago

My first try at this can be found here: https://github.com/bio4j/exporter/commit/919c9de070dd4bcb9b2d6b4c698d832c1a1656ca

It's still a work in progress and not tested yet. Could I get a more detailed explanation of what each user-defined step should do? At the moment they just iterate over the vertices with a given relationship.

More info on the daily issue: https://github.com/bio4j/exporter/issues/18

Thanks

andr3nun3s commented 10 years ago

Is this or this a possibility? At least to me, it seems easier to define the steps/queries in Groovy and then use them in Java through these methods. To be honest, this concept of Pipes has been hard to grasp; maybe I need more practice.

pablopareja commented 10 years ago

Hi André,

As stated in the project proposal and the rest of the documentation, the language to be used is Java (not Groovy or anything similar). Anyway, I don't see why so much time is needed to define a Gremlin step when the documentation explains pretty clearly how it should be done: https://github.com/tinkerpop/gremlin/wiki/User-Defined-Steps

In fact, that link is the second result when googling "defining gremlin user steps in java".

andr3nun3s commented 10 years ago

That's all in Groovy, no mention of Java, that's my problem.

I asked on the gremlin-users mailing list, and it's impossible to have named user-defined steps in Java: https://groups.google.com/forum/#!topic/gremlin-users/MaDb0Y1RqZo

I'll have to look into another approach, possibly using DSLs.

andr3nun3s commented 10 years ago

Is the usage of Tinkerpop3 out of the question? Looks easier to use in Java.

laughedelic commented 10 years ago

It's a bit strange. They were going to release it in August. Have they done it already??

andr3nun3s commented 10 years ago

Seems like it. In the meantime, I've finally figured out how to use GremlinPipeline in Java, so I've made some progress.

My question now is: what will be the purpose of these steps in the context of the exporter? Will these be the queries that users can call through the CLI? If so, I guess they don't care how this works in the background. At the moment these "steps" return a list with the iterated vertices, so please let me know explicitly what should be happening in the background.

andr3nun3s commented 10 years ago

I guess the user-defined steps should return a GremlinPipeline instance so that we can continue appending pipes... :smile:

The thing is that they won't be callable from the Gremlin REPL unless we use Groovy, which seems to be the point, or am I wrong?
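
A rough sketch of the "return the pipeline so that we can keep appending pipes" idea, in plain Java rather than the real GremlinPipeline API. `ToyPipeline` and its tiny in-memory edge list are hypothetical. Its `co` method is modeled on the Groovy `co` step quoted earlier in the thread, with one simplification: it excludes every starting vertex rather than tracking a named step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ToyPipeline {
    // Labeled directed edges of a toy graph: {source, label, target}.
    private final List<String[]> edges;
    // Vertices the pipeline is currently positioned on.
    private List<String> current;

    ToyPipeline(List<String[]> edges, List<String> start) {
        this.edges = edges;
        this.current = new ArrayList<>(start);
    }

    // Follows outgoing edges with the given label.
    ToyPipeline out(String label) {
        List<String> next = new ArrayList<>();
        for (String v : current)
            for (String[] e : edges)
                if (e[0].equals(v) && e[1].equals(label)) next.add(e[2]);
        current = next;
        return this;
    }

    // Follows incoming edges with the given label.
    ToyPipeline in(String label) {
        List<String> next = new ArrayList<>();
        for (String v : current)
            for (String[] e : edges)
                if (e[2].equals(v) && e[1].equals(label)) next.add(e[0]);
        current = next;
        return this;
    }

    // "User-defined" step: vertices sharing a target over `label`,
    // minus the vertices the step started from.
    ToyPipeline co(String label) {
        Set<String> start = new LinkedHashSet<>(current);
        out(label).in(label);
        current.removeIf(start::contains);
        return this; // returning the pipeline keeps the chain open
    }

    List<String> toList() {
        return new ArrayList<>(current);
    }

    public static void main(String[] args) {
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"a", "likes", "c"});
        edges.add(new String[] {"b", "likes", "c"});
        System.out.println(new ToyPipeline(edges, Arrays.asList("a")).co("likes").toList());
    }
}
```

Because `co` returns the pipeline itself, a caller can keep chaining, for example `.co("likes").out("likes")`. As noted above, a step defined this way is a Java method, not a name registered with the Gremlin REPL.
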