Bio4j test database including Gene Ontology module

pablopareja commented 10 years ago

Hi,

I just uploaded a zip file called bio4j.zip that includes a database instance where you can already perform any tests you need at this stage of the process. Just keep in mind that it only includes the Gene Ontology module.

Here's the commit: 37c8d7a3bbf42c521d6220056ad8ebcb402160f4

laughedelic commented 10 years ago

Oh, cool! :+1:

andr3nun3s commented 10 years ago

Thanks, but how are these files used?

laughedelic commented 10 years ago

@pablopareja, an important thing to mention was that it's a TitanDB :wink: (I spent some time struggling to do something with it in Neo4j...). And where did you get it from? Is it an old version or the one based on the typed-graphs?

@andre-nunes, first steps:

1. Download bio4j.zip and unpack it somewhere.
2. Download Titan with the Gremlin REPL.
3. Run the REPL and do:

```
gremlin> g = TitanFactory.open("<your path to the unpacked bio4j>")
```

Now you can try something from the Gremlin API, like:
```
gremlin> g.v(2340).map    // property map of the vertex with id 2340
gremlin> g.v(2340).outE   // its outgoing edges
```

But there is something strange, @pablopareja: the properties have strange names, like `com.bio4j.titan.model.go.nodes.TitanGoTerm$TitanGoTermType@7722c3c3.name`.

If it's from the new version, it's relevant to mention @eparejatobes here :wave:

laughedelic commented 10 years ago

P.S. @pablopareja, could you upload it to S3 and just post a link here? It's not good to store big binaries in the repository...

andr3nun3s commented 10 years ago

Let me see if I got this right. I've been looking into Bio4j Explorer to get a better idea of all the nodes and relationships.

Are the vertices of this graph of type ProteinNode?

So if I want outgoing edges with a negatively-regulates relationship, how do I express that?

`g.V.outE('negatively-regulates')` doesn't work.

Or do I have to use those weird names @laughedelic mentioned?

Thanks!

laughedelic commented 10 years ago

Why ProteinNode? That's from UniProtKB, isn't it? In GO it should be GoTerm nodes and a couple of others. And the weird names are not normal; they will be fixed.

pablopareja commented 10 years ago

Hey,

Yeah, I forgot to mention that it was Titan DB. Anyway, just a quick look at the files in the folder makes it clear that it's Titan, not Neo4j :wink:

Those so-called weird names are the ones agreed on so far for the typed-graphs version.

This is indeed the first prototype of the brand new shiny version of Bio4j + typedgraphs! :smiley:

As for uploading it to S3: if you tell me which bucket to put it in, I can do it, no problem. Let's see what @eparejatobes thinks about this...

eparejatobes commented 10 years ago

This is as non-official as it gets, so put it anywhere; I don't mind.

pablopareja commented 10 years ago

OK, then why not simply leave it here? It only weighs ~17 MB after all...

laughedelic commented 10 years ago

Look, guys, I think there is some misunderstanding. I don't care about the form of the names, but I do care whether they are usable or not. If you look at `com.bio4j.titan.model.go.nodes.TitanGoTerm$TitanGoTermType@7722c3c3.name`, you'll notice an ugly part, `@7722c3c3`, which I see as a problem (because I think it's just some random number generated by the JVM). Please correct me if I'm wrong, but so far I don't see how to use this.

eparejatobes commented 10 years ago

Fine for me about leaving it here

eparejatobes commented 10 years ago

Re names: that looks like a wrong `toString` somewhere
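For context on the `toString` diagnosis: in Java, a class that doesn't override `toString()` inherits `Object.toString()`, which returns the binary class name (nested classes joined with `$`), then `@`, then the object's identity hash code in hex. That is exactly the shape of `...TitanGoTerm$TitanGoTermType@7722c3c3.name`, and the hex part is not guaranteed to be the same from one JVM run to the next, so it can't serve as a stable key. The sketch below uses hypothetical stand-in classes (`GoTerm`, `GoTermType`), not Bio4j's actual code, and contrasts that with `Class.getCanonicalName()`, which gives a dotted name with no hash code. It's one possible way to build stable names, not necessarily what Bio4j does.

```java
// Hypothetical stand-ins for a node class and its nested type class (not Bio4j's real code).
public class ToStringNames {

    static class GoTerm {
        // Nested "type" class that does NOT override toString().
        static class GoTermType { }
    }

    // Key built from the inherited Object.toString():
    // "<binary class name>@<hex identity hash code>.name", which may change between runs.
    static String unstableKey(Object type) {
        return type.toString() + ".name";
    }

    // Key built from the canonical class name: nested classes are joined with '.'
    // and there is no hash code, so the key is the same on every run.
    static String stableKey(Object type) {
        return type.getClass().getCanonicalName() + ".name";
    }

    public static void main(String[] args) {
        Object type = new GoTerm.GoTermType();
        System.out.println(unstableKey(type)); // e.g. ToStringNames$GoTerm$GoTermType@<hex>.name
        System.out.println(stableKey(type));   // ToStringNames.GoTerm.GoTermType.name
    }
}
```

Compiled in the default package, the first line carries a run-dependent hex suffix, while the second is always `ToStringNames.GoTerm.GoTermType.name`.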

laughedelic commented 10 years ago

OK, I've uploaded that file to https://s3-eu-west-1.amazonaws.com/bio4j.snapshots/bio4j.zip and removed it from here.

eparejatobes commented 10 years ago

Fine for me

laughedelic commented 10 years ago

BTW, @andre-nunes, you don't need this to write Gremlin steps in #2.

eparejatobes commented 10 years ago

Re names: that looks like a wrong `toString` somewhere

laughedelic commented 10 years ago

Re your message looks like some secret signal to someone... :smile:

eparejatobes commented 10 years ago

To myself mostly :)

andr3nun3s commented 10 years ago

@laughedelic, I know I don't need this, but so far it's the only way I have to test the Gremlin steps I create. I like trial and error...

laughedelic commented 10 years ago

@andre-nunes, it's not the only way. Just look at the Gremlin tutorial: do `g = TinkerGraphFactory.createTinkerGraph()` and you get a predefined graph for your trial and error. Practice writing steps on that graph, and then creating something similar for Bio4j won't be a problem for you.

laughedelic commented 10 years ago

Hey @andre-nunes! Try the new snapshot of Bio4j; @pablopareja has added it here again.

```
gremlin> g = TitanFactory.open("bio4j_tests_db/bio4j")
gremlin> g.v(2340)['com.bio4j.titan.model.go.nodes.TitanGoTerm.TitanGoTermType.name']
==>inositol heptakisphosphate 4-kinase activity
gremlin> g.v(2340).out('com.bio4j.titan.model.go.relationships.TitanSubOntology.TitanSubOntologyType').map
==>{com.bio4j.titan.model.go.nodes.TitanSubOntologies.TitanSubOntologiesType.name=molecular_function}
```

It works fine :ok_hand: Don't forget to enclose those long names in quotes.

andr3nun3s commented 10 years ago

Hey,

This is a Titan DB. What types of DBs is the exporter supposed to support? I've assumed that the user provides the address of the data they want to query and that the exporter loads the data from that address.

My question is: how can we find out, from the address alone, what type of data the user is trying to load? The methods for loading the graph depend on the data (Titan, Rexster, CSV, GraphSON, etc.). Should we ask the user what type of data they want to load?

laughedelic commented 10 years ago

I don't think all this is important now. You can assume that the exporter takes a fixed TitanDB instance of Bio4j and works with it. Before thinking about different sources to take data from, you should have at least one of them working.

andr3nun3s commented 10 years ago

Ok, I'll focus on getting it to work with TitanDB first.

andr3nun3s commented 10 years ago

Does anyone know how to load a TitanGraph in Java? In the Gremlin REPL I do:

```
gremlin> g = TitanFactory.open("<your path to the unpacked bio4j>")
```

In Java that doesn't work: it throws an exception stating that I need to specify a readable configuration file. Documentation on this.

Do I have to use yet another library, or is there a better way?

laughedelic commented 10 years ago

No, you can do it without a configuration. It's written in the documentation in a lot of places, and in the end it's in the API docs. Try harder: search the documentation more thoroughly and google what you want to do.
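If the same `TitanFactory.open(...)` call fails from Java, one thing worth ruling out (an assumption; the thread doesn't say what actually caused the exception) is the path itself. `java.io.File` resolves a relative path such as `bio4j_tests_db/bio4j` against the JVM's working directory (the `user.dir` system property), which isn't necessarily the directory the Gremlin REPL was started from. The hypothetical helper `resolveDbDir` below is a stdlib-only pre-check; it doesn't call Titan, and the `TitanFactory.open(...)` step is only indicated in a comment.

```java
import java.io.File;

public class DbPathCheck {

    // Hypothetical helper: resolves the path the way java.io.File does
    // (relative paths are resolved against the JVM's "user.dir") and fails
    // early with a clear message if the result isn't a readable directory.
    static File resolveDbDir(String path) {
        File dir = new File(path).getAbsoluteFile();
        if (!dir.isDirectory() || !dir.canRead()) {
            throw new IllegalArgumentException("Not a readable directory: " + dir
                    + " (JVM working directory: " + System.getProperty("user.dir") + ")");
        }
        return dir;
    }

    public static void main(String[] args) {
        // Defaults to the relative path used in the REPL example in this thread.
        String path = args.length > 0 ? args[0] : "bio4j_tests_db/bio4j";
        File dir = resolveDbDir(path);
        System.out.println("Database directory: " + dir.getPath());
        // Assumption: with Titan on the classpath, the next step would be
        // TitanFactory.open(dir.getPath()); it is not called here so the sketch stays stdlib-only.
    }
}
```

Passing the absolute path printed here, rather than a relative one, removes any dependence on where the JVM happens to be started.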

bio4j / exporter

Bio4j test database including Gene Ontology module #15