bio4j / bio4j-titan

Titan-specific bio4j implementation
https://github.com/bio4j/bio4j

improving Titan db opening times #63

Closed: eparejatobes closed this 9 years ago

eparejatobes commented 9 years ago

@pablopareja let's try this:

graph.allow-stale-config=false
storage.read-only=true
storage.berkeleydb.cache-percentage=20
storage.transactions=false

eparejatobes commented 9 years ago

adding that to the existing conf
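Here is a minimal sketch of "adding that to the existing conf", using only `java.util.Properties`. The existing conf is never shown in this thread, so the stand-in file contents (including the hypothetical `storage.directory` path) are assumptions, and so are the class and method names (`TitanConfOverrides`, `proposedSettings`, `addMissing`). Keys the conf already defines are left alone. That matches the follow-up below, where only two of the four settings turned out to need adding.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: layers the settings proposed in this issue onto an
// existing Titan properties file without overwriting keys it already defines.
public class TitanConfOverrides {

    // The four settings proposed in the issue, in the order they were listed.
    static Map<String, String> proposedSettings() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("graph.allow-stale-config", "false");
        m.put("storage.read-only", "true");
        m.put("storage.berkeleydb.cache-percentage", "20");
        m.put("storage.transactions", "false");
        return m;
    }

    // Adds each proposed key the conf does not define yet; returns the added keys.
    static List<String> addMissing(Properties conf, Map<String, String> settings) {
        List<String> added = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (!conf.containsKey(e.getKey())) {
                conf.setProperty(e.getKey(), e.getValue());
                added.add(e.getKey());
            }
        }
        return added;
    }

    public static void main(String[] args) throws IOException {
        // Assumed stand-in for the existing conf (real contents not shown in the issue).
        String existing = String.join("\n",
                "storage.backend=berkeleyje",
                "storage.directory=/path/to/bio4j",  // hypothetical path
                "storage.read-only=true",
                "storage.transactions=false");
        Properties conf = new Properties();
        conf.load(new StringReader(existing));

        List<String> added = addMissing(conf, proposedSettings());
        // prints "added: [graph.allow-stale-config, storage.berkeleydb.cache-percentage]"
        System.out.println("added: " + added);

        // Write the merged configuration back out (key order is not guaranteed).
        StringWriter out = new StringWriter();
        conf.store(out, "merged Titan configuration");
        System.out.print(out);
    }
}
```

Assume the stand-in conf already defines `storage.read-only` and `storage.transactions`. Under that assumption, only `graph.allow-stale-config` and `storage.berkeleydb.cache-percentage` get added. The merged file is what you would then point Titan at, for example by passing its path to `TitanFactory.open`.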

pablopareja commented 9 years ago

OK, I'm going to test it by adding

graph.allow-stale-config=false
storage.berkeleydb.cache-percentage=20

since the other two were already included

pablopareja commented 9 years ago

First of all, it's not possible to open the database with the configuration parameter:

storage.read-only=true

I guess this must be unexpected behavior, so I posted a message to the users group:

https://groups.google.com/forum/#!topic/aureliusgraphs/GViTo3eN0MI

The message I posted about the exception raised when opening the database is here:

https://groups.google.com/forum/#!topic/aureliusgraphs/9PX0fE68v_Q

I'm running some tests with the smaller version of Bio4j, which includes everything except UniRef + UniProtUniRef + GIIndex and weighs around 500 GB. Even though it takes a few seconds to open, that seems like a reasonable time.

Let's see if someone from the Titan community user group can help us figure out a way to fix this...

eparejatobes commented 9 years ago

About the read-only thing: they're probably configuring Berkeley DB to acquire write locks for reads.

pablopareja commented 9 years ago

Still no answer on this... just one guy saying:

"I've never tried to open a graph of that size with berkleydb - I can't say if this is expected behavior or not. anyone else have thoughts on this one?"

:sweat:

so what do we do?

I suggest we keep these two versions for now (the ~500 GB one and the ~1.2 TB one) and terminate the importing instance plus all its associated resources. Yesterday I quickly explained the situation to @rtobes and @epareja, and it seems it shouldn't be a problem for the version that includes UniRef to be slower to open, as long as it works, since those kinds of queries are not performed on an everyday basis.

WDYT?

eparejatobes commented 9 years ago

agreed.

As for the future, though, it's fairly obvious that we need to push for alternative storage systems.

pablopareja commented 9 years ago

ok cool :+1: I'm about to finish uploading the logs, stats, and so on. I will let you know when everything is shut down :wink:

pablopareja commented 9 years ago

OK, so it's finally done! There's no importing machine anymore :open_mouth: Everything's been deleted except the two tar files holding the two versions mentioned above. I also uploaded two extra tar.gz files, one with the stats files and another with the logs.

I hereby officially declare that cake time has come! :cake: :tada: :balloon: :smile:

rtobes commented 9 years ago

Good!!!

I want cake and to use Bio4j!