Closed pablopareja closed 9 years ago
mmm I don't know. Looks like either the property or the vertex type (or something derived from them) is null. Again it'd be nice to have a test for only type creation
Ok I will write the test first thing tomorrow
I already uploaded the new test here: 60c154789591e0783c95f77493c86ccca38c1ccf Let's see if we can figure out what's going on!
I just added the following lines to the test immediately before the line where the error is thrown:
System.out.println(UniRef100Cluster() == null);
System.out.println(UniRef100Cluster().id == null);
System.out.println(mgmt == null);
System.out.println(this == null);
and they are all printing false so it's not a problem of any of the parameters...
@eparejatobes Do you have any idea why we may be getting a null in angulillos-titan here: https://github.com/bio4j/angulillos-titan/blob/master/src/main/java/com/bio4j/angulillos/titan/TitanTypedVertexIndex.java#L125 :question:
I added a couple more debug lines, so now I have:
System.out.println(UniRef100Cluster() == null);
System.out.println(UniRef100Cluster().id == null);
System.out.println(mgmt == null);
System.out.println(this == null);
System.out.println(UniRef100Cluster().id.name());
System.out.println(UniRef100Cluster().name());
And this is what I get:
false
false
false
false
com.bio4j.model.uniref.UniRefGraph.UniRef100ClusterType.id
com.bio4j.model.uniref.UniRefGraph.UniRef100ClusterType
So I really don't know where this null value comes from... :confused:
What is null is UniRef100Cluster().id.elementType(). I'm looking at it.
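A minimal sketch of how checks like the ones above could be pushed one level deeper, so that a null derived value such as elementType() is reported by name instead of surfacing later as a bare NullPointerException. The classes `Property` and `VertexType` are hypothetical stand-ins, not the angulillos types; only `java.util.Objects.requireNonNull` is a real API here.

```java
import java.util.Objects;

public class NullProbe {
    // Hypothetical stand-in for a vertex type (not the angulillos class).
    static class VertexType {
        final String name;
        VertexType(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a property that points back at its element type.
    static class Property {
        final String name;
        final VertexType elementType; // may be null if the property was wired up incorrectly
        Property(String name, VertexType elementType) {
            this.name = name;
            this.elementType = elementType;
        }
        VertexType elementType() { return elementType; }
    }

    // Returns the name of the first null link in the chain, or "none".
    static String firstNull(Property p) {
        if (p == null) return "property";
        if (p.name == null) return "property.name";
        if (p.elementType() == null) return "property.elementType()";
        return "none";
    }

    // Fails fast with a descriptive message instead of a bare NPE further down.
    static VertexType requireElementType(Property p) {
        Objects.requireNonNull(p, "property is null");
        return Objects.requireNonNull(p.elementType(),
            "elementType() is null for property " + p.name);
    }

    public static void main(String[] args) {
        Property broken = new Property("id", null);
        System.out.println(firstNull(broken));
    }
}
```

Checking the values derived from an object, and not only the object itself, is what separates "the property exists" from "the property is fully initialized".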
Cool, can I help you out with something?
@pablopareja see bio4j/bio4j#82
let's have a short meeting about this as soon as you find some time for it!
After all the work done in the previous weeks we got rid of that issue but now we have a new one. Here's the last exception I got:
SEVERE: Could not execute operation due to backend exception
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.diskstorage.BackendTransaction.edgeStoreQuery(BackendTransaction.java:253)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.edgeQuery(StandardTitanGraph.java:344)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1054)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1051)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.vertices.CacheVertex.loadRelations(CacheVertex.java:47)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1051)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1005)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:66)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:63)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor.runWithMetrics(MetricsQueryExecutor.java:81)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor.execute(MetricsQueryExecutor.java:63)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:56)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.getProperty(AbstractVertex.java:136)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.getProperty(AbstractVertex.java:152)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.getPropertyV(TitanUntypedGraph.java:40)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.getPropertyV(TitanUntypedGraph.java:14)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.TypedGraph.getProperty(TypedGraph.java:79)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.TypedVertex.get(TypedVertex.java:70)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot.vertices.Protein.accession(Protein.java:42)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRefFile(ImportUniProtUniRef.java:260)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:78)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: Could not commit transaction due to exception during persistence
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1309)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:60)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.commit(TitanUntypedGraph.java:21)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.commit(TitanUntypedGraph.java:21)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:92)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
Jan 30, 2015 2:32:36 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
Jan 30, 2015 2:32:36 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
Jan 30, 2015 2:32:36 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)
It looks similar to what we were getting before. By the way, the error was thrown after updating ~ 10.000.000 proteins, which in turn were imported using chunks of TrEMBL with 500.000 entries each. I'm going to try tweaking the parameters of the import process, but I'm not sure that will have any effect at all... :confused:
@eparejatobes any ideas?
I managed to retrieve some extra information about this, even though it was not written to the log files. Here it goes:
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 5.0.73) JAVA_ERROR: Java Error occurred, recovery may not be possible.
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1507)
at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:134)
at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:246)
at com.sleepycat.je.Environment.<init>(Environment.java:227)
at com.sleepycat.je.Environment.<init>(Environment.java:170)
at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.initialize(BerkeleyJEStoreManager.java:104)
... 18 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.sleepycat.je.tree.IN.getKey(IN.java:775)
at com.sleepycat.je.tree.IN.findEntry(IN.java:2253)
at com.sleepycat.je.tree.Tree.searchSubTreeInternal(Tree.java:1486)
at com.sleepycat.je.tree.Tree.searchSubTree(Tree.java:1381)
at com.sleepycat.je.tree.Tree.search(Tree.java:1240)
at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2101)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2666)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
at com.sleepycat.je.Cursor.search(Cursor.java:2397)
at com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1727)
at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEKeyValueStore.getSlice(BerkeleyJEKeyValueStore.java:123)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreAdapter.getSlice(OrderedKeyValueStoreAdapter.java:56)
at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore$1.call(MetricInstrumentedStore.java:92)
at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore$1.call(MetricInstrumentedStore.java:90)
at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore.runWithMetrics(MetricInstrumentedStore.java:214)
at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore.getSlice(MetricInstrumentedStore.java:89)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
at com.thinkaurelius.titan.diskstorage.BackendTransaction$1.call(BackendTransaction.java:256)
at com.thinkaurelius.titan.diskstorage.BackendTransaction$1.call(BackendTransaction.java:253)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.edgeStoreQuery(BackendTransaction.java:253)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.edgeQuery(StandardTitanGraph.java:344)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1054)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1051)
at com.thinkaurelius.titan.graphdb.vertices.CacheVertex.loadRelations(CacheVertex.java:47)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1051)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1005)
at com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:66)
at com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:63)
This might then be good news indeed, it could all just be a problem of memory :question: I'm going to split UniRef files in smaller chunks and see how it goes
I got an exception again when using chunks of 50.000 entries instead of 500.000 ... :confused:
SEVERE: Could not commit transaction due to exception during persistence
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1309)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:60)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.commit(TitanUntypedGraph.java:21)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:92)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.ArrayList.<init>(ArrayList.java:152)
at com.thinkaurelius.titan.graphdb.internal.OrderList.<init>(OrderList.java:23)
at com.thinkaurelius.titan.graphdb.query.graph.GraphCentricQueryBuilder.<init>(GraphCentricQueryBuilder.java:55)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.query(StandardTitanTx.java:1263)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.query(StandardTitanTx.java:82)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.query(TitanBlueprintsGraph.java:225)
at com.bio4j.angulillos.titan.TitanTypedVertexIndex$Default.query(TitanTypedVertexIndex.java:82)
at com.bio4j.angulillos.TypedElementIndex$Unique.getElement(TypedElementIndex.java:44)
at com.bio4j.angulillos.TypedVertexIndex$Unique.getVertex(TypedVertexIndex.java:41)
at com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRefFile(ImportUniProtUniRef.java:190)
at com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:78)
at com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
at com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
at com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
at com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)
I'm going to try to find spots where objects are created whose memory could be freed somehow. Let's hope that can finally fix this!
My suggestion is probably a stupid one and you have already tried it, but I once fixed a "GC overhead limit exceeded" exception with more frequent graph.commit() calls.
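The idea of committing more often can be sketched as follows. This is only an illustration under assumptions: `Tx` and `CountingTx` are hypothetical in-memory stand-ins, not the Titan or angulillos API, and the batch size is arbitrary. It shows that the amount of uncommitted state never grows beyond one batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCommitSketch {
    // Hypothetical minimal transaction interface; a real graph API would differ.
    interface Tx {
        void write(String record);
        void commit();
    }

    // In-memory stand-in that counts commits and tracks uncommitted writes.
    static class CountingTx implements Tx {
        int commits = 0;
        int maxPending = 0;
        private final List<String> pending = new ArrayList<>();

        public void write(String record) {
            pending.add(record);
            maxPending = Math.max(maxPending, pending.size());
        }

        public void commit() {
            pending.clear(); // dropping pending state is what bounds memory use
            commits++;
        }
    }

    // Writes all records, committing after every batchSize writes
    // and once more at the end if a partial batch is left over.
    static void importAll(Iterable<String> records, Tx tx, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int sinceCommit = 0;
        for (String r : records) {
            tx.write(r);
            if (++sinceCommit == batchSize) {
                tx.commit();
                sinceCommit = 0;
            }
        }
        if (sinceCommit > 0) tx.commit(); // flush the final partial batch
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) records.add("protein-" + i);
        CountingTx tx = new CountingTx();
        importAll(records, tx, 3);
        System.out.println(tx.commits + " commits, max pending " + tx.maxPending);
    }
}
```

Whether this helps in a real import depends on what else holds references between commits (caches, open cursors, index lookups), which this sketch does not model.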
:tada: Kim's back!!! :tada:
That's basically what we're doing
Yeah, I think we already tried most of what anyone would first think of... however, we keep getting similar exceptions... :confused: There's probably an underlying problem related to the way memory is managed at the Berkeley DB level. Anyway, any extra ideas to try out are more than welcome :wink:
I already made some progress on this and managed to import UniRef50 :tada: However, in order to keep going, I need to somehow mount both ephemeral drives (1 TB each) as a single volume. @eparejatobes @laughedelic do you know if that's possible? If so, what would be the best way to do it? Thanks!
Pablo
hey @pablopareja https://github.com/bio4j/bio4j-titan/issues/37#issuecomment-57600829
Thanks to @eparejatobes I already managed to create a disk holding both at the same time. Here's the link to the instructions: http://cloudacademy.com/blog/amazon-aws-raid-0-configuration-on-ebs-volumes/
:smiley:
OK so we are one step closer to having the whole thing! :tada: :tada:
I managed to import UniRef 90, which means that only UniRef 100 is left to have all UniRef (however that one was the one that was giving us trouble from the beginning so let's cross our fingers and hope for the best this time! )
By the way, Bio4j already weighs more than 1.1 TB ! :scream:
:clap: :clap:
Closer to the cake :cake: I promised :)
Good!! (cake and UniRef!!)
:disappointed: I got another exception when trying to import UniRef 100 again...
... 12 more
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Error during BerkeleyJE initialization:
at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.initialize(BerkeleyJEStoreManager.java:108)
at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.<init>(BerkeleyJEStoreManager.java:68)
... 17 more
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 5.0.73) JAVA_ERROR: Java Error occurred, recovery may not be possible.
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1507)
at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:134)
at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:246)
at com.sleepycat.je.Environment.<init>(Environment.java:227)
at com.sleepycat.je.Environment.<init>(Environment.java:170)
at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.initialize(BerkeleyJEStoreManager.java:104)
... 18 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at com.sleepycat.je.Database.newDbcInstance(Database.java:744)
at com.sleepycat.je.Database.openCursor(Database.java:689)
at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEKeyValueStore.getSlice(BerkeleyJEKeyValueStore.java:122)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreAdapter.getSlice(OrderedKeyValueStoreAdapter.java:56)
at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:772)
at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:769)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:133)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation$1.call(BackendOperation.java:147)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:144)
at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:706)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm running out of ideas here to get this running.... perhaps we could try with a bigger machine that has more RAM? @eparejatobes
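Before moving to a bigger machine it may be worth checking how much heap the JVM is actually allowed to use, since extra RAM only helps if the maximum heap (for example via the standard -Xmx flag) is raised accordingly. A minimal sketch using only `java.lang.Runtime`; the class name `HeapReport` is made up for illustration.

```java
public class HeapReport {
    // Converts a byte count to whole mebibytes (rounding down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: the upper bound the JVM will try to use for the heap.
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory: heap currently reserved by the JVM.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory: unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running it with the same JVM options as the importer shows whether the process is limited by the machine or by its own heap settings.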
mmm maybe. I don't know, it's not a really specific error message there. We can always go back to clusters as properties as a last resort.
Well the thing I can't fully understand is why it's always giving problems with UniRef 100 but not with UniRef 90 or 50 ??
In theory many more vertices are being created when importing UniRef 100 since the average size of clusters should be rather smaller. But why's that posing problems?
I already split the XML file into hundreds of chunks of 50.000 entries each, so I don't get how it can be running out of memory! The total number of entries in the main file shouldn't influence how things go now, since we're using a lot of independent smaller files and launching different processes each time. Perhaps, even though I'm explicitly closing the database handler and separate executions are supposedly used for each file, in reality BerkeleyDB doesn't care about that and is not actually shut down until the Java process is done... Maybe (as an exception, to get things finally running) I could try launching a set of processes, each with just a smaller set of files, and see if we get rid of this problem?
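The "one process per small set of files" idea could look roughly like this. It is a sketch under assumptions: the jar name `importer.jar`, the `-Xmx8g` value and the chunk file names are placeholders, not the real bio4j-titan command line. The point is that each group runs in a fresh JVM, so whatever memory the previous group held is returned to the operating system when that process exits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessPerBatch {
    // Splits the file list into consecutive groups of at most groupSize files.
    static List<List<String>> partition(List<String> files, int groupSize) {
        if (groupSize <= 0) throw new IllegalArgumentException("groupSize must be positive");
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < files.size(); i += groupSize) {
            int end = Math.min(i + groupSize, files.size());
            groups.add(new ArrayList<>(files.subList(i, end)));
        }
        return groups;
    }

    // Builds the command line for one child JVM.
    // The jar name and heap size are placeholders for illustration only.
    static List<String> commandFor(List<String> group) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-Xmx8g", "-jar", "importer.jar"));
        cmd.addAll(group);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "chunk1.xml", "chunk2.xml", "chunk3.xml", "chunk4.xml", "chunk5.xml");
        for (List<String> group : partition(files, 2)) {
            List<String> cmd = commandFor(group);
            System.out.println(String.join(" ", cmd));
            // To actually run each group in its own JVM, one after another:
            // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            // if (exit != 0) throw new IllegalStateException("import failed for " + group);
        }
    }
}
```

Running the groups sequentially keeps only one process attached to the database directory at a time.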
@pablopareja have you tried another backend for Titan, embedded Cassandra for example?
No, I didn't yet. The thing is that after a long time we already have almost everything imported but UniRef 100 in Berkeley DB. But we could try it out anyways once we manage to fix this :+1:
I tried it for some small applications and it was slower than BerkeleyDB, but it will probably behave better with big data sets.
I'm not sure though whether we could make the best out of Cassandra using it embedded and with just one cluster... Isn't it supposed to be used in distributed systems in order to actually enjoy its benefits compared to other technologies?
Probably the relaxed consistency in Cassandra will help its performance, it's just a guess...
This is the never ending story... I tried a couple more things but I'm still getting similar exceptions with UniRef 100 .... Enough is enough so should we go for the plan B ? (in principle it would consist of implementing Uniref Clusters by means of properties in proteins)
It's neither elegant nor nice but I'm out of options trying to get working what we have... still we could keep the code and who knows, maybe in next releases it could work since the number of proteins in TrEMBL would be reduced to ~ half of what they have now...
@eparejatobes what branch should I use to carry out these changes in case no one speaks against them?
@pablopareja OK. In principle we could do this at the level of bio4j-titan, by implementing a custom graph which would retrieve everything through properties and indices. What do you think?
And how could we do that from the code's perspective? By extending the classes we have? For instance, in the specific case of a Protein, would we extend the class in order to add the properties for UniRef clusters, or rather create a new Protein class? We could also create things at the Titan core level without having those properties hard-coded...
:question:
One possible approach would be changing all the methods which work on those types at the graph level. The vertex/edge classes wouldn't need to know anything about it as in the end they just defer to the graph.
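A rough sketch of what "clusters as properties, resolved at the graph level" might look like. Everything here is hypothetical: `Protein`, `Graph`, the property name `uniRef100ClusterId` and the sample identifiers are illustrative stand-ins, not the bio4j or angulillos classes. The cluster is stored as a plain property on the protein, and the graph keeps an index so that cluster members can still be looked up without edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClusterAsPropertySketch {
    // Hypothetical protein record: the cluster id is a plain property, not an edge.
    static class Protein {
        final String accession;
        final Map<String, String> properties = new HashMap<>();
        Protein(String accession) { this.accession = accession; }
    }

    // Hypothetical graph that answers cluster queries through a property plus an index.
    static class Graph {
        static final String UNIREF100 = "uniRef100ClusterId"; // assumed property name
        private final Map<String, List<Protein>> membersByCluster = new HashMap<>();

        // Stores the cluster id on the protein and keeps the index in sync.
        void assignCluster(Protein p, String clusterId) {
            String old = p.properties.put(UNIREF100, clusterId);
            if (old != null) {
                List<Protein> previous = membersByCluster.get(old);
                if (previous != null) previous.remove(p); // drop the stale index entry
            }
            membersByCluster.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(p);
        }

        // Callers ask the graph, so they need not know how clusters are stored.
        String clusterOf(Protein p) {
            return p.properties.get(UNIREF100);
        }

        List<Protein> members(String clusterId) {
            return membersByCluster.getOrDefault(clusterId, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        Protein p = new Protein("P12345"); // made-up accession
        g.assignCluster(p, "UniRef100_P12345"); // made-up cluster id
        System.out.println(g.clusterOf(p) + " has " + g.members("UniRef100_P12345").size() + " member(s)");
    }
}
```

Because lookups go through the graph, the storage choice (property plus index versus cluster vertices and edges) could in principle be swapped without touching the calling code.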
@pablopareja I'd like to play with this a bit and discuss it tomorrow.
Yeah, but what about the importers? So far we do everything using the defined classes and so on... If you want to discuss it either today or tomorrow morning, let me know :wink:
@eparejatobes ping!
Hey! I just wanted to give you an update on how things are going with this. I already managed to import UniRef 100 with the new workaround and the process is halfway through UniRef 90 so, if nothing weird happens, all UniRef should be imported by ~ Wednesday. I'll keep you posted :wink:
Good luck!
I can't believe it, but it looks like I managed to import all UniRef clusters! :tada: :tada: :tada: I'm uploading the tar file as I write this message :smiley: There are still some tests to be run, but I think we're about to have the long-awaited cake pretty soon! :space_invader:
super congrats!!!!! :clap: :clap: :tada:
Congratulations clustered in 100 , 90 and 50 ,....!!!!!!!
:bell: :dart: We have to celebrate it :birthday:
:+1: :sweet_potato: :clap: :cactus:
Since (at least in theory) this issue was fixed, I'm closing it :tada: I will open new ones if necessary
I'm getting the following exception:
The exception is thrown when initializing the indices: https://github.com/bio4j/bio4j-titan/blob/896194fbff6e65b53ad0a638ab2b58293dc9bb25/src/main/java/com/bio4j/titan/model/uniref/TitanUniRefGraph.java#L120
I just can't find any difference from how things are initialized in other modules that have already been successfully imported... @eparejatobes could you have a look at this in case you see something I'm not seeing?