bio4j / bio4j-titan

Titan-specific bio4j implementation
https://github.com/bio4j/bio4j

Exception when importing UniRef in current version #61

Closed: pablopareja closed this issue 9 years ago

pablopareja commented 9 years ago

I'm getting the following exception:

SEVERE: null
java.lang.NullPointerException
        at com.bio4j.angulillos.titan.TitanTypedVertexIndex$Unique.name(TitanTypedVertexIndex.java:125)
        at com.bio4j.angulillos.titan.TitanTypedVertexIndex$DefaultUnique.<init>(TitanTypedVertexIndex.java:188)
        at com.bio4j.titan.model.uniref.TitanUniRefGraph.initIndices(TitanUniRefGraph.java:120)
        at com.bio4j.titan.model.uniref.TitanUniRefGraph.<init>(TitanUniRefGraph.java:85)
        at com.bio4j.titan.model.uniref.programs.ImportUniRefTitan.config(ImportUniRefTitan.java:43)
        at com.bio4j.titan.model.uniref.programs.ImportUniRefTitan.config(ImportUniRefTitan.java:37)
        at com.bio4j.model.uniref.programs.ImportUniRef.importUniRef(ImportUniRef.java:59)
        at com.bio4j.titan.model.uniref.programs.ImportUniRefTitan.execute(ImportUniRefTitan.java:52)
        at com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
        at com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)

The exception is thrown when initializing the indices: https://github.com/bio4j/bio4j-titan/blob/896194fbff6e65b53ad0a638ab2b58293dc9bb25/src/main/java/com/bio4j/titan/model/uniref/TitanUniRefGraph.java#L120

I just can't find any difference from how things are initialized in other modules that have already been successfully imported... @eparejatobes could you have a look at this in case you see something I'm not seeing?

eparejatobes commented 9 years ago

mmm I don't know. It looks like either the property or the vertex type (or something derived from them) is null. Again, it'd be nice to have a test for type creation only.

pablopareja commented 9 years ago

Ok I will write the test first thing tomorrow

pablopareja commented 9 years ago

I already uploaded the new test here: 60c154789591e0783c95f77493c86ccca38c1ccf. Let's see if we can figure out what's going on!

pablopareja commented 9 years ago

I just added the following lines to the test immediately before the line where the error is thrown:

System.out.println(UniRef100Cluster() == null);
System.out.println(UniRef100Cluster().id == null);
System.out.println(mgmt == null);
System.out.println(this == null);

and they are all printing false, so it's not a problem with any of these parameters...

@eparejatobes Do you have any idea why we may be getting a null in angulillos-titan here: https://github.com/bio4j/angulillos-titan/blob/master/src/main/java/com/bio4j/angulillos/titan/TitanTypedVertexIndex.java#L125 :question:
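For what it's worth, chains of checks like these can be folded into a small helper that reports the first null link, so a failing expression such as `UniRef100Cluster().id.elementType()` can be pinpointed in one run. This is a hypothetical utility sketch, not part of bio4j or angulillos:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical debugging helper: evaluates each named expression lazily,
// in insertion order, and reports the first one that is null (or throws).
public final class NullProbe {

    public static String firstNull(Map<String, Supplier<Object>> probes) {
        for (Map.Entry<String, Supplier<Object>> e : probes.entrySet()) {
            try {
                if (e.getValue().get() == null) return e.getKey();
            } catch (RuntimeException ex) {
                return e.getKey() + " (threw " + ex.getClass().getSimpleName() + ")";
            }
        }
        return null; // every probe was non-null
    }

    public static void main(String[] args) {
        Map<String, Supplier<Object>> probes = new LinkedHashMap<>();
        probes.put("cluster", () -> "UniRef100Cluster");          // simulated values
        probes.put("cluster.id", () -> "id");
        probes.put("cluster.id.elementType", () -> null);         // simulated culprit
        System.out.println(firstNull(probes)); // prints cluster.id.elementType
    }
}
```

Using lazy suppliers also means a probe that itself throws an NPE mid-chain gets reported instead of crashing the diagnostic.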

pablopareja commented 9 years ago

I added a couple more debug lines; I now have:

System.out.println(UniRef100Cluster() == null);
System.out.println(UniRef100Cluster().id == null);
System.out.println(mgmt == null);
System.out.println(this == null);
System.out.println(UniRef100Cluster().id.name());
System.out.println(UniRef100Cluster().name());

And this is what I get:

false
false
false
false
com.bio4j.model.uniref.UniRefGraph.UniRef100ClusterType.id
com.bio4j.model.uniref.UniRefGraph.UniRef100ClusterType

So I really don't know where this null value comes from... :confused:

eparejatobes commented 9 years ago

What is null is UniRef100Cluster().id.elementType(). I'm looking at it.

pablopareja commented 9 years ago

Cool, can I help you out with something?

eparejatobes commented 9 years ago

@pablopareja see bio4j/bio4j#82

pablopareja commented 9 years ago

let's have a short meeting about this as soon as you find some time for it!

pablopareja commented 9 years ago

After all the work done in the previous weeks we got rid of that issue but now we have a new one. Here's the last exception I got:

SEVERE: Could not execute operation due to backend exception
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.diskstorage.BackendTransaction.edgeStoreQuery(BackendTransaction.java:253)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.edgeQuery(StandardTitanGraph.java:344)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1054)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1051)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.vertices.CacheVertex.loadRelations(CacheVertex.java:47)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1051)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1005)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:66)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:63)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor.runWithMetrics(MetricsQueryExecutor.java:81)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor.execute(MetricsQueryExecutor.java:63)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:56)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.getProperty(AbstractVertex.java:136)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.getProperty(AbstractVertex.java:152)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.getPropertyV(TitanUntypedGraph.java:40)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.getPropertyV(TitanUntypedGraph.java:14)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.TypedGraph.getProperty(TypedGraph.java:79)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.TypedVertex.get(TypedVertex.java:70)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot.vertices.Protein.accession(Protein.java:42)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRefFile(ImportUniProtUniRef.java:260)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:78)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
Jan 30, 2015 2:31:26 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: Could not commit transaction due to exception during persistence
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1309)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:60)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.commit(TitanUntypedGraph.java:21)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.commit(TitanUntypedGraph.java:21)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:92)
Jan 30, 2015 2:31:56 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
Jan 30, 2015 2:32:36 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
Jan 30, 2015 2:32:36 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
Jan 30, 2015 2:32:36 AM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)

It looks similar to what we were getting before. By the way, the error was thrown after updating ~ 10.000.000 proteins, which in turn were imported using chunks of TrEMBL with 500.000 entries each. I'm going to try changing the parameters of the import process a bit, but I'm not sure that will have any effect at all... :confused:

@eparejatobes any ideas?

pablopareja commented 9 years ago

I managed to retrieve some extra information about this, even though it was not written to the log files; here it goes:

Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 5.0.73) JAVA_ERROR: Java Error occurred, recovery may not be possible.
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1507)
        at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:134)
        at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:246)
        at com.sleepycat.je.Environment.<init>(Environment.java:227)
        at com.sleepycat.je.Environment.<init>(Environment.java:170)
        at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.initialize(BerkeleyJEStoreManager.java:104)
        ... 18 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.sleepycat.je.tree.IN.getKey(IN.java:775)
        at com.sleepycat.je.tree.IN.findEntry(IN.java:2253)
        at com.sleepycat.je.tree.Tree.searchSubTreeInternal(Tree.java:1486)
        at com.sleepycat.je.tree.Tree.searchSubTree(Tree.java:1381)
        at com.sleepycat.je.tree.Tree.search(Tree.java:1240)
        at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2101)
        at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2666)
        at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
        at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
        at com.sleepycat.je.Cursor.search(Cursor.java:2397)
        at com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1727)
        at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEKeyValueStore.getSlice(BerkeleyJEKeyValueStore.java:123)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreAdapter.getSlice(OrderedKeyValueStoreAdapter.java:56)
        at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore$1.call(MetricInstrumentedStore.java:92)
        at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore$1.call(MetricInstrumentedStore.java:90)
        at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore.runWithMetrics(MetricInstrumentedStore.java:214)
        at com.thinkaurelius.titan.diskstorage.util.MetricInstrumentedStore.getSlice(MetricInstrumentedStore.java:89)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction$1.call(BackendTransaction.java:256)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction$1.call(BackendTransaction.java:253)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.edgeStoreQuery(BackendTransaction.java:253)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.edgeQuery(StandardTitanGraph.java:344)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1054)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7$2.get(StandardTitanTx.java:1051)
        at com.thinkaurelius.titan.graphdb.vertices.CacheVertex.loadRelations(CacheVertex.java:47)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1051)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$7.execute(StandardTitanTx.java:1005)
        at com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:66)
        at com.thinkaurelius.titan.graphdb.query.MetricsQueryExecutor$4.apply(MetricsQueryExecutor.java:63)

This might actually be good news: it could all just be a memory problem :question: I'm going to split the UniRef files into smaller chunks and see how it goes.
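The splitting step can be sketched as a naive line-based splitter. This is illustrative only, and assumes each `<entry>` element opens and closes on its own line, which holds for the UniRef XML dumps:

```java
import java.util.ArrayList;
import java.util.List;

// Naive, line-based splitter: groups consecutive <entry>...</entry> blocks
// into chunks of at most chunkSize entries. Assumes each entry's opening and
// closing tags sit on their own lines (true for the UniRef XML dumps); a
// robust version would use a streaming XML parser instead.
public final class EntrySplitter {

    public static List<List<String>> split(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int entriesInChunk = 0;
        for (String line : lines) {
            current.add(line);
            if (line.trim().startsWith("</entry>")) {
                entriesInChunk++;
                if (entriesInChunk == chunkSize) {   // chunk is full, start a new one
                    chunks.add(current);
                    current = new ArrayList<>();
                    entriesInChunk = 0;
                }
            }
        }
        if (!current.isEmpty()) chunks.add(current); // trailing partial chunk
        return chunks;
    }
}
```

Each resulting chunk would still need the XML header and root element wrapped around it before being fed to the importer.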

pablopareja commented 9 years ago

I got an exception again when using chunks of 50.000 entries instead of 500.000... :confused:


SEVERE: Could not commit transaction due to exception during persistence
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1309)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:60)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.angulillos.titan.TitanUntypedGraph.commit(TitanUntypedGraph.java:21)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:92)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
Feb 08, 2015 11:54:59 PM com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef importUniProtUniRef
SEVERE: com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.ArrayList.<init>(ArrayList.java:152)
        at com.thinkaurelius.titan.graphdb.internal.OrderList.<init>(OrderList.java:23)
        at com.thinkaurelius.titan.graphdb.query.graph.GraphCentricQueryBuilder.<init>(GraphCentricQueryBuilder.java:55)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.query(StandardTitanTx.java:1263)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.query(StandardTitanTx.java:82)
        at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.query(TitanBlueprintsGraph.java:225)
        at com.bio4j.angulillos.titan.TitanTypedVertexIndex$Default.query(TitanTypedVertexIndex.java:82)
        at com.bio4j.angulillos.TypedElementIndex$Unique.getElement(TypedElementIndex.java:44)
        at com.bio4j.angulillos.TypedVertexIndex$Unique.getVertex(TypedVertexIndex.java:41)
        at com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRefFile(ImportUniProtUniRef.java:190)
        at com.bio4j.model.uniprot_uniref.programs.ImportUniProtUniRef.importUniProtUniRef(ImportUniProtUniRef.java:78)
        at com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefTitan.execute(ImportUniProtUniRefTitan.java:40)
        at com.bio4j.titan.model.uniprot_uniref.programs.ImportUniProtUniRefUsingFolderTitan.execute(ImportUniProtUniRefUsingFolderTitan.java:49)
        at com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:66)
        at com.bio4j.titan.programs.ImportTitanDB.main(ImportTitanDB.java:8)

I'm going to look for spots where objects are created whose memory can somehow be freed. Let's hope that finally fixes this!

evdokim commented 9 years ago

My suggestion is probably stupid and you have already tried it, but I once fixed a "GC overhead limit exceeded" exception with more frequent graph.commit() calls.
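The pattern, sketched against a minimal hypothetical graph interface rather than the actual Blueprints API, would look like this: committing every N entries bounds how large the in-memory transaction can grow.

```java
import java.util.List;

// Periodic-commit pattern for batch imports: flush the transaction every
// commitEvery entries so it never accumulates the whole batch in memory.
public final class BatchImport {

    // Minimal stand-in for the real graph API, for illustration only.
    public interface Graph {
        void addEntry(String entry);
        void commit();
    }

    public static int importAll(Graph g, List<String> entries, int commitEvery) {
        int commits = 0;
        int sinceCommit = 0;
        for (String entry : entries) {
            g.addEntry(entry);
            if (++sinceCommit == commitEvery) {
                g.commit();          // flush before the transaction grows too large
                commits++;
                sinceCommit = 0;
            }
        }
        if (sinceCommit > 0) {       // flush the final partial batch
            g.commit();
            commits++;
        }
        return commits;
    }
}
```

The trade-off is throughput: smaller commit batches mean more backend round trips, so commitEvery usually gets tuned empirically.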

laughedelic commented 9 years ago

:tada: Kim's back!!! :tada:

eparejatobes commented 9 years ago

That's basically what we're doing

pablopareja commented 9 years ago

Yeah, I think we already tried most of what anyone would think of first... however, we keep getting similar exceptions... :confused: There's probably an underlying problem related to the way memory is managed at the Berkeley DB level. Anyway, any extra ideas to try out are more than welcome :wink:

pablopareja commented 9 years ago

I already made some progress on this and managed to import UniRef50 :tada: However, in order to keep going, I need to somehow mount both ephemeral drives (1TB each) as a single volume. @eparejatobes @laughedelic do you know if that's possible? If so, what would be the best way to do it? Thanks!

Pablo

eparejatobes commented 9 years ago

hey @pablopareja https://github.com/bio4j/bio4j-titan/issues/37#issuecomment-57600829

pablopareja commented 9 years ago

Thanks to @eparejatobes I managed to create a single volume holding both drives. Here's the link to the instructions: http://cloudacademy.com/blog/amazon-aws-raid-0-configuration-on-ebs-volumes/

:smiley:

pablopareja commented 9 years ago

OK so we are one step closer to having the whole thing! :tada: :tada:

I managed to import UniRef 90, which means that only UniRef 100 is left to have all of UniRef (however, that's the one that was giving us trouble from the beginning, so let's cross our fingers and hope for the best this time!)

By the way, Bio4j already weighs more than 1.1 TB! :scream:

marina-manrique commented 9 years ago

:clap: :clap:

Closer to the cake :cake: I promised :)

rtobes commented 9 years ago

Good!! (cake and UniRef!!)

pablopareja commented 9 years ago

:disappointed: I got another exception when trying to import UniRef 100 again...

... 12 more
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Error during BerkeleyJE initialization:
        at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.initialize(BerkeleyJEStoreManager.java:108)
        at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.<init>(BerkeleyJEStoreManager.java:68)
        ... 17 more
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 5.0.73) JAVA_ERROR: Java Error occurred, recovery may not be possible.
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1507)
        at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:134)
        at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:246)
        at com.sleepycat.je.Environment.<init>(Environment.java:227)
        at com.sleepycat.je.Environment.<init>(Environment.java:170)
        at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.initialize(BerkeleyJEStoreManager.java:104)
        ... 18 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at com.sleepycat.je.Database.newDbcInstance(Database.java:744)
        at com.sleepycat.je.Database.openCursor(Database.java:689)
        at com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEKeyValueStore.getSlice(BerkeleyJEKeyValueStore.java:122)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreAdapter.getSlice(OrderedKeyValueStoreAdapter.java:56)
        at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:772)
        at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:769)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:133)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation$1.call(BackendOperation.java:147)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:144)
        at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:706)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I'm running out of ideas for getting this running... perhaps we could try a bigger machine with more RAM? @eparejatobes

eparejatobes commented 9 years ago

mmm maybe. I don't know, it's not a really specific error message there. We can always go back to clusters as properties as a last resort.

pablopareja commented 9 years ago

Well, the thing I can't fully understand is why it always gives problems with UniRef 100 but not with UniRef 90 or 50.

In theory many more vertices are created when importing UniRef 100, since the average cluster size should be rather smaller. But why is that posing problems?

I already split the XML file into hundreds of chunks of 50.000 entries each, so I don't get how it can be running out of memory! The total number of entries in the main file shouldn't influence things now, since we're using a lot of independent smaller files and launching a different process each time. Perhaps, even though I'm explicitly closing the database handler and a separate execution is supposedly used for each file, BerkeleyDB doesn't actually shut down until the Java process is done... Maybe (as an exception, just to get things finally running) I could try launching a set of processes, each with just a smaller set of files, and see if that gets rid of this problem?
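That per-process idea could be sketched like this. The heap size, classpath, and jar name are illustrative; only the main class name comes from the stack traces above:

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch of the "one JVM per chunk" workaround: each chunk is imported by a
// freshly spawned java process, so BerkeleyJE's environment (and whatever
// memory it holds onto) is guaranteed to die with that process.
public final class PerChunkRunner {

    public static List<String> commandFor(File chunk) {
        return Arrays.asList(
            "java", "-Xmx8g",                          // per-worker heap, tune as needed
            "-cp", "bio4j-titan.jar",                  // illustrative classpath
            "com.bio4j.titan.programs.ImportTitanDB",  // entry point seen in the traces
            chunk.getAbsolutePath());
    }

    public static void runSequentially(List<File> chunks)
            throws IOException, InterruptedException {
        for (File chunk : chunks) {
            Process p = new ProcessBuilder(commandFor(chunk)).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("import failed for " + chunk);
            }
        }
    }
}
```

Running the chunks sequentially also keeps only one BerkeleyJE environment open at a time, which is exactly the isolation a single long-lived JVM doesn't give us.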

evdokim commented 9 years ago

@pablopareja have you tried another backend for Titan, embedded Cassandra for example?

pablopareja commented 9 years ago

No, not yet. The thing is that after a long time we already have almost everything but UniRef 100 imported into Berkeley DB. But we could try it out anyway once we manage to fix this :+1:

evdokim commented 9 years ago

I tried it for some small applications and it was slower than BerkeleyDB, but it will probably behave better with big data sets.

pablopareja commented 9 years ago

I'm not sure, though, whether we could get the best out of Cassandra using it embedded and as a single-node cluster... Isn't it supposed to be used in distributed systems in order to actually enjoy its benefits over other technologies?

evdokim commented 9 years ago

Probably Cassandra's relaxed consistency will improve its performance; it's just a guess...

pablopareja commented 9 years ago

This is the never-ending story... I tried a couple more things but I'm still getting similar exceptions with UniRef 100... Enough is enough, so should we go for plan B? (in principle it would consist of implementing UniRef clusters by means of properties on proteins)

It's neither elegant nor nice, but I'm out of options for getting what we have working... Still, we could keep the code and, who knows, maybe it could work in upcoming releases, since the number of proteins in TrEMBL would be reduced to ~ half of what it is now...

@eparejatobes what branch should I use to carry out these changes, in case no one speaks against them?

eparejatobes commented 9 years ago

@pablopareja OK. In principle we could do this at the level of bio4j-titan, by implementing a custom graph which would retrieve everything through properties and indices. What do you think?

pablopareja commented 9 years ago

And how could we do that from the code's perspective? By extending the classes we have? For instance, in the specific case of a Protein, would we extend the class to add the properties for UniRef clusters, or rather create a new Protein class? We could also create things at the Titan core level without having those properties hard-coded...

:question:

eparejatobes commented 9 years ago

One possible approach would be changing all the methods which work on those types at the graph level. The vertex/edge classes wouldn't need to know anything about it as in the end they just defer to the graph.
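A rough sketch of the idea, with all names hypothetical: callers only ever see one graph-level method, so the representation behind it (real cluster vertices vs. an indexed property on proteins) can be swapped without touching the vertex classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: the cluster representation hides behind a single
// graph-level method, so vertex classes never need to know which one is used.
public final class ClusterAccess {

    public interface UniRefGraph {
        Set<String> memberAccessions(String clusterId);
    }

    // Plan-B style implementation: clusters are just a String property on
    // protein vertices, modeled here as a map from accession to cluster id.
    public static final class PropertyBackedGraph implements UniRefGraph {
        private final Map<String, String> clusterOf = new HashMap<>();

        public void setCluster(String accession, String clusterId) {
            clusterOf.put(accession, clusterId);
        }

        @Override
        public Set<String> memberAccessions(String clusterId) {
            Set<String> out = new TreeSet<>();
            for (Map.Entry<String, String> e : clusterOf.entrySet()) {
                if (e.getValue().equals(clusterId)) out.add(e.getKey());
            }
            return out; // a real backend would use an index, not a linear scan
        }
    }
}
```

A vertex-backed implementation of the same interface would traverse actual cluster vertices; the importers and queries would compile against the interface either way.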

eparejatobes commented 9 years ago

@pablopareja I'd like to play with this a bit and discuss it tomorrow.

pablopareja commented 9 years ago

yeah, but what about the importers? So far we do everything using the defined classes and so on... If you want to discuss it either today or tomorrow morning, let me know :wink:

pablopareja commented 9 years ago

@eparejatobes ping!

pablopareja commented 9 years ago

Hey! I just wanted to give you an update on how things are going with this. I already managed to import UniRef 100 with the new workaround, and the process is halfway through UniRef 90, so, if nothing weird happens, all of UniRef should be imported by ~ Wednesday. I'll keep you posted :wink:

rtobes commented 9 years ago

Good luck!

pablopareja commented 9 years ago

I can't believe it, but it looks like I managed to import all the UniRef clusters! :tada: :tada: :tada: I'm uploading the tar file as I write this message :smiley: There are still some tests to run, but I think we're about to have the long-awaited cake pretty soon! :space_invader:

marina-manrique commented 9 years ago

super congrats!!!!! :clap: :clap: :tada:

rtobes commented 9 years ago

Congratulations clustered in 100, 90 and 50...!!!

:bell: :dart: We have to celebrate it :birthday:

eparejatobes commented 9 years ago

:+1: :sweet_potato: :clap: :cactus:

pablopareja commented 9 years ago

Since (at least in theory) this issue is fixed, I'm closing it :tada: I'll open new ones if necessary.