Rothamsted / knetbuilder

KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.
https://knetminer.com
MIT License

Accession-based mapper fails with OutOfMemoryError #30

Closed: josephhearnshaw closed this issue 3 years ago

josephhearnshaw commented 4 years ago

The accession-based mapper runs out of memory while creating relations:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Class.getInterfaces(Class.java:856)
    at java.lang.Class.getGenericInterfaces(Class.java:913)
    at java.util.HashMap.comparableClassFor(HashMap.java:351)
    at java.util.HashMap$TreeNode.find(HashMap.java:1874)
    at java.util.HashMap$TreeNode.getTreeNode(HashMap.java:1889)
    at java.util.HashMap.getNode(HashMap.java:576)
    at java.util.HashMap.get(HashMap.java:557)
    at net.sourceforge.ondex.core.memory.MemoryONDEXGraph.storeRelation(MemoryONDEXGraph.java:400)
    at net.sourceforge.ondex.core.base.AbstractONDEXGraph.createRelation(AbstractONDEXGraph.java:191)
    at net.sourceforge.ondex.core.EntityFactory.createRelation(EntityFactory.java:242)
    at net.sourceforge.ondex.mapping.lowmemoryaccessionbased.Mapping.createRelation(Mapping.java:503)
    at net.sourceforge.ondex.mapping.lowmemoryaccessionbased.Mapping.createRelationsOnResults(Mapping.java:395)
    at net.sourceforge.ondex.mapping.lowmemoryaccessionbased.Mapping.start(Mapping.java:250)
    at net.sourceforge.ondex.workflow.engine.Engine.runMapping(Engine.java:396)
    at net.sourceforge.ondex.workflow.engine.PluginProcessor$4.run(PluginProcessor.java:128)
    at net.sourceforge.ondex.workflow.engine.PluginProcessor$4.run(PluginProcessor.java:126)
    at net.sourceforge.ondex.workflow.engine.PluginProcessor.execute(PluginProcessor.java:83)
    at net.sourceforge.ondex.workflow.engine.BasicJobImpl.run(BasicJobImpl.java:110)
    at net.sourceforge.ondex.WorkflowMain.main(WorkflowMain.java:216)
    at net.sourceforge.ondex.OndexMiniMain.main(OndexMiniMain.java:7)

Test workflow and data available under knetminer/test/mapping_bug/git_issue_30/

The input OMA data might have malformed accession datasources or IDs.
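
For anyone reproducing this: "GC overhead limit exceeded" means the JVM spent nearly all its time in garbage collection while reclaiming very little heap, so a quick first check is how much heap the process actually has. The following is only a minimal sketch using the standard `Runtime` API. The class name `HeapReport` and the helper `toMiB` are made up for illustration and are not part of KnetBuilder or Ondex.

```java
import java.util.Locale;

// Hypothetical diagnostic helper, not part of KnetBuilder/Ondex.
public class HeapReport {

    // Formats a byte count as mebibytes with one decimal place.
    static String toMiB(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will try to use for the heap (controlled by -Xmx).
        long max = rt.maxMemory();
        // Heap currently occupied: allocated heap minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap:  " + toMiB(max));
        System.out.println("used heap: " + toMiB(used));
    }
}
```

If the reported maximum is small relative to the graph being built, raising it with the standard JVM `-Xmx` option is one thing to try. Whether and how the KnetBuilder launch script passes JVM options is not stated in this thread, so that part is an assumption to verify.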

marco-brandizi commented 4 years ago

@josephhearnshaw, some paths have changed:

FASTA and GFF3: 2#2 The argument is invalid -- The file /home/data/knetminer/species/fungi/fusarium_2020/fungi/organisms/fusarium_culmorum/ensembl/fusarium_culmorum_pep_all.fa does not exist and is required to do so for Fasta File. Absolute path: /home/data/knetminer/species/fungi/fusarium_2020/fungi/organisms/fusarium_culmorum/ensembl/fusarium_culmorum_pep_all.fa Absolute path to a FASTA input file with protein secuences
FASTA and GFF3: 2#2 The argument is invalid -- The file /home/data/knetminer/species/fungi/fusarium_2020/fungi/organisms/fusarium_culmorum/ensembl/fusarium_culmorum_gene_protein_mapping.txt does not exist and is required to do so for Mapping File. Absolute path: /home/data/knetminer/species/fungi/fusarium_2020/fungi/organisms/fusarium_culmorum/ensembl/fusarium_culmorum_gene_protein_mapping.txt Absolute path to a mapping input file which provides mapping relationsship between the GFF and the FASTA file. It should contain two columns: 2) gene id and 4) protein id
FASTA and GFF3: 2#2 The argument is invalid -- The file /home/data/knetminer/species/fungi/fusarium_2020/fungi/organisms/fusarium_culmorum/ensembl/fusarium_culmorum_genes.gff3 does not exist and is required to do so for GFF3 File. Absolute path: /home/data/knetminer/species/fungi/fusarium_2020/fungi/organisms/fusarium_culmorum/ensembl/fusarium_culmorum_genes.gff3 Absolute path to a GFF3 input file with 9 columns. It uses 1)chromosome id, 4)start, 5)end and 9)gene id and gene description i.e. "ID=PGSC0003DMG400030251;Name=""Conserved gene of unknown function""" 

I cannot find those files; please fix their paths (new launching script is basePath).
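
All three errors above are missing input files, which the workflow only reports once it is already running. A pre-flight check along these lines could catch that earlier. This is a sketch under assumptions: `InputPathCheck` and `missing` are hypothetical names, and the paths are taken from the command line rather than from the workflow definition, which the real tool reads itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check, not part of KnetBuilder/Ondex.
public class InputPathCheck {

    // Returns the subset of the given paths that are not existing regular files.
    static List<Path> missing(List<String> paths) {
        List<Path> result = new ArrayList<>();
        for (String p : paths) {
            Path path = Paths.get(p);
            if (!Files.isRegularFile(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Each argument is one input file the workflow expects (FASTA, GFF3, mapping file).
        List<Path> missing = missing(Arrays.asList(args));
        for (Path p : missing) {
            System.err.println("Missing input file: " + p.toAbsolutePath());
        }
        if (!missing.isEmpty()) {
            System.exit(1);
        }
    }
}
```

Running it with the three paths from the error messages as arguments would list whichever of them are absent on the current machine and exit with a non-zero status.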

marco-brandizi commented 4 years ago

With the new launching script and without the FASTA parsing step, it completed without errors. The only strange thing is that it took almost an hour to produce a small 1.4 MB OXL. I'm attaching the output here for future reference.

git_issue_30-out.zip