bg7 / BG7

bacterial genome annotation system
bg7.ohnosequences.com

ArrayIndexOutOfBoundsException in PredictGenes.java #43

Open ehsueh opened 10 years ago

ehsueh commented 10 years ago

Hi BG7 Developers,

Thank you so much for making BG7 open source! I really like the idea behind BG7's annotation method!

Currently, I am running BG7 on Listeria monocytogenes EDG e (sequence fasta downloaded from NCBI). I keep encountering ArrayIndexOutOfBoundsException when I run BG7.jar.

To pinpoint the problem, I ran the following command:

java -d64 -Xmx6G -Xms1G -jar PredictGenes.jar blast.xml sequence_header_fixed.fasta output.xml 400 false 30

Which gave me the following error:

java.lang.ArrayIndexOutOfBoundsException: 1
    at com.era7.bioinfo.annotation.PredictGenes.main(PredictGenes.java:221)

I thought the problem might be coming from my data's format in blast.xml, so I tried running PredictGenes.jar with the example input files provided in /BG7-master/bg7_example_input_files. However, I still get the same error. Running this bigger set of data made me realize that the loop only breaks sometimes: it is fine for all the iterations until "A8YJR5" in the provided example data (if that helps). From the XML I don't see how this iteration is different from the ones before it, though. I wanted to print some variables like hsp.size() at this point; however, when I make any changes to the Java source code and recompile the jar, I get the following complaint:

Failed to load Main-Class manifest attribute from PredictGenes.jar

Which I also have not found a way to fix. :( Do you know what I am doing wrong?

Thanks a lot in advance for your help! :D

Emma
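
For reference, "Failed to load Main-Class manifest attribute" means that `java -jar` found no `Main-Class` entry in the jar's manifest, which typically happens when a jar is rebuilt without the original manifest. A minimal sketch for checking a rebuilt jar follows; `MainClassCheck` is a hypothetical helper, not part of BG7, and the class name it should report is assumed from the stack trace above.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper (not part of BG7): reports the Main-Class entry of a jar.
class MainClassCheck {

    // Returns the Main-Class value, or null if the manifest or the entry is missing.
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the jar to inspect, e.g. the rebuilt PredictGenes.jar.
        try (JarFile jar = new JarFile(args[0])) {
            String mainClass = mainClassOf(jar.getManifest());
            System.out.println(mainClass == null
                    ? "no Main-Class entry"
                    : "Main-Class: " + mainClass);
        }
    }
}
```

If the entry is missing, the `jar` tool can record one when the archive is built, either through its `e` (entry point) option or through a manifest file passed with `m`.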

pablopareja commented 10 years ago

Hi Emma,

Thanks for submitting the issue. Could you first confirm where you got the file BG7.jar from?

ehsueh commented 10 years ago

Hi Pablo, I downloaded the master version from GitHub.

pablopareja commented 10 years ago

OK, could you please download and use the version I just uploaded to the branch new/version here: https://github.com/bg7/BG7/tree/new/version/distribution

We haven't merged it to master yet, but this branch contains the most recent version, where we use sbt and a few fixes for some issues have already been applied. Let me know how it goes :wink:

ehsueh commented 10 years ago

I tried to run the distribution version of bg7 (my java version 1.7.0_65):

java -d64 -Xmx6G -Xms1G -jar bg7-assembly-0.1.0-SNAPSHOT.jar bg7-assembly-0.1.0-SNAPSHOT.jar

and I got

Exception in thread "main" java.lang.UnsupportedClassVersionError: com/ohnosequences/bg7/BG7 : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)

What version of javac are you using?

Thanks :smile:

ehsueh commented 10 years ago

I also tried java version "1.6.0_24". Same error. Should I upgrade to 1.8?

pablopareja commented 10 years ago

Hi Emma,

Yeah we're using by default Java 8 in all of our projects now :wink:
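
For reference, the `52.0` in the `UnsupportedClassVersionError` above is the class file's major.minor version, and major version 52 corresponds to Java 8, which is why the Java 6 and Java 7 runtimes reject the jar. A minimal sketch for checking which release a compiled class targets follows; `ClassFileVersion` is a hypothetical helper, not part of BG7.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of BG7): reads the version header of a .class file.
class ClassFileVersion {

    // A class file starts with the magic number 0xCAFEBABE, followed by a
    // 2-byte minor version and a 2-byte major version.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a Java class file");
        }
        data.readUnsignedShort(); // minor version, not needed here
        return data.readUnsignedShort();
    }

    // For major versions 49 and up, the Java release is major - 44
    // (50 is Java 6, 51 is Java 7, 52 is Java 8).
    static int requiredJavaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .class file extracted from the jar.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int major = majorVersion(in);
            System.out.println("major version " + major
                    + ", needs Java " + requiredJavaRelease(major) + " or newer");
        }
    }
}
```

The class to inspect would be the one named in the error, `com/ohnosequences/bg7/BG7`, after extracting it from the jar.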

ehsueh commented 10 years ago

Kk, thanks. Finally successfully installed java 8. Took me so long :cry: Couldn't get jdk-8u11-linux-x64 to work. Ended up using jdk-8u5-linux-x64.

Hmm.... bg7-assembly-0.1.0-SNAPSHOT.jar looks for the file execution.xml, right? I saved one in the same directory as the jar, ran the jar and received the following error:

com.ohnosequences.util.ExecuteFromFile main
SEVERE: null
java.lang.ClassNotFoundException: com.era7.bioinfo.annotation.PredictGenes
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:259)
    at com.ohnosequences.util.ExecuteFromFile.main(ExecuteFromFile.java:50)
    at com.ohnosequences.bg7.BG7.main(BG7.java:33)

Do I need to compile the rest of the Java files in the src directory before running bg7-assembly-0.1.0-SNAPSHOT.jar? If so, which version do I compile? The files in /src/com, /main/..../era7 or /main/..../ohnosequences ?

Thanks.

pablopareja commented 10 years ago

I forgot to mention that the packages have changed a bit in the latest version, so the executions.xml file to use with this jar is the one from the branch new/version: https://github.com/bg7/BG7/blob/new/version/executionsTemplate.xml

ehsueh commented 10 years ago

Hi Pablo, Yeah. I used the new one. The FixFastaHeaders task ran without a problem. However, same error with PredictGenes.... :confused:

pablopareja commented 10 years ago

Hi Emma, Could you please provide the BLAST XML file that you're using?

ehsueh commented 10 years ago

I tried two different sets of data. Here is the dropbox link: https://www.dropbox.com/sh/vjyelsndz6x07zy/AAASx4kCfwszohKysFrWr1Fca

pablopareja commented 10 years ago

I've been carrying out some tests and the problem is related to the Organism value expected as part of the tag Iteration_query-def in the BLAST XML file:

https://github.com/bg7/BG7/blob/new/version/src/main/java/com/ohnosequences/bg7/PredictGenes.java#L210

@rtobes @marina-manrique could you please confirm this is mandatory behavior? That is to say, should the program carry on if and only if this value is found?

rtobes commented 10 years ago

Uniprot has been changing many things in recent months. They are changing the way sequences are organized, which now all come from UniParc. I don't know whether the format of the headers in the fasta files retrieved from Uniprot could now differ (in some cases) from the format that BG7 expects.

@pablopareja, Could you put here some specific header that does not include the OS= content?

pablopareja commented 10 years ago

Here is a random header from one of the BLAST files that @ehsueh uploaded to dropbox:

<Iteration_query-def>gi|386048931|ref|YP_005966922.1| chromosomal replication initiator protein dnaA [Listeria monocytogenes FSL R2-561]</Iteration_query-def>

rtobes commented 10 years ago

This header is not a header from a Uniprot protein. It is a header from a NCBI protein.

BG7 works with fasta headers from proteins retrieved from Uniprot, not from NCBI.
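
To illustrate the failure: an `ArrayIndexOutOfBoundsException: 1` is what Java reports when code asks for element 1 of a one-element array, for instance after splitting a header on `OS=` when the header has no such field. The sketch below assumes that kind of parsing (the thread does not quote the actual PredictGenes code) and shows a defensive alternative. `OrganismField` and the UniProt-style header in `main` are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not BG7 code): extracts the OS= field from a FASTA header.
class OrganismField {

    // In UniProt-style headers the organism follows "OS=" and runs until the
    // next two-letter "XX=" field (such as GN=, PE=, SV=) or the end of the header.
    private static final Pattern OS_FIELD =
            Pattern.compile("OS=(.+?)(?=\\s+[A-Z]{2}=|$)");

    // Returns the organism name, or an empty Optional when there is no OS= field.
    static Optional<String> organismOf(String header) {
        Matcher m = OS_FIELD.matcher(header);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // NCBI header quoted in this thread: the organism is in brackets, no OS= field.
        String ncbi = "gi|386048931|ref|YP_005966922.1| chromosomal replication "
                + "initiator protein dnaA [Listeria monocytogenes FSL R2-561]";
        // Made-up UniProt-style header, for illustration only.
        String uniprotStyle = "sp|P00000|EXMP_ECOLI Example protein "
                + "OS=Escherichia coli (strain K12) GN=exmP PE=1 SV=1";

        // split("OS=") returns a single element here, so index 1 does not exist.
        System.out.println(ncbi.split("OS=").length);
        System.out.println(organismOf(ncbi).isPresent());
        System.out.println(organismOf(uniprotStyle).orElse("(none)"));
    }
}
```

Returning an empty `Optional` leaves it to the caller to decide whether to skip the entry or stop with a clear message about the unsupported header format.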

pablopareja commented 10 years ago

@ehsueh could you confirm where you retrieved that protein from?

ehsueh commented 10 years ago

Yes, the reference protein and RNA fasta files were both retrieved from NCBI. Would I need to reformat the headers first?

pablopareja commented 10 years ago

Yeah, I think so. According to what @rtobes says, we support FASTA headers for proteins retrieved from Uniprot.

ehsueh commented 10 years ago

Is that what's crashing PredictGenes? Why does it only crash on some iterations though?

pablopareja commented 10 years ago

In the tests I ran, that was the reason, and it crashed in the first iteration when using the file Lm_EDG_e_proteins_tBLASTn.xml.

ehsueh commented 10 years ago

Oh. Yes. You are right. That one crashed on the first iteration for me too. It was the second data set (E coli) that ran for a few iterations before crashing. That one I got from master/bg7_example_input_files.zip, so the fasta is probably in the correct UniProt format.
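
One way to see which iterations would be affected is to list every `Iteration_query-def` in the BLAST XML that has no `OS=` field before running PredictGenes. The sketch below does that with the standard StAX parser. `QueryDefCheck` is a hypothetical helper, not part of BG7, and the thread does not confirm that a missing `OS=` is also what stops the E. coli example data, so treat the output as a hint only.

```java
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical pre-flight check (not part of BG7) for BLAST XML input.
class QueryDefCheck {

    // Returns the text of every Iteration_query-def element that lacks "OS=".
    static List<String> queryDefsWithoutOs(Reader blastXml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // BLAST XML usually declares an external DTD; do not try to load it.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader xml = factory.createXMLStreamReader(blastXml);
        List<String> missing = new ArrayList<>();
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT
                    && "Iteration_query-def".equals(xml.getLocalName())) {
                String queryDef = xml.getElementText();
                if (!queryDef.contains("OS=")) {
                    missing.add(queryDef);
                }
            }
        }
        xml.close();
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the BLAST XML file to check.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            for (String queryDef : queryDefsWithoutOs(in)) {
                System.out.println(queryDef);
            }
        }
    }
}
```

An empty result would suggest that the headers carry the field and that the cause lies elsewhere.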