AlexGa / Phylostratigraphy

Pipeline for Phylostratigraphy
Apache License 2.0

java.lang.ArrayIndexOutOfBoundsException: 1 #3

Closed lilei1 closed 3 years ago

lilei1 commented 3 years ago

Hi there, thanks for these scripts. I am adapting them to my own species (Brachypodium) and got the following error:

perl createPSmap.pl --organism /global/cscratch1/sd/llei2019/B_syl_pro/query_fasta/query_test.fasta --database /global/cscratch1/sd/llei2019/ncbi_NR_databases/rehead_nr_20201202.fa --prefix BS_BlastAll_PS_map --seqOffset 50 --evalue 1e-5 --threads 60 --blastPlus
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
        at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
        at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
        at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
        ... 5 more
Removing BS_BlastAll_PS_map_query_test_1_50.xml after compressing to BS_BlastAll_PS_map_query_test_1_50.xml.tbz
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
        at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
        at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
        at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
        ... 5 more
Removing BS_BlastAll_PS_map_query_test_51_100.xml after compressing to BS_BlastAll_PS_map_query_test_51_100.xml.tbz

It seems that something is wrong with "ParseXMLtoPS.jar", but I could not figure out what. Do you have any ideas?

AlexGa commented 3 years ago

Hi lilei, many thanks for contacting me. Have you checked the XML files for completeness? It seems that the Java program, which parses the XML output and creates the PS map, cannot detect the correct sequence ID of the BLAST hit. This information is stored in the XML file under the tag <Hit_def>. It should look like this example from a Danio rerio BLAST search:

XP_003198386.1 | [Danio rerio] | [Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Actinopterygii; Actinopteri; Neopterygii; Teleostei; Elopocephala; Clupeocephala; Otocephala; Ostariophysi; Otophysi; Cypriniphysi; Cypriniformes; Cyprinoidea; Cyprinidae; Danio]

lilei1 commented 3 years ago

Hi Alex, thank you so much for your reply. My tag looks like this:

<Hit_def>PUZ50567.1 | [Panicum hallii var. hallii] | [cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliopsida; Mesangiospermae; Liliopsida; Petrosaviidae; commelinids; Poales; Poaceae; PACMAD clade; Panicoideae; Panicodae; Paniceae; Panicinae; Panicum; Panicum sect. Panicum; Panicum hallii;]</Hit_def>

Is that because I included "cellular organisms" and the last ";"?

Thanks. Li

lilei1 commented 3 years ago

Hello again, I've revised my code so that the header strictly follows your format (excluding "cellular organisms" and the last ";"), but I still get the same error:

perl createPSmap.pl --organism /global/cscratch1/sd/llei2019/B_syl_pro/query_fasta/query_test.fasta --database /global/cscratch1/sd/llei2019/ncbi_NR_databases/rehead_nr_20201202.fa  --prefix BS_BlastAll_PS_map --seqOffset 50  --evalue 1e-5 --threads 60 --blastPlus
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
        at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
        at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
        at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
        ... 5 more
Removing BS_BlastAll_PS_map_query_test_1_50.xml after compressing to BS_BlastAll_PS_map_query_test_1_50.xml.tbz

By the way, my tag now looks like this:

<Hit_def>XP_037467667.1 | [Triticum dicoccoides] | [Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliopsida; Mesangiospermae; Liliopsida; Petrosaviidae; commelinids; Poales; Poaceae; BOP clade; Pooideae; Triticodae; Triticeae; Triticinae; Triticum]</Hit_def>

Do you have any suggestions? Thanks. Best, Li

AlexGa commented 3 years ago

Hi Li, based on your provided tag, I cannot reproduce the error. I've added some lines to the code to track the line in your xml files which may be causing the exceptions. If you could try running the updated ParseXMLtoPS.jar on one xml file and post the new error message, we may find the problem.

For example, could you unpack the file BS_BlastAll_PS_map_query_test_1_50.xml.tbz with:

tar xfvj BS_BlastAll_PS_map_query_test_1_50.xml.tbz

And run ParseXMLtoPS.jar with:

java -jar ParseXMLtoPS.jar -e 1e-5 -i BS_BlastAll_PS_map_query_test_1_50.xml -p BS_BlastAll_PS_map
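
As an independent check alongside the jar, the <Hit_def> entries of an unpacked XML file can be scanned for headers that do not have the expected shape. The following is only a minimal Python sketch, not the logic of ParseXMLtoPS.jar: the pattern is an assumption derived from the example headers in this thread (GeneID | [organism_name] | [taxonomy]), and the function name find_bad_hit_defs is hypothetical. It also assumes that each <Hit_def> element sits on a single line.

```python
import re

# Assumed header layout, taken from the examples in this thread:
#   GeneID | [organism_name] | [taxonomy]
HIT_DEF = re.compile(r"<Hit_def>(.*?)</Hit_def>")
WELL_FORMED = re.compile(r"^\S+ \| \[[^\]]+\] \| \[[^\]]+\]$")


def find_bad_hit_defs(lines):
    """Return (line_number, text) for every <Hit_def> that breaks the layout.

    `lines` is any iterable of text lines, e.g. an open XML file handle.
    Line numbers start at 1 so they can be looked up in a text editor.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        match = HIT_DEF.search(line)
        if match is None:
            continue  # this line carries no <Hit_def> element
        text = match.group(1).strip()
        if not WELL_FORMED.match(text):
            bad.append((number, text))
    return bad
```

Passing an open file handle for the unpacked XML file to find_bad_hit_defs returns the line numbers and texts of all suspicious entries, which can then be traced back to the corresponding sequences in the BLAST database.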

lilei1 commented 3 years ago

Hi Alex, Thank you. Here is the error message:

java -jar ParseXMLtoPS.jar -e 1e-5 -i BS_BlastAll_PS_map_query_test_1_50.xml -p BS_BlastAll_PS_map
Start parsing BS_BlastAll_PS_map_query_test_1_50.xml for phylostratigraphy.
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
        at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
        at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
        at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
        ... 5 more

Here is the link to the xml file. You can download it in case you want to check. Thank you so much! Cheers, Li

AlexGa commented 3 years ago

Thanks for the link. It seems that some sequence identifiers in your BLAST database do not follow the format >GeneID | [organism_name] | [taxonomy]. For example, in the XML file you provided, at line 12266:

  <Hit_def>WP_194824234.1 unnamed protein product</Hit_def>

Judging from your error message, it seems that you have not yet pulled the new jar version from the repository (git pull). Sorry for the possible misunderstanding; I've only just published the new release (v0.0.5).

Now, the error message should look like this:

Start parsing BS_BlastAll_PS_map_query_test_1_50.xml for phylostratigraphy.
--> Error parsing tag <Hit_def>
--> Check [line 12266]: "WP_194824234.1 unnamed protein product"
java.lang.IllegalArgumentException: Incorrect BLAST identifier
    at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:85)
    at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:137)
    at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
    at xmlParser.CreatePSmap.main(CreatePSmap.java:95)

Once you remove or fix the sequences with incomplete headers in your database, the program should parse your files correctly.

Best

Alex
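
Removing the sequences with incomplete headers from the database FASTA file could be sketched as follows. This is a hedged Python example, not part of the pipeline: the function names has_complete_header and drop_incomplete_records are hypothetical, and the check assumes that a complete header consists of exactly three "|"-separated fields with the organism name and the taxonomy in square brackets, as in the examples above. Headers whose gene ID or organism name itself contains "|" would be rejected by this check.

```python
def has_complete_header(header):
    """True if the header looks like: >GeneID | [organism_name] | [taxonomy]."""
    fields = [field.strip() for field in header.lstrip(">").split("|")]
    if len(fields) != 3 or not all(fields):
        return False
    # The second and third field must be wrapped in square brackets.
    return all(f.startswith("[") and f.endswith("]") for f in fields[1:])


def drop_incomplete_records(lines):
    """Yield only the FASTA lines of records that have a complete header.

    `lines` is any iterable of text lines, e.g. an open FASTA file handle.
    Sequence lines are kept or skipped together with their header line.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = has_complete_header(line.rstrip("\n"))
        if keep:
            yield line
```

The generator works line by line, so a large database does not have to fit into memory; its output can be written to a new file with writelines before rebuilding the BLAST database from the cleaned file.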

lilei1 commented 3 years ago

Thank you, Alex! I've just pulled your new jar file and tested it on the same XML file, but the error is different from the one you showed me:

java -jar ParseXMLtoPS.jar -e 1e-5 -i BS_BlastAll_PS_map_query_test_1_50.xml -p BS_BlastAll_PS_map
Exception in thread "main" java.lang.UnsupportedClassVersionError: xmlParser/CreatePSmap : Unsupported major.minor version 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)

It seems that you compiled the jar with a different JDK version, which is not compatible with my current one. Could you please let me know which JDK version you used, or do you have any other suggestions? By the way, I will work on my script to fix the format issue and will test again. Thanks again. Best, Li
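
For reference, "Unsupported major.minor version 52.0" reports the class-file version of the compiled code: major version 52 corresponds to Java 8, so the running JVM must be older than that. The version is stored in the first eight bytes of every .class file (a 4-byte magic number, then the 2-byte minor and 2-byte major version, all big-endian). The Python sketch below decodes it; the function names are hypothetical and the lookup only covers the releases listed in the code.

```python
import struct


def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major):
    """Map a class-file major version to the Java release that introduced it."""
    if major >= 49:
        # 49 = Java 5, 52 = Java 8, 58 = Java 14, and so on.
        return str(major - 44)
    legacy = {45: "1.1", 46: "1.2", 47: "1.3", 48: "1.4"}
    return legacy.get(major, "unknown")


# Header bytes of a class compiled for Java 8 (major 0x34 = 52, minor 0).
header = bytes.fromhex("cafebabe00000034")
major, minor = class_file_version(header)
print(f"class file version {major}.{minor} -> Java {java_release(major)}")
```

Because a jar is a zip archive, the bytes of a class such as xmlParser/CreatePSmap.class (the name from the error message) can be read with Python's zipfile module and passed to class_file_version to see which Java release a given jar targets.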

AlexGa commented 3 years ago

Hi Li, I used Java SE 14 (March 2020), but to stay backward compatible I recompiled the jar with J2SE 1.4. I hope the updated jar works for you.

Best Alex

lilei1 commented 3 years ago

Hi Alex, thank you so much for helping me get this program to work. I tested a small sample file (around 100 query sequences) and it works! Could you please tell me how each PS corresponds to the lineage? I know PS1 is Eukaryota, but what about the others? Thanks. Best, Li

AlexGa commented 3 years ago

The phylostrata correspond to the nodes in the taxonomy of your query species (B. sylvaticum).

PS    Taxonomic node
1     Cellular Organisms
2     Eukaryota
3     Viridiplantae
...   ...
21    Brachypodium sylvaticum

You can also find this assignment in the output file with the suffix hits_table.csv. It provides you with all information about the BLAST hits from your XML file that were below the E-value threshold, in the following columns:

PS PS.name Hit.Organism query.id hit.id Evalue Identity Score BitScore

The first and second column of that table show the assignment of a certain PS to its corresponding taxonomic node. However, it may happen that some PS do not contain any BLAST hits and thus do not appear in this table.
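
The PS-to-node assignment can be pulled out of such a table with a few lines of Python. This is a sketch under assumptions: the column names PS and PS.name are taken from the header listed above, but the comma delimiter is a guess based on the .csv suffix, the function name phylostrata_names is hypothetical, and the rows in the demo are invented purely for illustration.

```python
import csv
import io


def phylostrata_names(handle, delimiter=","):
    """Collect the PS -> PS.name assignment from a hits table.

    `handle` is an open text file (or any iterable of lines) whose first
    line is the header row containing the columns "PS" and "PS.name".
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        mapping[int(row["PS"])] = row["PS.name"]
    return dict(sorted(mapping.items()))


# Invented example rows; a real table comes from the pipeline output.
demo = io.StringIO(
    "PS,PS.name,Hit.Organism,query.id,hit.id,Evalue,Identity,Score,BitScore\n"
    "2,Eukaryota,Danio rerio,q1,h1,1e-10,40,100,50\n"
    "1,Cellular Organisms,Escherichia coli,q1,h2,1e-8,30,80,40\n"
)
for ps, name in phylostrata_names(demo).items():
    print(ps, name)
```

Phylostrata without any BLAST hit below the threshold are absent from the table and therefore from the returned mapping, so gaps in the PS numbering are expected.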

--

Since we were able to solve the problem with the java.lang.ArrayIndexOutOfBoundsException, I would like to close the current issue. If you have other questions (or issues) with the phylostratigraphy pipeline, please let me know or open a new issue. I am happy to help you.

Best Alex