Hi lilei,
many thanks for contacting me. Have you checked the xml files for completeness? It seems that the Java program, which parses the xml output and creates the PS map, cannot detect the correct sequence ID of the BLAST hit. This information is stored in the xml file under the <Hit_def> tag, for example:
<Hit_def>XP_003198386.1 | [Danio rerio] | [Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Actinopterygii; Actinopteri; Neopterygii; Teleostei; Elopocephala; Clupeocephala; Otocephala; Ostariophysi; Otophysi; Cypriniphysi; Cypriniformes; Cyprinoidea; Cyprinidae; Danio]</Hit_def>
Hi Alex,
Thank you so much for your reply.
My <Hit_def> tag looks like this:
<Hit_def>PUZ50567.1 | [Panicum hallii var. hallii] | [cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliopsida; Mesangiospermae; Liliopsida; Petrosaviidae; commelinids; Poales; Poaceae; PACMAD clade; Panicoideae; Panicodae; Paniceae; Panicinae; Panicum; Panicum sect. Panicum; Panicum hallii;]</Hit_def>
Is that because I included "cellular organisms" and the last ";"?
Thanks. Li
Hello again, I've revised my code so that the header strictly follows the required format (excluding "cellular organisms" and the trailing ";"), but I still get the same error:
perl createPSmap.pl --organism /global/cscratch1/sd/llei2019/B_syl_pro/query_fasta/query_test.fasta --database /global/cscratch1/sd/llei2019/ncbi_NR_databases/rehead_nr_20201202.fa --prefix BS_BlastAll_PS_map --seqOffset 50 --evalue 1e-5 --threads 60 --blastPlus
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
... 5 more
Removing BS_BlastAll_PS_map_query_test_1_50.xml after compressing to BS_BlastAll_PS_map_query_test_1_50.xml.tbz
By the way, my <Hit_def> tag now looks like this:
<Hit_def>XP_037467667.1 | [Triticum dicoccoides] | [Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliopsida; Mesangiospermae; Liliopsida; Petrosaviidae; commelinids; Poales; Poaceae; BOP clade; Pooideae; Triticodae; Triticeae; Triticinae; Triticum]</Hit_def>
Do you have any suggestions? Thanks. Best, Li
Hi Li, based on your provided tag, I cannot reproduce the error. I've added some lines to the code to track the line in your xml files which may be causing the exceptions. If you could try running the updated ParseXMLtoPS.jar on one xml file and post the new error message, we may find the problem.
E.g. Could you unpack the file BS_BlastAll_PS_map_query_test_1_50.xml.tbz with:
tar xfvj BS_BlastAll_PS_map_query_test_1_50.xml.tbz
And run ParseXMLtoPS.jar with:
java -jar ParseXMLtoPS.jar -e 1e-5 -i BS_BlastAll_PS_map_query_test_1_50.xml -p BS_BlastAll_PS_map
Hi Alex, Thank you. Here is the error message:
java -jar ParseXMLtoPS.jar -e 1e-5 -i BS_BlastAll_PS_map_query_test_1_50.xml -p BS_BlastAll_PS_map
Start parsing BS_BlastAll_PS_map_query_test_1_50.xml for phylostratigraphy.
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
... 5 more
(/global/homes/l/llei2019/bscratch/software/my_work_en)
Here is the link to the xml file. You can download it in case you want to check. Thank you so much! Cheers, Li
Thanks for the link. It seems that some sequence identifiers in your BLAST database do not follow the required format:
>GeneID | [organism_name] | [taxonomy]
E.g. in your provided xml file at line 12266
<Hit_def>WP_194824234.1 unnamed protein product</Hit_def>
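To locate every such entry in an xml file at once, a check along the following lines may help. It is a minimal sketch, not the parser's actual logic: it assumes each <Hit_def> element sits on a single line, and it reads the required format as three "|"-separated fields with the last two in square brackets. The function names `is_valid_header` and `find_bad_hit_defs` are made up for this illustration.

```python
import re

# Matches one <Hit_def> element on a single line (an assumption about the BLAST xml layout).
HIT_DEF_RE = re.compile(r"<Hit_def>(.*?)</Hit_def>")


def is_valid_header(header):
    """Check a header against 'GeneID | [organism_name] | [taxonomy]'.

    This is one reading of the format described in the thread,
    not the rule implemented in ParseXMLtoPS.jar.
    """
    parts = [part.strip() for part in header.split("|")]
    if len(parts) != 3:
        return False
    gene_id, organism, taxonomy = parts
    return (
        bool(gene_id)
        and organism.startswith("[") and organism.endswith("]")
        and taxonomy.startswith("[") and taxonomy.endswith("]")
    )


def find_bad_hit_defs(lines):
    """Return (line_number, header) pairs for every malformed <Hit_def>."""
    bad = []
    for number, line in enumerate(lines, start=1):
        for match in HIT_DEF_RE.finditer(line):
            if not is_valid_header(match.group(1)):
                bad.append((number, match.group(1)))
    return bad


# Two in-memory lines; the taxonomy of the first one is shortened for readability.
sample = [
    "<Hit_def>XP_037467667.1 | [Triticum dicoccoides] | [Eukaryota; Viridiplantae]</Hit_def>",
    "<Hit_def>WP_194824234.1 unnamed protein product</Hit_def>",
]
print(find_bad_hit_defs(sample))
```

For a real file, passing an open file handle (for example the unpacked BS_BlastAll_PS_map_query_test_1_50.xml) instead of `sample` should list the offending line numbers.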
Judging from your error message, it seems that you did not pull the new jar version from the git repository (git pull)?
Sorry for the possible misunderstanding, I've just published the new release (v0.0.5).
Now, the error message should look like this:
Start parsing BS_BlastAll_PS_map_query_test_1_50.xml for phylostratigraphy.
--> Error parsing tag <Hit_def>
--> Check [line 12266]: "WP_194824234.1 unnamed protein product"
java.lang.IllegalArgumentException: Incorrect BLAST identifier
at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:85)
at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:137)
at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
When you remove or update the sequences in your database with the incomplete header, the program should parse your files correctly.
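Removing the incomplete entries from the database FASTA file could be sketched as follows. This is only an illustration under assumptions: the header rule (three "|"-separated fields, the last two in square brackets) is inferred from the format quoted in this thread, and the names `has_three_fields` and `filter_fasta` are hypothetical. The BLAST database would have to be rebuilt from the filtered file afterwards.

```python
def has_three_fields(header):
    """True if the header reads like 'GeneID | [organism_name] | [taxonomy]' (assumed rule)."""
    parts = [part.strip() for part in header.split("|")]
    return (
        len(parts) == 3
        and bool(parts[0])
        and all(p.startswith("[") and p.endswith("]") for p in parts[1:])
    )


def filter_fasta(src, dst, log=None):
    """Copy only records with a complete header from src to dst.

    src, dst and log are open text handles. Returns the number of dropped
    records; dropped headers are written to log when it is given.
    """
    dropped = 0
    keep_record = False
    for line in src:
        if line.startswith(">"):
            header = line[1:].strip()
            keep_record = has_three_fields(header)
            if not keep_record:
                dropped += 1
                if log is not None:
                    log.write(header + "\n")
        if keep_record:
            dst.write(line)
    return dropped
```

The function streams line by line, so it does not need to hold a large database such as NR in memory. It would be called with the original database opened for reading and a new output file opened for writing.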
Best
Alex
Thank you, Alex! I've just pulled your new jar file with git pull and tested the same XML file, but the error is different from the one you showed me:
java -jar ParseXMLtoPS.jar -e 1e-5 -i BS_BlastAll_PS_map_query_test_1_50.xml -p BS_BlastAll_PS_map
Exception in thread "main" java.lang.UnsupportedClassVersionError: xmlParser/CreatePSmap : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
(/global/homes/l/llei2019/bscratch/software/my_work_en)
It seems like you compiled the jar with a different JDK version that does not match my current one. Could you please let me know which JDK version you used, or do you have any other suggestions? By the way, I will work on my script to fix the format issue and will test again. Thanks again. Best, Li
Hi Li, I used Java SE 14 (March 2020), but to stay backward compatible I recompiled the jar with J2SE 1.4. I hope the updated jar works for you.
Best Alex
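For context, the "major.minor version 52.0" in the error above is the class file version: major version 52 corresponds to Java 8, so a JVM older than Java 8 refuses to load such classes. The sketch below shows one way to check which class file versions a jar contains before running it. It relies only on the documented class file layout (4-byte magic number, 2-byte minor version, 2-byte major version, big-endian); the function names are hypothetical, and it only inspects .class entries stored directly in the jar, not classes inside nested jars.

```python
import struct
import zipfile


def class_major_version(class_bytes):
    """Read the major version from the first 8 bytes of a Java class file."""
    magic, _minor, major = struct.unpack(">IHH", class_bytes[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def java_release(major):
    """Map a class file major version to the Java release that introduced it."""
    # Major versions 45..48 belong to JDK 1.1..1.4; from 49 (Java 5) on it is major - 44.
    return f"1.{major - 44}" if major < 49 else str(major - 44)


def jar_class_versions(jar_path):
    """Collect the distinct major versions of all .class entries in a jar."""
    versions = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as handle:
                    versions.add(class_major_version(handle.read(8)))
    return versions
```

Calling `jar_class_versions("ParseXMLtoPS.jar")` and passing each result to `java_release` gives the minimum Java release needed; comparing that with the output of `java -version` shows whether the local JVM is new enough.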
Hi Alex, Thank you so much for helping me to make this program work. I tested a small sample file (around 100 query sequences) and it works! Could you please tell me how each PS corresponds to the lineage? I know PS1:Eukaryota and how about others? Thanks. Best, Li
The phylostrata correspond to the nodes in the taxonomy of your query species (B. sylvaticum).
PS | Taxonomic node |
---|---|
1 | Cellular Organisms |
2 | Eukaryota |
3 | Viridiplantae |
... | ... |
21 | Brachypodium sylvaticum |
You can also find this assignment in the output file with the suffix hits_table.csv. It provides you with all information about the BLAST hits from your xml file that were below the E-value threshold.
PS | PS.name | Hit.Organism | query.id | hit.id | Evalue | Identity | Score | BitScore |
---|---|---|---|---|---|---|---|---|
The first and second column of that table show the assignment of a certain PS to its corresponding taxonomic node. However, it may happen that some PS do not contain any BLAST hits and thus do not appear in this table.
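The PS-to-node assignment can be pulled out of that table with a few lines of code. The following is a minimal sketch under assumptions that the thread does not confirm: the file is comma-separated, it has a header row with the column names listed above, and the PS column holds integers. The function name `phylostrata_names` is made up for this example.

```python
import csv


def phylostrata_names(handle, delimiter=","):
    """Return {PS number: PS.name} from an open hits_table.csv handle.

    Only phylostrata that contain at least one BLAST hit appear in the
    table, so the result may have gaps.
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        mapping[int(row["PS"])] = row["PS.name"]
    # Sort by PS number so the oldest stratum comes first.
    return dict(sorted(mapping.items()))
```

Opening the file with `open(path, newline="")` and passing the handle to the function returns the mapping; if the table turns out to use another separator, the `delimiter` argument can be changed accordingly.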
--
Since we were able to solve the problem with the java.lang.ArrayIndexOutOfBoundsException, I would like to close the current issue. If you have other questions (or issues) with the phylostratigraphy pipeline, please let me know or open a new issue. I am happy to help you.
Best Alex
Hi there, thanks for these scripts. I am adapting them to my own species (Brachypodium) and got the following error:
perl createPSmap.pl --organism /global/cscratch1/sd/llei2019/B_syl_pro/query_fasta/query_test.fasta --database /global/cscratch1/sd/llei2019/ncbi_NR_databases/rehead_nr_20201202.fa --prefix BS_BlastAll_PS_map --seqOffset 50 --evalue 1e-5 --threads 60 --blastPlus
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
... 5 more
Removing BS_BlastAll_PS_map_query_test_1_50.xml after compressing to BS_BlastAll_PS_map_query_test_1_50.xml.tbz
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at xmlParser.SeqIdentifier.convertHeadertoSeqIdentifier(SeqIdentifier.java:84)
at xmlParser.ParseXMLtoPS.createPSmap(ParseXMLtoPS.java:125)
at xmlParser.ParseXMLtoPS.<init>(ParseXMLtoPS.java:38)
at xmlParser.CreatePSmap.main(CreatePSmap.java:95)
... 5 more
Removing BS_BlastAll_PS_map_query_test_51_100.xml after compressing to BS_BlastAll_PS_map_query_test_51_100.xml.tbz
It seems like something is wrong with ParseXMLtoPS.jar, but I could not figure it out. Any ideas?