bg7 / BG7

bacterial genome annotation system
bg7.ohnosequences.com

java.lang.ArrayIndexOutOfBoundsException: 3 #28

Closed mike-bioinfo closed 11 years ago

mike-bioinfo commented 11 years ago

Hi all BG7 team,

I am pretty happy with this pipeline, but in the last few days I have run into this exception and I don't know what's wrong with my sequence. It's a 454 whole genome of 2.5 Mb. This is what I got:

Parsing predicted genes xml file... done!
Extracting intergenic sequences...
Everything's done! :)
currentContigId = strtainXXX00001 cepaXXX00001,
java.lang.ArrayIndexOutOfBoundsException: 3
    at com.era7.bioinfo.annotation.gb.ExportGenBankFiles.getRnaStringForGenBank(ExportGenBankFiles.java:688)
    at com.era7.bioinfo.annotation.gb.ExportGenBankFiles.exportContigToGenBank(ExportGenBankFiles.java:441)
    at com.era7.bioinfo.annotation.gb.ExportGenBankFiles.main(ExportGenBankFiles.java:261)
    at com.era7.bioinfo.annotation.gb.ExportGenBankFiles.execute(ExportGenBankFiles.java:56)
    at com.era7.lib.bioinfo.bioinfoutil.ExecuteFromFile.main(ExecuteFromFile.java:66)
    at com.era7.bioinfo.annotation.BG7.main(BG7.java:32)
parsing XML file with predicted genes...

The .gff and .gbk files are empty. Could someone help me with this issue? I should mention that I extracted a 30 kb region from the same genome and BG7 did a fantastic job on it. Is this genome too big for BG7 to process?

eparejatobes commented 11 years ago

Hi Mike,

You shouldn't have any size problems; it's actually a pretty small genome compared to what we routinely run BG7 on. It looks like a problem in the .gbk output generation; we'll take a look at this.

(cc'ing @pablopareja)

pablopareja commented 11 years ago

Hi Mike,

Thanks for opening the issue. Your problem is not related to the size of your genome but rather to the syntax of the FASTA headers of your RNA sequences.

I'm happy that you opened this issue because it made me realize that this is not properly documented on the wiki (I just checked the documentation and couldn't find it anywhere...).

So, basically, what happens is that the ExportGenBankFiles program crashes when it tries to export the RNAs provided through the BLAST XML file and does not find the expected syntax in the RNA FASTA headers, which should look something like this:

ref|NC_011283|:75804-75898|Sec tRNA| [locus_tag=KPK_0076]

(The key here is that there should be at least four vertical bars '|' separating the different fields.)

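The exact line where your exception was thrown corresponds to the part where the RNA product is extracted from the fourth field (Sec tRNA in this example). The 3 in the exception message is that field's zero-based index, so your headers had fewer fields than the code expects.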
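To illustrate why the exception reports index 3, here is a minimal sketch. It assumes, purely for illustration, that the product is read by splitting the header on '|' and taking element 3; this is not BG7's actual code, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class RnaHeaderSplitDemo {

    // Hypothetical stand-in for the product lookup: split on '|' and take
    // the fourth field (index 3). An illustration only, not BG7's code.
    static String productOf(String header) {
        // '|' is a regex metacharacter, so it must be escaped for split().
        String[] fields = header.split("\\|");
        // Throws ArrayIndexOutOfBoundsException when there are fewer than 4 fields.
        return fields[3];
    }

    public static void main(String[] args) {
        String good = "ref|NC_011283|:75804-75898|Sec tRNA| [locus_tag=KPK_0076]";
        System.out.println(Arrays.toString(good.split("\\|")));
        System.out.println(productOf(good)); // prints "Sec tRNA"

        // A header with no '|' at all splits into a single field.
        String bad = "tRNA-Sec KPK_0076 75804-75898";
        try {
            productOf(bad);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Older JDKs print just "3"; newer ones word the message differently,
            // but the failing index is the same.
            System.out.println("Malformed header: " + e.getMessage());
        }
    }
}
```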

To sum up, if you preprocess your RNA FASTA headers so that they comply with that syntax and then rerun everything from the beginning (including the BLAST processes, so that the resulting XML files get updated too), everything should work just fine :wink:
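Before rerunning BLAST, a quick pre-flight check can list the headers that would trip the exporter. This is a minimal sketch, assuming a plain-text FASTA file whose header lines start with '>', and treating a header as valid when it has at least four '|' characters (the rule given above) and a non-blank fourth field (the product). `RnaHeaderCheck` is a hypothetical helper, not part of BG7.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RnaHeaderCheck {

    // Valid = at least four '|' characters and a non-blank fourth field.
    // The header is passed without its leading '>'.
    static boolean isValidHeader(String header) {
        long bars = header.chars().filter(c -> c == '|').count();
        String[] fields = header.split("\\|");
        return bars >= 4 && fields.length >= 4 && !fields[3].trim().isEmpty();
    }

    // Returns the 1-based line numbers of every malformed header line.
    static List<Integer> badHeaderLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith(">") && !isValidHeader(line.substring(1))) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RnaHeaderCheck <rna.fasta>");
            System.exit(2);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<Integer> bad = badHeaderLines(lines);
        for (int lineNo : bad) {
            System.out.println("line " + lineNo + ": " + lines.get(lineNo - 1));
        }
        System.out.println(bad.isEmpty() ? "All headers look OK." : bad.size() + " malformed header(s).");
        // Non-zero exit status lets a shell script stop before launching BLAST.
        System.exit(bad.isEmpty() ? 0 : 1);
    }
}
```

The check only reports problems; fixing the headers is left to whatever script produced the file, so nothing is rewritten silently.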

Please let us know if you run into any kind of trouble and we'll gladly try to help you out.

Pablo

mike-bioinfo commented 11 years ago

Thank you guys, I will take a deeper look into my tRNA file. I will let you know ASAP!

Regards!

rtobes commented 11 years ago

For RNA we use the format of the .frn files that you can find in the RefSeq download URLs. For example, this is the RNA file for one strain of Clostridium: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Clostridium_SY8519_uid68705/NC_015737.frn
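For anyone building such a file by hand, a parse/format pair makes each field of the header explicit. This is a minimal sketch, assuming headers have exactly the shape of the example in this thread (`ref|ACCESSION|:START-END|PRODUCT| [locus_tag=TAG]`, optionally preceded by '>'); real .frn files may contain variants this pattern does not handle, and `FrnHeader` is a hypothetical class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrnHeader {

    // Written against the single example header from this thread; an optional
    // leading '>' is accepted, other coordinate forms are not handled.
    private static final Pattern HEADER = Pattern.compile(
            "^>?ref\\|([^|]+)\\|:(\\d+)-(\\d+)\\|([^|]+)\\|\\s*\\[locus_tag=([^\\]]+)\\]\\s*$");

    final String accession;
    final int start;
    final int end;
    final String product;
    final String locusTag;

    FrnHeader(String accession, int start, int end, String product, String locusTag) {
        this.accession = accession;
        this.start = start;
        this.end = end;
        this.product = product;
        this.locusTag = locusTag;
    }

    // Returns the parsed fields, or Optional.empty() if the header has another shape.
    static Optional<FrnHeader> parse(String header) {
        Matcher m = HEADER.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new FrnHeader(m.group(1), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), m.group(4), m.group(5)));
    }

    // Rebuilds the header (without '>') in the same shape that parse() accepts.
    String toHeader() {
        return String.format("ref|%s|:%d-%d|%s| [locus_tag=%s]",
                accession, start, end, product, locusTag);
    }

    public static void main(String[] args) {
        FrnHeader h = parse("ref|NC_011283|:75804-75898|Sec tRNA| [locus_tag=KPK_0076]").get();
        // prints "NC_011283 75804-75898 Sec tRNA KPK_0076"
        System.out.println(h.accession + " " + h.start + "-" + h.end + " " + h.product + " " + h.locusTag);
    }
}
```

Generating headers through `toHeader()` rather than ad-hoc string concatenation keeps a home-made file in the same shape as the example, which is the shape the exporter reads the product from.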

Raquel

mike-bioinfo commented 11 years ago

That's it! The .frn file was messed up. I had generated it from a GBK file, extracting the tRNAs with my own scripts (which totally messed up the header syntax). Thanks to pablopareja for pointing this out, and to rtobes for providing this link, ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/, where I found the references to work with! I was able to run BG7 with no problems... now the manual curation begins! >.<

Thank you again!!!

pablopareja commented 11 years ago

Mike, great to hear that you got it working :wink:

Cheers,

Pablo

eparejatobes commented 11 years ago

Nice! Then I'm closing this. @pablopareja, make sure we add this to the docs (tRNA FASTA headers, and how to get them from the NCBI website).

pablopareja commented 11 years ago

@eparejatobes Done! I included it as part of the main wiki page:

https://github.com/bg7/BG7/wiki