bg7 / BG7

bacterial genome annotation system
bg7.ohnosequences.com

Still fail to get annotation #30

mscook closed this issue 11 years ago

mscook commented 11 years ago

After publication I thought I would give this another go.

I apologise, but my output and error streams go to separate files:

Error:

java.lang.ArrayIndexOutOfBoundsException: 1
    at com.era7.lib.bioinfo.bioinfoutil.uniprot.UniprotProteinRetreiver.getUniprotDataFor(UniprotProteinRetreiver.java:78)
    at com.era7.bioinfo.annotation.FillDataFromUniprot.main(FillDataFromUniprot.java:90)
    at com.era7.bioinfo.annotation.FillDataFromUniprot.execute(FillDataFromUniprot.java:48)
    at com.era7.lib.bioinfo.bioinfoutil.ExecuteFromFile.main(ExecuteFromFile.java:66)
    at com.era7.bioinfo.annotation.BG7.main(BG7.java:32)

Output:

... ...
i = 15 currentValue = 527 gene = Q8X7U5
completed!
Realizando la peticion post... [Spanish: "Performing the POST request..."]
i = 0 currentValue = Deleted.

This was generated using the most recent test data (EHEC_ReferenceProteins_17_08_2012.fasta).

pablopareja commented 11 years ago

Hi Mitchell,

Thanks for opening the issue. Your problem seems to be related to a deleted Uniprot entry (note the final "currentValue = Deleted." line in your output). The program FillDataFromUniprot crashes when it tries to parse that erroneous result coming back from the Uniprot Web Services.
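To illustrate how a response like "Deleted." can end in "ArrayIndexOutOfBoundsException: 1", here is a hypothetical sketch. It is NOT the actual UniprotProteinRetreiver code; it simply assumes the retriever splits each response line on tabs and reads the second field. The well-formed response string is made up.

```java
import java.util.Optional;

// HYPOTHETICAL illustration, not the actual UniprotProteinRetreiver code.
// Assumption: the retriever splits a response line on tabs and reads field [1].
public class DeletedEntryDemo {

    // Naive parsing: assumes every response has at least two tab-separated fields.
    static String naiveSecondField(String responseLine) {
        String[] fields = responseLine.split("\t");
        return fields[1]; // throws ArrayIndexOutOfBoundsException if only one field exists
    }

    // Guarded parsing: reports "no data" instead of crashing on an unexpected response.
    static Optional<String> safeSecondField(String responseLine) {
        String[] fields = responseLine.split("\t");
        return fields.length > 1 ? Optional.of(fields[1]) : Optional.empty();
    }

    public static void main(String[] args) {
        String normal = "Q8X7U5\tPROTEIN_DATA"; // made-up well-formed response
        String deleted = "Deleted.";           // single-field response for a deleted entry

        System.out.println(naiveSecondField(normal)); // PROTEIN_DATA
        System.out.println(safeSecondField(deleted)); // Optional.empty
        try {
            naiveSecondField(deleted);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println("Crash reproduced: " + e.getMessage());
        }
    }
}
```

The guarded version checks the field count before indexing, which is one way a parser could treat an unexpected reply as "no data" rather than aborting the whole run.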

You can identify the accession of the problematic entry by searching for the predicted gene right after Q8X7U5 in your predicted genes XML file (the one you're providing as input to FillDataFromUniprot).
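A rough sketch of that lookup using the JDK's built-in DOM parser. The tag name is an assumption: "protein_id" is a hypothetical placeholder for whatever element holds the UniProt accession in the real BG7 predicted genes XML, so pass the actual tag name as the second argument. The class and method names are also hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// HYPOTHETICAL helper: assumes each predicted gene stores its UniProt accession
// in an element whose tag name you pass in; check the real file for that name.
public class NextGeneFinder {

    // Parses an XML string, wrapping checked parser exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the accession that follows `known` in document order, or null if none.
    static String accessionAfter(Document doc, String tagName, String known) {
        NodeList ids = doc.getElementsByTagName(tagName);
        for (int i = 0; i < ids.getLength() - 1; i++) {
            if (ids.item(i).getTextContent().trim().equals(known)) {
                return ids.item(i + 1).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc;
        if (args.length > 0) {
            // Real file, e.g.: java NextGeneFinder predicted_genes.xml <tag_name>
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(args[0]));
        } else {
            // Made-up miniature input; NEXT_ACCESSION is a placeholder.
            doc = parse("<genes><gene><protein_id>Q8X7U5</protein_id></gene>"
                    + "<gene><protein_id>NEXT_ACCESSION</protein_id></gene></genes>");
        }
        String tag = args.length > 1 ? args[1] : "protein_id";
        System.out.println(accessionAfter(doc, tag, "Q8X7U5")); // NEXT_ACCESSION for the demo input
    }
}
```

getElementsByTagName returns matches in document order, so "the next element with that tag" corresponds to the next predicted gene, provided each gene carries exactly one such element.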

Let me know if this helps.

Cheers,

Pablo

mscook commented 11 years ago

Thanks. I'm re-running now. I'm still a little confused.

Do you mean: find the predicted gene (in the predicted_genes XML), check whether it's in EHEC_ReferenceProteins_17_08_2012.fasta and remove it from there? Or do I remove it from the predicted_genes XML and somehow re-initialise the run?

Is there not a way to handle this programmatically?

Cheers

Mitch

pablopareja commented 11 years ago

Hi Mitchell,

I just committed a new version of the program 'FillDataFromUniprot' that includes a fix for this sort of situation.

The program should no longer crash when it gets malformed or unexpected information from the Uniprot WS; instead, it moves on to the next gene/protein and simply leaves the troublesome one empty.

It is then up to the user to decide what to do with the problematic Uniprot entry.
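The skip-and-continue behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the code committed to BG7: fillAll, fetchAndParse and the fake web-service stand-in are assumed names, and the fake simulates a deleted entry by returning the single-field response "Deleted.".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// HYPOTHETICAL sketch of "skip the bad entry, leave it empty, keep going".
public class SkipAndContinue {

    // Fills data for every accession; failures are left empty and recorded for the user.
    static Map<String, String> fillAll(List<String> accessions,
                                       Function<String, String> fetchAndParse,
                                       List<String> problematic) {
        Map<String, String> filled = new LinkedHashMap<>(); // keeps input order
        for (String acc : accessions) {
            try {
                filled.put(acc, fetchAndParse.apply(acc));
            } catch (RuntimeException e) {
                filled.put(acc, "");  // leave the troublesome entry empty
                problematic.add(acc); // let the user decide what to do with it later
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        // Stand-in for the web-service call; accessions starting with "DELETED" fail to parse.
        Function<String, String> fake = acc -> {
            String response = acc.startsWith("DELETED") ? "Deleted." : acc + "\tdata";
            return response.split("\t")[1];
        };
        List<String> problematic = new ArrayList<>();
        Map<String, String> result =
                fillAll(Arrays.asList("Q8X7U5", "DELETED1", "OTHER1"), fake, problematic);
        System.out.println(result);      // {Q8X7U5=data, DELETED1=, OTHER1=data}
        System.out.println(problematic); // [DELETED1]
    }
}
```

Catching every RuntimeException keeps the run going but can also hide genuine bugs; a stricter variant would detect the "Deleted." reply explicitly and only skip that case.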

Cheers,

Pablo