MICommunity / psimi

Automatically exported from code.google.com/p/psimi
Creative Commons Attribution 4.0 International

mitab: split encapsulated parentheses for species #10

Closed arnaudceol closed 8 years ago

arnaudceol commented 8 years ago

The mitab parser fails if the name of the organism contains parentheses. This is the case, for instance, with strains of bacteria.

e.g. (from the DIP mitab export): taxid:246196(Mycobacterium smegmatis (strain ATCC 700084 / mc155)). I don't think this is illegal according to the mitab standard.

The error message is:

Exception in thread "main" java.lang.RuntimeException: Error while reading the file.
    at psidev.psi.mi.tab.PsimiTabIterator.hasNext(PsimiTabIterator.java:120)
    at it.iit.genomics.cru.mi.psicquic.compile.CreateMITab.main(CreateMITab.java:238)
Caused by: psidev.psi.mi.tab.PsimiTabException: Exception parsing line :[dip:DIP-61399N|uniprotkb:A0QNK4, dip:DIP-61399N|uniprotkb:A0QNK4, -, -, -, -, MI:0114(x-ray crystallography)|MI:0114(x-ray crystallography), -, pubmed:25684576|pubmed:DIP-17441S|pubmed:25684576|pubmed:DIP-17441S, taxid:246196(Mycobacterium smegmatis (strain ATCC 700084 / mc155)), taxid246196(Mycobacterium smegmatis (strain ATCC 700084 / mc155)), MI:0407(direct interaction)|MI:0407(direct interaction), MI:0465(dip), dip:DIP-198736E, dip-quality-status:core]
    at psidev.psi.mi.tab.PsimiTabReader.handleError(PsimiTabReader.java:265)
    at psidev.psi.mi.tab.PsimiTabReader.readLine(PsimiTabReader.java:233)
    at psidev.psi.mi.tab.PsimiTabIterator.hasNext(PsimiTabIterator.java:109)
    ... 1 more
Caused by: psidev.psi.mi.tab.model.builder.IllegalFormatException: String cannot be parsed to create a organism (check the syntax): [taxid:246196(Mycobacterium smegmatis (strain ATCC 700084 / mc155))]
    at psidev.psi.mi.tab.model.builder.MitabParserUtils.splitOrganism(MitabParserUtils.java:327)
    at psidev.psi.mi.tab.model.builder.MitabParserUtils.buildBinaryInteraction(MitabParserUtils.java:196)
    at psidev.psi.mi.tab.PsimiTabReader.readLine(PsimiTabReader.java:230)
    ... 2 more

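The error is raised while the organism field is being split (MitabParserUtils.splitOrganism in the trace above). A solution would be to make the splitting aware of nesting: the closing parenthesis of the name should only be accepted once every parenthesis (or other bracket) opened inside the name has been closed.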
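To illustrate the idea, here is a minimal sketch of a nesting-aware split. It is not the patch that was eventually merged and not the library's code: the class name OrganismFieldSketch and the method splitOrganism below are hypothetical, and the sketch assumes a single value of the form db:id(name), with only round parentheses counted and no '|'-separated alternatives.

```java
/** Hypothetical helper for illustration only; not part of the psimi library. */
final class OrganismFieldSketch {

    private OrganismFieldSketch() {
    }

    /**
     * Splits a single organism value such as "taxid:9606(human)" into
     * { identifier, name }. The name may contain balanced parentheses.
     * Assumption: exactly one value, no '|' separated alternatives.
     */
    static String[] splitOrganism(String field) {
        int colon = field.indexOf(':');
        int open = field.indexOf('(', colon + 1);
        if (colon < 0 || open < 0) {
            throw new IllegalArgumentException("Expected db:id(name), got: " + field);
        }

        // Count nesting depth so that only the parenthesis matching the
        // first '(' is treated as the end of the organism name.
        int depth = 0;
        for (int i = open; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    if (i != field.length() - 1) {
                        throw new IllegalArgumentException("Unexpected text after name in: " + field);
                    }
                    String id = field.substring(colon + 1, open);
                    String name = field.substring(open + 1, i);
                    return new String[] { id, name };
                }
            }
        }
        // The loop ended while a parenthesis was still open.
        throw new IllegalArgumentException("Unbalanced parentheses in: " + field);
    }
}
```

With this approach the DIP value above would split into the identifier 246196 and the name "Mycobacterium smegmatis (strain ATCC 700084 / mc155)", while a value with an unclosed parenthesis is rejected rather than silently truncated.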

colin-combe commented 8 years ago

Closing: fixed by pull request #11, which merged a patch from @arnaudceol. Thanks, Arnaud.