USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

transformer tool java.lang.StringIndexOutOfBoundsException #99

Closed sotnikov-s closed 4 years ago

sotnikov-s commented 4 years ago

I've run into the following issue: when transforming the pftaps19810707_wk27.zip file with the parameters --type=json --outDir=. --outBulk=false --prettyPrint=true, I get this error:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 14
        at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3720)
        at java.base/java.lang.String.substring(String.java:1909)
        at gov.uspto.patent.OrgSynonymGenerator.suffixWord(OrgSynonymGenerator.java:275)
        at gov.uspto.patent.OrgSynonymGenerator.suffix(OrgSynonymGenerator.java:149)
        at gov.uspto.patent.OrgSynonymGenerator.computeSynonyms(OrgSynonymGenerator.java:83)
        at gov.uspto.patent.serialize.JsonMapperStream.writeName(JsonMapperStream.java:379)
        at gov.uspto.patent.serialize.JsonMapperStream.writeEntity(JsonMapperStream.java:327)
        at gov.uspto.patent.serialize.JsonMapperStream.output(JsonMapperStream.java:127)
        at gov.uspto.patent.serialize.JsonMapperStream.write(JsonMapperStream.java:97)
        at gov.uspto.bulkdata.tools.transformer.TransformerRecordProcessor.writeOutputType(TransformerRecordProcessor.java:142)
        at gov.uspto.bulkdata.tools.transformer.TransformerRecordProcessor.process(TransformerRecordProcessor.java:90)
        at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:195)
        at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:122)
        at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:85)
        at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:43)
        at gov.uspto.bulkdata.cli.Transformer.exec(Transformer.java:77)
        at gov.uspto.bulkdata.cli.Transformer.main(Transformer.java:115)

Hint: the problem seems to be at or around the 372nd patent document in the file, because the same command succeeds with --limit 371 but fails with --limit 372.
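
A note on reading the message: "begin 0, end -1, length 14" means substring(0, -1) was called on a 14-character string. A common way to end up there is using the result of indexOf or lastIndexOf as the end index without checking for -1. I have not confirmed that this is what OrgSynonymGenerator.suffixWord does, so the sketch below is only an illustration of that failure mode and a possible guard. The class name, method names, and sample organization name are all hypothetical.

```java
// Hypothetical sketch, not the actual OrgSynonymGenerator code.
final class SubstringGuardSketch {

    // Unguarded: if 'suffix' does not occur in 'name', lastIndexOf returns -1
    // and substring(0, -1) throws StringIndexOutOfBoundsException.
    static String stripSuffixUnsafe(String name, String suffix) {
        int end = name.lastIndexOf(suffix);
        return name.substring(0, end);
    }

    // Guarded: return the name unchanged when the suffix is not found.
    static String stripSuffixSafe(String name, String suffix) {
        int end = name.lastIndexOf(suffix);
        if (end < 0) {
            return name;
        }
        return name.substring(0, end).trim();
    }

    public static void main(String[] args) {
        String name = "ACME WIDGET CO"; // made-up 14-character name

        System.out.println(stripSuffixSafe(name, " CO"));
        System.out.println(stripSuffixSafe(name, " INC"));

        try {
            stripSuffixUnsafe(name, " INC");
        } catch (StringIndexOutOfBoundsException e) {
            // On recent JDKs the message has the same shape as in the trace above.
            System.out.println("unguarded call failed: " + e.getMessage());
        }
    }
}
```

If the real cause is along these lines, the record around position 372 probably contains an organization name that the suffix handling does not expect.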