USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Bulk converter crashes on ipg210921.zip #125

Open thomaskelder opened 2 years ago

thomaskelder commented 2 years ago

The bulk patent document converter crashes on the latest bulk download file ipg210921.zip with the following error:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 27
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)
    at gov.uspto.patent.OrgSynonymGenerator.chineseCompanyNames(OrgSynonymGenerator.java:367)
    at gov.uspto.patent.OrgSynonymGenerator.computeSynonyms(OrgSynonymGenerator.java:111)
    at gov.uspto.patent.serialize.JsonMapperStream.writeName(JsonMapperStream.java:379)
    at gov.uspto.patent.serialize.JsonMapperStream.writeEntity(JsonMapperStream.java:327)
    at gov.uspto.patent.serialize.JsonMapperStream.output(JsonMapperStream.java:127)
    at gov.uspto.patent.serialize.JsonMapperStream.write(JsonMapperStream.java:97)
    at gov.uspto.bulkdata.tools.transformer.TransformerRecordProcessor.writeOutputType(TransformerRecordProcessor.java:142)
    at gov.uspto.bulkdata.tools.transformer.TransformerRecordProcessor.process(TransformerRecordProcessor.java:106)
    at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:195)
    at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:122)
    at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:85)
    at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:43)
    at gov.uspto.bulkdata.cli.Transformer.exec(Transformer.java:77)
    at gov.uspto.bulkdata.cli.Transformer.main(Transformer.java:115)
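
The message "begin 0, end -1, length 27" is what `String.substring(0, end)` reports when `end` is -1, which commonly happens when the result of `indexOf` is passed on without checking whether the marker was found. The sketch below reproduces that pattern and shows one possible guard. It is only an illustration: the class `OrgNameGuard`, the method `prefixBefore`, the marker `"Co."`, and the sample names are all hypothetical, and the actual logic in `OrgSynonymGenerator.chineseCompanyNames` may differ.

```java
final class OrgNameGuard {

    // Hypothetical helper, not taken from PatentPublicData: returns the part
    // of `name` that precedes `marker`, or the whole name if the marker is absent.
    public static String prefixBefore(String name, String marker) {
        int idx = name.indexOf(marker);
        // indexOf returns -1 when the marker is not found; passing that value
        // to substring(0, -1) throws StringIndexOutOfBoundsException.
        if (idx < 0) {
            return name;
        }
        return name.substring(0, idx).trim();
    }

    public static void main(String[] args) {
        // Made-up 27-character name that does not contain the marker.
        String name = "Example Holdings (Shenzhen)";

        try {
            // Unguarded pattern: indexOf yields -1, so substring throws.
            String broken = name.substring(0, name.indexOf("Co."));
            System.out.println(broken);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Unguarded call failed: " + e.getMessage());
        }

        // Guarded pattern: falls back to the full name.
        System.out.println(prefixBefore(name, "Co."));
    }
}
```

If the cause is along these lines, a guard of this kind would let the converter skip synonym generation for the affected name instead of aborting the whole bulk file.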