Open sotnikov-s opened 4 years ago
I am not able to reproduce the error. Instead, the software logs a warning for the record. The code was last edited in July 2019; have you updated and rebuilt since then?
2019-12-23 08:18:40,074 INFO [ main] pftaps19760203_wk05.zip:200:US3935790A RecordReader - ... mark pftaps19760203_wk05.zip:200
2019-12-23 08:18:40,445 WARN [ main] pftaps19760203_wk05.zip:238:US3935828A ClassificationNode - Failed to Parse locarno IPC Classification: '114' from :
I pulled the repo a week ago and use the master branch as the source. I just retried after pulling from master and building the .jar anew. Could you please confirm that my build steps are right?
mvn package
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.1.1:single (default) on project BulkDownloader: Error reading assemblies: Error locating assembly descriptor: resources/assembly/bin.xml
mvn package
(run again, with a successful result)
cd BulkDownloader/target/dependency-jars
mkdir dependencies
unzip \*.jar -d dependencies/
cd dependencies
jar cf dependency.jar *
cd BulkDownloader/target/classes
Manifest-Version: 1.0
Created-By: 1.7.0_06 (Oracle Corporation)
Main-Class: gov.uspto.bulkdata.cli.Transformer
Class-Path: dependency.jar
jar cfm transformer.jar manifest.txt gov/*
java -jar transformer.jar -f=pftaps19760203_wk05.zip --type=json --outDir=. --outBulk=false --prettyPrint=true
After pulling the repo, run "mvn clean package" from the top-level directory (not from within BulkDownloader). If tests are failing, you can skip them with "mvn clean package -DskipTests=true". If the Maven build is successful, a zip file should be created under target: "./target/PatentPublicData-0.0.1-SNAPSHOT-*.zip".
The build was successful and the snapshot file has been created. But the same .jar creation procedure leads to the same null pointer exception. The same exception, at the same point in the file, also occurs if I skip all custom steps and run the tool as it is, with no hand-made .jar files:
cd PatentPublicData/BulkDownloader
java -cp "target/BulkDownloader-0.0.1-SNAPSHOT.jar:target/dependency-jars/*" gov.uspto.bulkdata.cli.Transformer -f="pftaps19760203_wk05.zip" --type="json" --outDir="." --matching-xml=true --outBulk=false
log4j:WARN No appenders could be found for logger (gov.uspto.patent.PatentDocFormatDetect).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:878)
at com.google.common.base.Strings.padStart(Strings.java:90)
at gov.uspto.patent.model.classification.LocarnoClassification.getTree(LocarnoClassification.java:72)
at gov.uspto.patent.serialize.JsonMapperStream.writeSingleClassificationType(JsonMapperStream.java:700)
at gov.uspto.patent.serialize.JsonMapperStream.writeClassifications(JsonMapperStream.java:561)
at gov.uspto.patent.serialize.JsonMapperStream.output(JsonMapperStream.java:145)
at gov.uspto.patent.serialize.JsonMapperStream.write(JsonMapperStream.java:97)
at gov.uspto.bulkdata.tools.transformer.TransformerRecordProcessor.writeOutputType(TransformerRecordProcessor.java:142)
at gov.uspto.bulkdata.tools.transformer.TransformerRecordProcessor.process(TransformerRecordProcessor.java:90)
at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:195)
at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:122)
at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:85)
at gov.uspto.bulkdata.RecordReader.read(RecordReader.java:43)
at gov.uspto.bulkdata.cli.Transformer.exec(Transformer.java:77)
at gov.uspto.bulkdata.cli.Transformer.main(Transformer.java:115)
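For readers following the trace: the NullPointerException is raised by Guava's Preconditions.checkNotNull inside Strings.padStart, which rejects a null input string, and the caller is LocarnoClassification.getTree. A plausible reading (an assumption, not confirmed against the project source) is that one part of the Locarno classification was never set because the value '114' failed to parse, as in the WARN line logged for the same record earlier in this thread. The sketch below reproduces that failure mode without Guava. The class LocarnoPadSketch, its local padStart stand-in, and the padOrEmpty guard are all hypothetical names for illustration, not the project's code.

```java
import java.util.Objects;

public class LocarnoPadSketch {

    // Stand-in for Guava's Strings.padStart: like the real method,
    // it rejects a null input before doing any padding.
    static String padStart(String s, int minLength, char padChar) {
        Objects.requireNonNull(s, "string to pad must not be null");
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < minLength; i++) {
            sb.append(padChar);
        }
        return sb.append(s).toString();
    }

    // Hypothetical guard: fall back to an empty string when a
    // classification part was never parsed, instead of throwing.
    static String padOrEmpty(String part, int minLength) {
        return part == null ? "" : padStart(part, minLength, '0');
    }

    public static void main(String[] args) {
        System.out.println(padOrEmpty("7", 2));            // prints 07
        System.out.println(padOrEmpty(null, 2).isEmpty()); // prints true
        try {
            padStart(null, 2, '0'); // mirrors the unguarded call in the trace
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

Whether an empty string is the right fallback for the serialized JSON is a design question for the maintainers; the point here is only that a null check before padding avoids aborting the whole run on one malformed record.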
The exception is thrown at a specific point in the file. In this case (file pftaps19760203_wk05.zip), it is the 238th patent record (see the --skip and --limit flags):
Meanwhile, documents 236 and 238 are parsed successfully. The same error also occurs when parsing the 373rd patent in the same file:
Surely there are more such cases.
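One way to find every affected record in a single pass, rather than probing positions one at a time, is to catch the exception per record and collect the failing positions. The sketch below is a generic illustration under that assumption. FailingRecordScan and failingIndices are hypothetical names, the toy processor stands in for the real serialization step, and none of this is the tool's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FailingRecordScan {

    // Runs the processor on every record and returns the 1-based
    // positions of the records that threw a runtime exception.
    static <T> List<Integer> failingIndices(List<T> records, Consumer<T> processor) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            try {
                processor.accept(records.get(i));
            } catch (RuntimeException e) {
                failed.add(i + 1); // keep going instead of aborting the run
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Toy data: null entries stand in for records that fail to serialize.
        List<String> records = Arrays.asList("a", null, "c", null);
        List<Integer> failed = failingIndices(records, s -> s.length());
        System.out.println(failed); // prints [2, 4]
    }
}
```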