USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Cannot parse 1980 Grant to JSON - null pointer #7

Closed: patricknee closed this issue 7 years ago

patricknee commented 7 years ago

A freshly downloaded 1980 file (fetched with BulkDownloader) crashes TransformerCli with a NullPointerException:

$ java -cp PatentDocument/target/*:PatentDocument/target/dependency-jars/* gov.uspto.patent.TransformerCli --input="BulkDownloader/download/grants/1980/pftaps19800101_wk01.zip" --outdir="BulkDownloader/download/grants/1980/expanded/" --limit=10
log4j:WARN No appenders could be found for logger (gov.uspto.patent.TransformerCli).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
    at gov.uspto.patent.bulk.DumpFile.close(DumpFile.java:82)
    at gov.uspto.patent.TransformerCli.processDumpFile(TransformerCli.java:192)
    at gov.uspto.patent.TransformerCli.process(TransformerCli.java:115)
    at gov.uspto.patent.TransformerCli.main(TransformerCli.java:270)

bgfeldm commented 7 years ago

I checked in a fix for TransformerCli not reading Greenbook patent dumps.

There were two problems: I was not passing in a file filter, and some Greenbook patents don't have an abstract.

In the bulk file you're processing, the first 10 patent documents do not have an abstract, but the 100th document does.
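
For readers hitting a similar trace, here is a minimal sketch of the two defensive patterns the comment above points to: a close() that tolerates a reader that was never opened, and a helper that treats a missing abstract as empty text. This is not the committed fix. The class DumpFileSketch and the method abstractOrEmpty are hypothetical names made up for illustration, and the sketch assumes the original failure came from closing a resource that was still null.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a bulk-file wrapper; NOT the project's DumpFile class.
class DumpFileSketch implements Closeable {
    // Stays null until open() succeeds, e.g. when no archive entry matched a filter.
    private Reader reader;

    // Simplified open: wraps in-memory text instead of a real zip entry.
    void open(String contents) {
        reader = new StringReader(contents);
    }

    boolean isOpen() {
        return reader != null;
    }

    @Override
    public void close() {
        // Guard against close() being called when nothing was ever opened.
        if (reader == null) {
            return;
        }
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader = null;
        }
    }

    // Hypothetical helper: older documents may lack an abstract, so map null to "".
    static String abstractOrEmpty(String abstractText) {
        return abstractText == null ? "" : abstractText.trim();
    }
}
```

With this guard, calling close() on an instance that never opened a file is a no-op instead of a NullPointerException, and downstream code can call abstractOrEmpty(...) without checking for null first.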

patricknee commented 7 years ago

I spot-tested TransformerCli on downloaded files from 1980, 1990, 2000, and 2010; all were processed successfully. Thanks for the fix.