USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

NullPointerException while loading RedbookApplication zip #70

Closed: Howiezhu closed this issue 6 years ago

Howiezhu commented 6 years ago

Hello,

I just tried to run PatentExtractor via gov.uspto.bulkdata.cli.ExtractPatent with --source="C:\Users\test\Desktop\patents\ipa180607.zip" --skip 0 --limit 5 --outdir="download" --aps=false. It doesn't seem to like my source, since it throws a NullPointerException:

    2018-06-21 13:08:40,392 INFO [main] PatentDocFormatDetect - PatentDocFormat fromFileName: RedbookApplication
    2018-06-21 13:08:40,405 INFO [main] ZipReader - Reading zip file: C:\Users\test\Desktop\patents\ipa180607.zip
    2018-06-21 13:08:40,459 INFO [main] ZipReader - Found 1 file[FileFilter [matchRules=[SuffixFileFilter(xml)]]]: ipa180607.xml
    2018-06-21 13:08:40,463 INFO [main] PatentDocFormatDetect - PatentType fromContent: RedbookApplication
    Exception in thread "main" java.lang.NullPointerException
        at java.lang.String.startsWith(String.java:1405)
        at java.lang.String.startsWith(String.java:1434)
        at gov.uspto.patent.bulk.DumpFileXml.isStartTag(DumpFileXml.java:100)
        at gov.uspto.patent.bulk.DumpFileXml.read(DumpFileXml.java:46)
        at gov.uspto.patent.bulk.DumpFile.open(DumpFile.java:72)
        at gov.uspto.patent.bulk.DumpFileXml.open(DumpFileXml.java:31)
        at gov.uspto.bulkdata.cli.ExtractPatent.main(ExtractPatent.java:210)
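The two topmost frames are inside java.lang.String, called from DumpFileXml.isStartTag. That pattern suggests the *argument* passed to startsWith was null, not the line being tested: calling a method on a null receiver fails in the caller's own frame (isStartTag), whereas a null prefix fails inside String. The small program below reproduces both cases. StartsWithNullDemo, isStartTag, and topFrameClassOfNpe are hypothetical names for illustration only and are not part of PatentPublicData.

```java
public class StartsWithNullDemo {
    // Same shape as the failing call in the trace: line.startsWith(tag).
    static boolean isStartTag(String line, String tag) {
        return line.startsWith(tag);
    }

    // Returns the class name of the top stack frame of the NullPointerException,
    // or null if no exception was thrown.
    static String topFrameClassOfNpe(String line, String tag) {
        try {
            isStartTag(line, tag);
            return null;
        } catch (NullPointerException e) {
            StackTraceElement[] frames = e.getStackTrace();
            return frames.length > 0 ? frames[0].getClassName() : "<no stack trace>";
        }
    }

    public static void main(String[] args) {
        String line = "<us-patent-application lang=\"EN\">";
        // Null argument (an uninitialized tag): the NPE is raised inside java.lang.String,
        // like the top frames of the reported trace.
        System.out.println("null tag  -> " + topFrameClassOfNpe(line, null));
        // Null receiver (a missing line): the NPE is raised in isStartTag itself.
        System.out.println("null line -> " + topFrameClassOfNpe(null, "<us-patent-application"));
    }
}
```

On a standard JDK the null-tag case should report java.lang.String, while the null-line case reports the demo class, which is consistent with an uninitialized tag field rather than a bad input line.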

Howiezhu commented 6 years ago

The issue is in DumpFileXml.java, in open(): xmlStartTag and xmlEndTag are not initialized before read() is called.
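A minimal sketch of the fix pattern described above: assign the start and end tags in open() before anything calls read(). SplitDumpReader, rootElement, and the line-based splitting of concatenated XML documents are assumptions made for illustration; this is not the actual DumpFileXml code, whose fields and format-detection logic are not shown in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical, simplified reader for a file of concatenated XML documents.
// Not the actual DumpFileXml implementation; names are illustrative only.
public class SplitDumpReader {
    private final String rootElement;
    private String xmlStartTag;
    private String xmlEndTag;
    private BufferedReader reader;

    public SplitDumpReader(String rootElement) {
        // Fail fast with a clear message instead of an NPE later inside String.startsWith.
        this.rootElement = Objects.requireNonNull(rootElement, "rootElement");
    }

    // The fix pattern: initialize the tags in open(), before any read() runs.
    public void open(String content) {
        xmlStartTag = "<" + rootElement;
        xmlEndTag = "</" + rootElement + ">";
        reader = new BufferedReader(new StringReader(content));
    }

    private boolean isStartTag(String line) {
        return line.startsWith(xmlStartTag);
    }

    // Returns the next document (root start tag through root end tag),
    // or null when the input is exhausted.
    public String read() {
        if (reader == null) {
            throw new IllegalStateException("open() must be called before read()");
        }
        try {
            StringBuilder doc = null;
            String line;
            while ((line = reader.readLine()) != null) {
                if (doc == null) {
                    if (isStartTag(line)) {
                        doc = new StringBuilder(line).append('\n');
                    }
                } else {
                    doc.append(line).append('\n');
                    if (line.startsWith(xmlEndTag)) {
                        return doc.toString();
                    }
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String dump = "<?xml version=\"1.0\"?>\n<doc id=\"1\">\n</doc>\n"
                + "<?xml version=\"1.0\"?>\n<doc id=\"2\">\n</doc>\n";
        SplitDumpReader r = new SplitDumpReader("doc");
        r.open(dump);
        int count = 0;
        while (r.read() != null) {
            count++;
        }
        System.out.println("documents read: " + count); // prints "documents read: 2"
    }
}
```

Because the tags are set in open() and the root element is null-checked in the constructor, the read loop cannot observe a null tag; a misconfiguration surfaces as a descriptive exception at construction time rather than as an NPE deep inside String.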