USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Downloader date range #48

Closed patricknee closed 7 years ago

patricknee commented 7 years ago

BulkDownloader does not do inclusive comparisons on date ranges. To request the files for 1985, the intuitive parameter would be:

--date 19850101-19851231

However, BulkDownloader will miss the January 1 and December 31 files. One or the other problem (1st or 31st) occurs at least in the years 1980, 1985, 1991, 1996, and 2002.

A cumbersome workaround is to use the parameter:

--date 19841231-19860101

Clearly not an elegant solution.

The DateRange code is in gov.uspto.common.DateRange, but I am not sure what other code assumes non-inclusive comparisons.
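To illustrate the difference being reported, here is a minimal sketch of strict versus inclusive boundary checks. The class `DateRangeSketch` and its method names are hypothetical and are not the actual `gov.uspto.common.DateRange` code; the sketch only assumes the range is held as two `LocalDate` values.

```java
import java.time.LocalDate;

public class DateRangeSketch {
    private final LocalDate start;
    private final LocalDate end;

    public DateRangeSketch(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    // Strict comparison: the two boundary dates fall outside the range.
    public boolean containsExclusive(LocalDate d) {
        return d.isAfter(start) && d.isBefore(end);
    }

    // Inclusive comparison: the two boundary dates count as inside the range.
    public boolean containsInclusive(LocalDate d) {
        return !d.isBefore(start) && !d.isAfter(end);
    }

    public static void main(String[] args) {
        DateRangeSketch year1985 = new DateRangeSketch(
                LocalDate.of(1985, 1, 1), LocalDate.of(1985, 12, 31));
        LocalDate jan1 = LocalDate.of(1985, 1, 1);
        System.out.println(year1985.containsExclusive(jan1)); // false
        System.out.println(year1985.containsInclusive(jan1)); // true
    }
}
```

With the strict check, a file dated exactly on the first or last day of the requested range is skipped, which matches the behavior described above.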

bgfeldm commented 7 years ago

I see the issue. Looking at the class, I am using an overloaded constructor: one version accepts Date and another accepts the new Java 8 LocalDate. The first one pads an extra day at the beginning and end; the second one does not. The second one is called when parsing the date from the BulkData class, which is what you're seeing. I will fix this inconsistency.
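One way to keep overloaded constructors from drifting apart is to have the `Date` overload convert its arguments and delegate to the `LocalDate` overload, so there is a single place where boundaries are defined. The sketch below is an assumption about how that could look, not the fix that was actually committed; `NormalizedDateRange` and `toLocalDate` are hypothetical names, and the system default time zone is assumed for the conversion.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

public class NormalizedDateRange {
    private final LocalDate start; // inclusive
    private final LocalDate end;   // inclusive

    // Canonical constructor: the only place where the boundaries are set.
    public NormalizedDateRange(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date is before start date");
        }
        this.start = start;
        this.end = end;
    }

    // Legacy overload: converts and delegates, adding no padding of its own.
    public NormalizedDateRange(Date start, Date end) {
        this(toLocalDate(start), toLocalDate(end));
    }

    // Assumes java.util.Date values interpreted in the system default zone.
    private static LocalDate toLocalDate(Date date) {
        return date.toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
    }

    // Inclusive on both ends, regardless of which constructor was used.
    public boolean contains(LocalDate d) {
        return !d.isBefore(start) && !d.isAfter(end);
    }
}
```

Because both constructors end up in the same code path, a range built from `Date` values and one built from `LocalDate` values accept exactly the same dates.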

patricknee commented 7 years ago

Confirmed fixed.

patricknee commented 7 years ago

The problem was catching dates right at the end of a period - 12/31 for example.

There are two solutions: do an inclusive comparison (e.g., dates <= endDate), or add one day to the end date and do an exclusive comparison (e.g., dates < (endDate + 1)). This fix uses the second solution.

The problem is that when using the Downloader to get 2017 (i.e., passing in 20170101-20171231), this looks for the dates 20170101-20180101, requesting the 2018 HTML page at https://bulkdata.uspto.gov/data2/patent/grant/redbook/fulltext/2018/. This request gives a 404 error.

A better solution would be an inclusive comparison.
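The side effect of padding the end date can be shown with a small sketch that lists which yearly pages a range would touch. It assumes the downloader requests one page per calendar year between the start and end dates; that is an inference from the 404 described above, and `YearsToRequest` and its methods are hypothetical names, not part of the project.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class YearsToRequest {

    // Years touched by the range [start, end], both ends inclusive.
    static List<Integer> yearsInclusive(LocalDate start, LocalDate end) {
        List<Integer> years = new ArrayList<>();
        for (int y = start.getYear(); y <= end.getYear(); y++) {
            years.add(y);
        }
        return years;
    }

    // Years touched when the end date is first padded by one day,
    // as in the "add one day, compare exclusively" approach.
    static List<Integer> yearsWithPaddedEnd(LocalDate start, LocalDate end) {
        return yearsInclusive(start, end.plusDays(1));
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2017, 1, 1);
        LocalDate end = LocalDate.of(2017, 12, 31);
        System.out.println(yearsInclusive(start, end));     // [2017]
        System.out.println(yearsWithPaddedEnd(start, end)); // [2017, 2018]
    }
}
```

With an inclusive comparison the stored end date stays at December 31, so only the 2017 page is requested.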

patricknee commented 7 years ago

I am currently confirming I am up-to-date with this repo. Sorry, no need to investigate yet.