USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Bulk data server stopped accepting byte ranges #131

Closed johnmay closed 1 year ago

johnmay commented 1 year ago

Recently the USPTO bulk data server stopped honoring byte range requests. This feature was convenient because it made it possible to download just a single patent (and its associated data) from a multi-gigabyte archive, using far less space and bandwidth.

Example:

curl -v -r 75428323-75548735 https://bulkdata.uspto.gov/data/patent/application/redbook/2022/I20220908.tar -o US20220281887A1.zip

The server responds with HTTP status 200 (instead of 206), reports that it accepts ranges, and then sends back the full 5 GB file instead of the requested 120 KB.

< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Length: 5245327360

The Google mirror (which only covers data before 2015) shows how this is meant to work:

curl -v -r 1065965668-1065996692 http://storage.googleapis.com/patents/redbook/applications/2007/I20070607.ZIP -o US2007129372_output
< HTTP/1.1 206 Partial Content
...
< Accept-Ranges: bytes
< Content-Range: bytes 1065965668-1065996692/1358212086
< Content-Length: 31025
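
For anyone scripting these downloads, the difference between the two responses above can be checked before any body is read. The following is a minimal sketch, not part of PatentPublicData: the function names `parse_content_range`, `range_was_honoured` and `fetch_range` are hypothetical, and it assumes a single-range request sent with Python's standard `urllib`. It refuses to read the body unless the server answers 206 with a matching `Content-Range`, which avoids pulling down the whole 5 GB archive by accident.

```python
import re
import urllib.request

# Matches e.g. "bytes 1065965668-1065996692/1358212086" (total may be "*").
CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Return (start, end, total) from a Content-Range header value.

    total is None when the server reports an unknown length ("*").
    """
    match = CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range: {value!r}")
    total = None if match.group(3) == "*" else int(match.group(3))
    return int(match.group(1)), int(match.group(2)), total


def range_was_honoured(status, content_range, start, end):
    """True only for a 206 response covering exactly bytes start..end."""
    if status != 206 or content_range is None:
        return False
    got_start, got_end, _ = parse_content_range(content_range)
    return (got_start, got_end) == (start, end)


def fetch_range(url, start, end):
    """Download bytes start..end (inclusive), or raise if the range is ignored."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        content_range = response.headers.get("Content-Range")
        if not range_was_honoured(response.status, content_range, start, end):
            # Bail out before reading the body: a 200 here means the full file.
            raise RuntimeError(
                f"range ignored: HTTP {response.status}, Content-Range={content_range!r}"
            )
        return response.read()
```

The check is deliberately strict: a server that legitimately truncates the range at the end of the file would also be rejected, so a real tool might want to relax the comparison.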

I'm hoping someone on the technical side at USPTO knows what has happened, for example a server upgrade or reconfiguration.

John

johnmay commented 1 year ago

This is now resolved. I'm not sure whether it was because of this issue, but thanks if so!

< Accept-Ranges: bytes
< Content-Length: 120413
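
As a quick sanity check on the numbers in this thread, the `Content-Length` of a correctly served range should equal the inclusive length of the requested byte range. The short sketch below (the helper name `range_length` is made up for illustration) confirms that for both the USPTO request and the Google mirror request quoted above.

```python
def range_length(start, end):
    """Number of bytes in an HTTP byte range; both ends are inclusive."""
    return end - start + 1


# USPTO request: curl -r 75428323-75548735
print(range_length(75428323, 75548735))      # 120413, matches Content-Length

# Google mirror request: curl -r 1065965668-1065996692
print(range_length(1065965668, 1065996692))  # 31025, matches Content-Length
```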