USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Support TransformerCli 'skip' option #87

Open · zanerock opened this issue 5 years ago

zanerock commented 5 years ago

The --skip option in TransformerCli is documented at the top, but a note further down says it isn't actually implemented. The option is useful, even necessary, for processing large data sets.

zanerock commented 5 years ago

I haven't dug into this, but I suspect the reason is that Transformer uses ZipInputStream, which is inherently sequential. Switching to ZipFile, or offering it as an option, may be a relatively easy and efficient solution. Either way, I may be able to take up the issue.
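To illustrate the difference, here is a minimal sketch that skips the first N entries of a zip archive both ways. It assumes that "skip" counts zip entries, which may not match how the bulk files are laid out (a single entry can hold many concatenated documents). The class and method names are hypothetical and are not part of PatentPublicData.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of PatentPublicData: compares sequential vs random-access skipping.
class ZipSkipSketch {

    // Sequential: ZipInputStream has to read past every skipped entry's bytes
    // before it can reach the next local header.
    static List<String> namesAfterSkipStream(String path, int skip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new FileInputStream(path))) {
            ZipEntry entry;
            int index = 0;
            while ((entry = zin.getNextEntry()) != null) {
                if (index++ >= skip) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    // Random access: ZipFile lists entries from the central directory,
    // so skipped entries are never decompressed.
    static List<String> namesAfterSkipFile(String path, int skip) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            List<? extends ZipEntry> entries = Collections.list(zip.entries());
            List<String> names = new ArrayList<>();
            for (ZipEntry e : entries.subList(Math.min(skip, entries.size()), entries.size())) {
                // zip.getInputStream(e) would open only this entry's data
                names.add(e.getName());
            }
            return names;
        }
    }
}
```

Both methods return the same names, but only the ZipFile version can jump straight to entry N without reading the data of the entries before it.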

bgfeldm commented 5 years ago

TransformerCli was deprecated in favor of the newer gov.uspto.bulkdata.cli.Transformer, which supports skip.

zanerock commented 5 years ago

Gotcha, good to hear. Could the deprecated class be deleted?