USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Transform to CSV #83

Open bgfeldm opened 5 years ago

Transform the parsed patent data into CSV, in two variations:

  1. Exploded CSV to load into Relational Database Tables

    • Handling multiple values
    • Multiple files are created, each named by its entity type
    • Each individual entity becomes one record row to load
  2. Flat CSV to load into Solr Index

    • Handling multiple values
      • Individual entities are grouped together as multiple values delimited by a pipe "|"
    • Field names, by default, will use Solr's default dynamic field suffixes
      • This lets a Solr core/collection be created and indexed with less setup time, removing the initial need to define a Solr schema
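
The two variations above can be sketched as follows. This is a minimal illustration in Python, not the project's implementation: the record layout, the function names `to_exploded_csv` and `to_flat_csv`, the table names, and the mapping of fields to the `_s` / `_ss` suffixes are all assumptions made for the example.

```python
import csv
import io

# Hypothetical parsed patent record; the field names are illustrative only.
PATENT = {
    "doc_id": "US1234567B2",
    "title": "Example widget",
    "inventors": ["Alice Smith", "Bob Jones"],
    "classifications": ["G06F 16/00", "H04L 9/32"],
}


def _write(rows):
    """Serialize an iterable of row tuples to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def to_exploded_csv(patents):
    """Variation 1: one CSV per entity type, one row per individual entity.

    Returns {table_name: csv_text}; each child row carries doc_id so it can
    be joined back to its patent after loading into relational tables.
    """
    tables = {
        "patent": [("doc_id", "title")],
        "inventor": [("doc_id", "name")],
        "classification": [("doc_id", "symbol")],
    }
    for p in patents:
        tables["patent"].append((p["doc_id"], p["title"]))
        for name in p["inventors"]:
            tables["inventor"].append((p["doc_id"], name))
        for symbol in p["classifications"]:
            tables["classification"].append((p["doc_id"], symbol))
    return {name: _write(rows) for name, rows in tables.items()}


def to_flat_csv(patents, sep="|"):
    """Variation 2: one row per patent, multiple values joined with a pipe.

    Column names use dynamic-field style suffixes (assumed here:
    _s for a single string, _ss for a multi-valued string).
    """
    rows = [("id", "title_s", "inventors_ss", "classifications_ss")]
    for p in patents:
        rows.append((
            p["doc_id"],
            p["title"],
            sep.join(p["inventors"]),
            sep.join(p["classifications"]),
        ))
    return _write(rows)
```

Calling `to_exploded_csv([PATENT])` yields three CSV texts keyed `patent`, `inventor`, and `classification`, while `to_flat_csv([PATENT])` yields a single CSV whose multi-valued columns still have to be split on the pipe at index time. Solr's CSV update handler documents per-field split and separator parameters for that purpose; the exact options should be checked against the reference guide for the Solr version in use.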

Both formats will be useful for big data processing, and nearly all databases natively support loading CSV.