Both the preprocessor and the disambiguation code will be forked into Fung Institute GitHub repos in late 2012 or 2013. For now, the repos above are the best ones to fork if you want to contribute.
Please use the following for citing use of this data:
Ronald Lai; Alexander D'Amour; Amy Yu; Ye Sun; David M. Doolin; Lee Fleming, 2013,
"Disambiguation and Co-authorship Networks of the U.S. Patent Inventor
Database (1975 - 2013)"
An extensively updated paper is available via Harvard Patent Dataverse: Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database
Inventor Disambiguation through 12-31-2013:
SQLite databases uploaded after November 17, 2012 reflect schema updates relative to the data originally distributed via the Harvard Dataverse Network. These updates move the schemas toward SQL standards compliance, as implemented by PostgreSQL. Schema migration is likely to continue through 2013 as the data are prepared and delivered for access via a web-accessible API. When in doubt about which schema a file uses, check that file's timestamp.
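Because the schema can differ between uploads, it may help to list the tables and columns of a downloaded file before writing queries against it. The following is a minimal sketch using Python's standard `sqlite3` module. The `inventor` table and its columns are hypothetical and exist only to make the example runnable; substitute a connection to the file you actually downloaded.

```python
import sqlite3


def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for each table."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in names:
        # Quote the identifier so unusual table names do not break the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Demonstration on an in-memory database with a hypothetical table.
# For a real file, use sqlite3.connect("path/to/downloaded.sqlite3") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventor (patent TEXT, lastname TEXT, firstname TEXT)")
schema = describe_tables(conn)
for table, columns in schema.items():
    print(table, columns)
```

Comparing this listing across files with different timestamps is one way to see which schema revision a given file follows.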
Also, note that many of the tables in these SQLite3 database files are consolidated. That is, each table contains data parsed from the patent documents provided by USPTO and Google, and may also contain data joined from other tables, data from third-party sources (geocoding), or text transformed from the original encoding into ASCII.
Also, tables may or may not be indexed appropriately for your needs. Distributing tables without indexing saves on bandwidth and download time.
---
Note: there are RSS feeds available for all the repositories.
Feel free to follow along in the RSS reader of your choice.
The SQLite3 databases generally lack indexing to save on download
time and expense. The user is expected to index according to
the needs of the analysis.
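
As an illustration of adding an index after download, here is a minimal sketch using Python's standard `sqlite3` module. The table name `inventor`, the column `lastname`, and the helper `ensure_index` are hypothetical; choose the table and columns your own queries filter or join on.

```python
import sqlite3


def ensure_index(conn, table, column):
    """Create a single-column index if it does not already exist; return its name."""
    index_name = f"idx_{table}_{column}"
    conn.execute(
        f'CREATE INDEX IF NOT EXISTS "{index_name}" ON "{table}" ("{column}")'
    )
    conn.commit()
    return index_name


# Demonstration on an in-memory database with a hypothetical table.
# For a real file, use sqlite3.connect("path/to/downloaded.sqlite3") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventor (patent TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO inventor VALUES (?, ?)",
    [("0000001", "Smith"), ("0000002", "Jones")],
)

index_name = ensure_index(conn, "inventor", "lastname")

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query,
# which lets you check whether the new index is being considered.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT patent FROM inventor WHERE lastname = ?",
    ("Smith",),
).fetchall()
for row in plan:
    print(row[-1])

matches = conn.execute(
    "SELECT patent FROM inventor WHERE lastname = ?", ("Smith",)
).fetchall()
```

Building an index on a large table takes time and increases the file size, so it is usually worth indexing only the columns an analysis actually needs.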