Add USPTO Schema

AggelosMargkas commented 1 year ago

Enable Alexandria3k to access all published US patent bibliographic data from 2005 to the present (September 2023).

New tables: usp_cpc_classifications, usp_field_of_classification, usp_agents, usp_applicants, usp_inventors, usp_assignees, usp_patent_family, usp_citations, usp_related_documents

Add new cursors for the tables whose columns could not be filled with a simple getter function and that had to cope with different versions of the DTD.

New cursors: PatentsCpcCursor, PatentsRelatedDocumentsCursor, PatentsAssigneesCursor

Add a new TableMeta object, USPartiesTableMeta, that holds the columns shared by the tables usp_inventors, usp_applicants, and usp_agents, to avoid duplication. These three tables appear in the DTD under the us-parties element and share many properties, hence the name USPartiesTableMeta.

Changed the file reading in uspto.py to match how the bulk data are provided. It now reads a folder that represents a whole year and contains all the weekly patent files published by the US office for that year.
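A minimal sketch of the year-folder traversal described above, not the actual code in uspto.py. The function name `weekly_files` and the `.zip` suffix are assumptions made for illustration only.

```python
import os


def weekly_files(year_dir, suffix=".zip"):
    """Yield the paths of the weekly files in a year folder, in name order.

    Hypothetical helper: the name, the default suffix, and the sorting
    rule are assumptions for illustration, not the uspto.py code.
    """
    for name in sorted(os.listdir(year_dir)):
        # Skip anything that is not a weekly data file (e.g. notes, indexes)
        if name.endswith(suffix):
            yield os.path.join(year_dir, name)
```

Sorting by file name keeps the processing order deterministic, which helps when comparing record counts between runs.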

Changed the test dataset, and the way the test files read it, accordingly.

Rename PatentsIcprCursor to PatentsDetailsCursor, since it applies to various tables and not only to the icpr_classifications table, which is now renamed to usp_icpr_classifications.

Add one helper function, alternative_path_getter:

alternative_path_getter: takes two paths as input and checks whether the first yields results; if not, it falls back to the second path. Added this function for tables that existed before 2012 under a different name while all the elements under them stayed the same.
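The fallback behaviour can be sketched as follows. This is an illustration under assumptions, not the implementation in the pull request: the signature, the use of `xml.etree.ElementTree`, and the element names `new-name` and `old-name` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def alternative_path_getter(first_path, second_path):
    """Return a getter that tries first_path, then falls back to second_path.

    Assumed signature for illustration; the real helper may differ.
    """
    def getter(element):
        found = element.find(first_path)
        if found is None:
            # The first path yielded nothing: try the alternative path
            found = element.find(second_path)
        return found.text if found is not None else None
    return getter


# Hypothetical element names standing in for a renamed DTD element
get_city = alternative_path_getter("new-name/city", "old-name/city")
newer = ET.fromstring("<patent><new-name><city>Athens</city></new-name></patent>")
older = ET.fromstring("<patent><old-name><city>Athens</city></old-name></patent>")
print(get_city(newer), get_city(older))
```

Returning a closure lets the same getter be reused for every record, regardless of which DTD version the record follows.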

Removed some columns of the us_patents table after a COUNT query over the whole dataset returned 0 for each of them. Removed columns: microform_number, hague_filing_date, hague_reg_pub_date, hague_reg_date, sir_flag
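A check of this kind can be sketched with the standard sqlite3 module. The exact query that was run is not shown here, so the helper `empty_columns` and its query are assumptions; the sketch relies on `COUNT(column)` counting only non-NULL values.

```python
import sqlite3


def empty_columns(conn, table, columns):
    """Return the columns that hold no non-NULL value in the given table.

    Hypothetical helper for illustration; identifiers are interpolated
    directly, so use it only with trusted table and column names.
    """
    empty = []
    for column in columns:
        # COUNT(column) ignores NULLs, so 0 means the column is never filled
        (count,) = conn.execute(
            f'SELECT COUNT("{column}") FROM "{table}"'
        ).fetchone()
        if count == 0:
            empty.append(column)
    return empty
```

For example, `empty_columns(conn, "us_patents", ["sir_flag", "microform_number"])` would list whichever of the two columns is never populated in the connected database.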

Updated the relationships of the tables under us_patents in the uspto.dot file.

Add tests for the new tables, checking that the number of records counted, both with and without partitioning, matches the number of entries in the sample dataset.

Fixed a double space in orcid.py.

AggelosMargkas commented 1 year ago

A thumbs up for done is fine

Thank you, I am on it!

dspinellis commented 1 year ago

Well done!

dspinellis / alexandria3k

Add USPTO Schema #24