Add new cursors for tables whose columns could not be filled with a simple getter function and which had to cope with different versions of the DTD.
New cursors: PatentsCpcCursor, PatentsRelatedDocumentsCursor, PatentsAssigneesCursor
Add a new TableMeta object, USPartiesTableMeta, that holds the columns shared by the tables usp_inventors, usp_applicants, and usp_agents, to avoid duplication. These three tables appear in the DTD under the us-parties element and share many properties, hence the name USPartiesTableMeta.
Changed the file reading in uspto.py to match how the bulk data are provided. It now reads a folder that represents a whole year and contains all the patents published weekly by the US office.
Changed the test dataset, and the way the test files read it, accordingly.
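As a rough illustration of the year-folder reading described above, the sketch below walks one directory and yields its weekly files in name order. The function name `weekly_files`, the `.zip` suffix, and the file names in the usage note are assumptions made for this example; they are not taken from uspto.py.

```python
import os


def weekly_files(year_directory, suffix=".zip"):
    """Yield the paths of the weekly files found in a year folder.

    Hypothetical helper: the name and the ".zip" suffix are assumptions
    for illustration, not the actual uspto.py implementation.
    """
    # Sort by name so the files are visited in a stable order
    for name in sorted(os.listdir(year_directory)):
        path = os.path.join(year_directory, name)
        # Skip subdirectories and unrelated files
        if os.path.isfile(path) and name.endswith(suffix):
            yield path
```

For a folder holding `week_01.zip`, `week_02.zip`, and a `README.txt`, this would yield only the two archive paths, in that order.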
Rename PatentsIcprCursor to PatentsDetailsCursor, since it applies to various tables and not only to the icpr_classifications table, which is now renamed to usp_icpr_classifications.
Add one helper function, alternative_path_getter: it takes two paths as input and uses the first one if it yields results, otherwise the second. It was added for tables that existed before 2012 under a different name but with the same elements under them.
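The fallback behaviour of alternative_path_getter could look roughly like the sketch below. It assumes the paths are ElementTree-style paths and that the function returns a getter; the real signature in the code base may differ. The `parties` element name and the sample XML are illustrative assumptions only.

```python
import xml.etree.ElementTree as ET


def alternative_path_getter(primary_path, fallback_path):
    """Return a getter that tries primary_path first, then fallback_path.

    Sketch only: the actual function may have a different signature.
    """
    def getter(element):
        matches = element.findall(primary_path)
        if not matches:
            # Nothing under the first path: use the alternative path
            matches = element.findall(fallback_path)
        return matches
    return getter


# Hypothetical fragments: the same child element under two parent names
new_style = ET.fromstring(
    "<biblio><us-parties><agent>A</agent></us-parties></biblio>"
)
old_style = ET.fromstring(
    "<biblio><parties><agent>B</agent></parties></biblio>"
)

get_agents = alternative_path_getter("us-parties/agent", "parties/agent")
print([e.text for e in get_agents(new_style)])
print([e.text for e in get_agents(old_style)])
```

Returning a closure keeps the call site identical to that of a simple getter, so a column definition would not need to know which version of the DTD produced the record.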
Removed some columns of the us_patents table after a COUNT query over the whole dataset returned 0 for each of them. Removed columns: microform_number, hague_filing_date, hague_reg_pub_date, hague_reg_date, sir_flag
Updated the relationships of the tables under us_patents in the uspto.dot file.
Add tests for the new tables, checking that the record counts, both with and without partitioning, match the entries in the sample dataset.
Make Alexandria3k fully access all the published US patent bibliographic data from 2005 to now (September 2023).
Tables: usp_cpc_classifications, usp_field_of_classification, usp_agents, usp_applicants, usp_inventors, usp_assignees, usp_patent_family, usp_citations, usp_related_documents
Fixed a double space in orcid.py.