TamerKhraisha / uspto-patent-data-parser

A Python tool for reading, parsing, and finding patents using the United States Patent and Trademark Office (USPTO) Bulk Data Storage System.
MIT License

Convert doc-number to patent number? #2

Open victorconan opened 2 years ago

victorconan commented 2 years ago

I noticed that the parser returns the doc-number rather than the patent number for each patent. Although one can search for a patent using the doc-number, I cannot find a mapping from doc-number to patent number. Do you know how to get the patent number? Thanks!

TamerKhraisha commented 2 years ago

Hi @victorconan, thank you for reporting this issue. I will investigate your request in the coming days and get back to you. I am sure the patent number is in the document and can be extracted.

victorconan commented 2 years ago

I looked at the XML file, and it seems they use doc-number everywhere without distinguishing whether it is a patent number or something else :/ But the bibliographic section contains both a publication-reference and an application-reference:

<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0939807</doc-number>
<kind>S1</kind>
<date>20220104</date>
</document-id>
</publication-reference>
<application-reference appl-type="design">
<document-id>
<country>US</country>
<doc-number>29667332</doc-number>
<date>20181019</date>
</document-id>
</application-reference>

My guess is that D0939807 from publication-reference is the patent number with an extra 0 after the D (not sure why, it is weird), and that 29667332 from application-reference is the application number. I think the parser only returns the latter one?
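
To make that guess concrete, here is a minimal sketch that reads both doc-number values from the excerpt above and strips the zero padding from the publication one. It assumes the extra 0 is fixed-width padding after an optional letter prefix, which is only a guess, and `doc_number_to_patent_number` is a hypothetical helper, not part of this parser.

```python
import re
import xml.etree.ElementTree as ET

# The excerpt from the grant XML, closed so that it parses on its own.
SAMPLE = """<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0939807</doc-number>
<kind>S1</kind>
<date>20220104</date>
</document-id>
</publication-reference>
<application-reference appl-type="design">
<document-id>
<country>US</country>
<doc-number>29667332</doc-number>
<date>20181019</date>
</document-id>
</application-reference>
</us-bibliographic-data-grant>"""


def doc_number_to_patent_number(doc_number):
    """Hypothetical helper: drop zero padding after an optional letter prefix.

    Assumption: the doc-number is a letter prefix (possibly empty), then
    padding zeros, then the numeric part of the patent number.
    """
    match = re.fullmatch(r"([A-Za-z]*)0*(\d+)", doc_number.strip())
    if match is None:
        # Unknown shape: return it unchanged rather than guess.
        return doc_number
    prefix, digits = match.groups()
    return prefix.upper() + digits


root = ET.fromstring(SAMPLE)
publication_doc_number = root.findtext("publication-reference/document-id/doc-number")
application_doc_number = root.findtext("application-reference/document-id/doc-number")
patent_number = doc_number_to_patent_number(publication_doc_number)

print(publication_doc_number, "->", patent_number)
print("application number:", application_doc_number)
```

If the padding assumption does not hold for other patent types, the helper would need adjusting; the point is only that the publication-reference value, not the application-reference one, looks like the source of the patent number.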

victorconan commented 2 years ago

I think the bug is here:

def get_patent_identification_data(root_tree):
    publication_info = root_tree.find(publication_info_base_path)
    application_info = root_tree.find(application_info_base_path)
    term_of_grant_info = root_tree.find(us_term_of_grant_path)
    term_of_grant_length = root_tree.find(us_term_of_grant_length)
    term_of_grant_extension = root_tree.find(us_term_of_grant_extension)
    us_term_of_grant_disclaimer = root_tree.find(us_term_of_grant_disclaimer_text)
    invention_title = root_tree.find(invention_title_path)
    document_data = {}    
    if publication_info != None:
        publication_reference_info = {element.tag: element.text for element in list(publication_info)}
        document_data = {**document_data,**publication_reference_info}
    if application_info !=None:
        application_reference_info = {element.tag: element.text for element in list(application_info)}
        if application_info.attrib and application_info.attrib['appl-type']:
            application_reference_info['application_type'] =  application_info.attrib['appl-type']
        document_data = {**document_data,**application_reference_info}

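Both dictionaries are keyed by element tag, so if a patent has application info, any tag it shares with the publication info (doc-number, for example) overwrites the publication value when the two are merged into document_data.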
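
One possible way to avoid the collision is to prefix the keys by where they came from. The sketch below is not the repository's code: `prefixed_fields` and `get_reference_data` are hypothetical names, and it assumes the fields live in a document-id child of each reference element, as in the XML excerpt earlier in this thread.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0939807</doc-number>
<kind>S1</kind>
<date>20220104</date>
</document-id>
</publication-reference>
<application-reference appl-type="design">
<document-id>
<country>US</country>
<doc-number>29667332</doc-number>
<date>20181019</date>
</document-id>
</application-reference>
</us-bibliographic-data-grant>"""


def prefixed_fields(reference, prefix):
    """Return the document-id children of `reference` with prefixed keys."""
    doc_id = reference.find("document-id") if reference is not None else None
    if doc_id is None:
        return {}
    return {f"{prefix}_{child.tag}": child.text for child in doc_id}


def get_reference_data(bibliographic_root):
    """Collect publication and application fields without key collisions."""
    publication_ref = bibliographic_root.find("publication-reference")
    application_ref = bibliographic_root.find("application-reference")
    document_data = {
        **prefixed_fields(publication_ref, "publication"),
        **prefixed_fields(application_ref, "application"),
    }
    # Keep the application type, as the original function does.
    if application_ref is not None and application_ref.get("appl-type"):
        document_data["application_type"] = application_ref.get("appl-type")
    return document_data


data = get_reference_data(ET.fromstring(SAMPLE))
for key, value in data.items():
    print(key, "=", value)
```

With prefixed keys, both doc-number values survive the merge, so callers can pick the publication one when they want the patent number. The trade-off is that existing callers expecting a bare doc-number key would need to be updated.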

federiconuta commented 11 months ago

Hi all, sorry for jumping into the conversation. Maybe a workaround for this is to rely on the Google Patents API to convert the doc-number to a patent number. Cheers