TamerKhraisha / uspto-patent-data-parser

A Python tool for reading, parsing, and finding patents using the United States Patent and Trademark Office (USPTO) Bulk Data Storage System.
MIT License

Parse 1998 data error #5

Open GengYuIsland opened 3 months ago

GengYuIsland commented 3 months ago

Hello coder, when I try to parse the 1998 data, there is an error: the function "get_patents_list" returns an empty list. If I change the code to this:

def get_patents_list(patents_txt_data):
    patents_data = []
    current_patent = []
    # Skip the first line of the file; each record starts at a 'PATN' line.
    for line in patents_txt_data[1:]:
        # Collapse runs of whitespace into single spaces.
        cleaned_line = ' '.join(line.split())
        # A new 'PATN' marker closes the record collected so far.
        if cleaned_line.startswith('PATN') and current_patent:
            patents_data.append(current_patent)
            current_patent = []
        current_patent.append(cleaned_line)
    # Flush the last record.
    if current_patent:
        patents_data.append(current_patent)
    # Split every line of every record into its words.
    return [[line.split() for line in patent] for patent in patents_data]

Then it works. However, it only works for 1998; when I use the new function to parse the 1999 data, it fails. I guess not all years were tested, so could you help me solve this problem and make the code more robust? Thank you very much.
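
To narrow down why one year parses and another does not, it may help to check how each file marks record boundaries before splitting on 'PATN'. The following is a minimal sketch and not part of the repository: summarize_leading_tags is a hypothetical helper, and it assumes only that the file has already been read into a list of lines, as get_patents_list expects. The sample lines are made up for illustration.

```python
from collections import Counter


def summarize_leading_tags(lines, top=5):
    """Count the first whitespace-delimited token of each non-blank line.

    Hypothetical diagnostic helper (not from the repository): it shows
    which tags start the lines of a file, so you can see whether 'PATN'
    occurs at all.
    """
    tags = Counter()
    for line in lines:
        tokens = line.split()
        if tokens:  # ignore blank lines
            tags[tokens[0]] += 1
    return tags.most_common(top)


# Made-up sample lines for illustration only, not real USPTO data.
sample = [
    "HEADER LINE",
    "PATN",
    "WKU  012345",
    "TTL  Example title",
    "PATN",
    "WKU  067890",
]
print(summarize_leading_tags(sample))
```

If 'PATN' is missing from the most frequent tags for a given year, that file probably uses a different record marker or layout, and a check based on startswith('PATN') would not find any records in it.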