TamerKhraisha / uspto-patent-data-parser

A Python tool for reading, parsing, and finding patents using the United States Patent and Trademark Office (USPTO) Bulk Data Storage System.
MIT License

Suggestion about func "read_and_parse_txt_from_disk" #6

Open GengYuIsland opened 3 months ago

GengYuIsland commented 3 months ago

I also suggest changing this function as follows, because I ran into an encoding problem when reading some files:

def read_and_parse_txt_from_disk(path_to_file, data_items):
    # Try UTF-8 first; fall back to Latin-1, which can decode any byte sequence.
    try:
        with open(path_to_file, 'r', encoding='utf-8') as f:
            txt = f.read()
    except UnicodeDecodeError:
        with open(path_to_file, 'r', encoding='latin1') as f:
            txt = f.read()
    lines = txt.split('\n')
    raw_patent_data = get_patents_list(lines)
    # Parse each raw patent record into the requested data items.
    parsed_data = []
    for patent in raw_patent_data:
        parsed_data.append(parse_txt_patent_data(patent, data_items_list=data_items))
    return parsed_data
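
The encoding fallback could also be pulled out into a small helper so it can be tried on its own. This is only a sketch: `read_text_with_fallback` and its `encodings` parameter are hypothetical names, not part of this repository, and the demo file is created just for illustration.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "latin1")):
    """Hypothetical helper: return the text decoded with the first encoding that works."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            # This encoding cannot decode the file; try the next one.
            continue
    raise ValueError(f"could not decode {path!r} with any of {encodings}")


# Demo: write bytes that are valid Latin-1 but not valid UTF-8, then read them back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("café".encode("latin1"))
    demo_path = tmp.name

demo_text = read_text_with_fallback(demo_path)
os.remove(demo_path)
print(demo_text)
```

Catching only `UnicodeDecodeError` means other problems, such as a missing file, are still reported instead of silently triggering the Latin-1 retry. Note that Latin-1 accepts every byte, so the fallback never fails, but it may produce wrong characters if the file actually uses a different encoding.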