dfreelon / news_extract

Python module to extract articles from NexisUni and Factiva.
BSD 3-Clause "New" or "Revised" License

KeyError: 'TXT' #4

Open drbirgularslan opened 1 year ago

drbirgularslan commented 1 year ago

Hi,

I get the following error when I use news_extract to read the file 3634_08_b1.txt:


```
KeyError                                  Traceback (most recent call last)
Cell In[28], line 21
     19     print(n+1)
     20     fc_file = filenames_fc[n]  #file exported from Factiva
---> 21     fc_data += ne.factiva_extract(fc_file)
     23 data=ne.fix_fac_fieldnames(fc_data)
     24 dataframe=ne.news_export(data,to_pandas=True, master_fields = [], jacc_threshold=1.1)

File ~\anaconda3\lib\site-packages\news_extract\news_extract.py:176, in factiva_extract(article_fn)
    173 article_dict3 = dict(zip(field_names3,fields3))
    175 if 'LP' in article_dict2:
--> 176     article_dict3['TXT'] += article_dict2['LP']
    177     del article_dict2['LP']
    178 if 'TD' in article_dict2:

KeyError: 'TXT'
```
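The traceback points at line 176 of `news_extract.py`. When a parsed article has an `LP` field, the code appends it to `article_dict3['TXT']` with `+=`, and that raises `KeyError` whenever no `TXT` entry was created for that article. Below is a minimal sketch of a defensive version of that single step. It is written as a hypothetical standalone helper: `append_lead_paragraph` is not part of news_extract, and the plain dicts stand in for the library's internal `article_dict2` and `article_dict3`.

```python
def append_lead_paragraph(article_dict2, article_dict3):
    """Hypothetical helper mirroring the step that fails at news_extract.py:176.

    Moves the 'LP' (lead paragraph) field from article_dict2 into the 'TXT'
    field of article_dict3. If 'TXT' does not exist yet, it is created
    instead of raising KeyError.
    """
    if 'LP' in article_dict2:
        lead = article_dict2.pop('LP')  # same effect as += followed by del
        if 'TXT' in article_dict3:
            article_dict3['TXT'] += lead  # original behaviour when 'TXT' exists
        else:
            article_dict3['TXT'] = lead  # the case that currently raises KeyError
    return article_dict3


# Stand-in data: an article that has a lead paragraph but no body text
fields2 = {'LP': 'Lead paragraph. '}
fields3 = {'HD': 'Headline'}
print(append_lead_paragraph(fields2, fields3))
# {'HD': 'Headline', 'TXT': 'Lead paragraph. '}
```

Whether an article with a lead paragraph but no body text should be kept, or flagged as malformed, is a judgment call for the maintainer. This sketch simply keeps it.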

This is the full code that I use for extracting news from a set of files in a folder:

```python
import news_extract as ne
import os, glob
import pandas

directory = '.../FactivaDownloads/'

# glob already returns the full path of each exported .txt file,
# so the files do not need to be opened here
filenames_fc = []
for filename in glob.glob(os.path.join(directory, "*.txt")):
    print(filename)
    filenames_fc += [filename]

fc_data = []

# for n in range(0, 2):  # (restrict to the first two files when testing)
for n in range(0, len(filenames_fc)):
    print(n+1)
    fc_file = filenames_fc[n]  # file exported from Factiva
    fc_data += ne.factiva_extract(fc_file)

data = ne.fix_fac_fieldnames(fc_data)
dataframe = ne.news_export(data, to_pandas=True, master_fields=[], jacc_threshold=1.1)
dataframe.to_excel(directory + "News.xlsx")
```
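Until the parser handles these files, one workaround is to catch the `KeyError` for each file. That way a single problematic export does not stop the whole batch, and you also get a list of the files that failed. Below is a minimal sketch. It assumes `ne.factiva_extract` returns a list of article dicts, which is what the `fc_data += ...` in the loop suggests. `extract_all` is a hypothetical helper, and the extractor is passed in as a parameter so the function can be tried without news_extract installed.

```python
def extract_all(filenames, extract):
    """Run `extract` on each file, skipping (and recording) files that raise KeyError.

    `extract` is assumed to behave like ne.factiva_extract: it takes a
    filename and returns a list of article dicts.
    """
    articles = []
    failed = []
    for n, fc_file in enumerate(filenames, start=1):
        try:
            articles += extract(fc_file)
        except KeyError as err:
            # e.g. KeyError: 'TXT' from factiva_extract; note the file and move on
            print(f"{n}: skipped {fc_file} ({err!r})")
            failed.append(fc_file)
    return articles, failed


# In the script above this would replace the extraction loop:
#   fc_data, failed_files = extract_all(filenames_fc, ne.factiva_extract)
#   data = ne.fix_fac_fieldnames(fc_data)
```

Skipping a file discards every article in it, because `factiva_extract` parses a whole file in one call. The `failed` list shows which exports need a closer look.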

I would appreciate your help in solving this issue.

Thank you,
Best,
Birgul

lende77 commented 9 months ago

Did you figure out a solution? I am getting the same error. It seems that something in the way Factiva encodes its output has changed, but it only happens with some files, not others.
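One way to test the idea that failing exports are encoded differently is to compare a file that fails (such as 3634_08_b1.txt) with one that works. The comparison looks at the leading byte-order mark, at whether the file decodes as UTF-8, and at whether it uses Windows line endings. This is a diagnostic sketch only. It assumes nothing about news_extract internals, and `describe_encoding` is a hypothetical helper built on the Python standard library.

```python
import codecs


def describe_encoding(path):
    """Hypothetical diagnostic: report BOM, UTF-8 decodability and CRLF line endings."""
    with open(path, 'rb') as f:
        raw = f.read()

    # Byte-order marks that a text export might start with
    boms = [
        ('utf-8-sig', codecs.BOM_UTF8),
        ('utf-16-le', codecs.BOM_UTF16_LE),
        ('utf-16-be', codecs.BOM_UTF16_BE),
    ]
    bom = next((name for name, mark in boms if raw.startswith(mark)), None)

    try:
        raw.decode('utf-8')
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    return {'bom': bom, 'utf8_ok': utf8_ok, 'crlf': b'\r\n' in raw}


# Compare a failing export with one that parses fine, e.g.:
#   print(describe_encoding('3634_08_b1.txt'))
#   print(describe_encoding('some_working_file.txt'))  # hypothetical filename
```

A difference between the two results would not prove the cause, but it would narrow down what changed in the exports.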