Did you figure out a solution? I am getting the same error. It seems that something in the way Factiva encodes its output has changed, but it only happens with some files, not others.
Hi,
I get the following error when I use news_extract to read the file below (3634_08_b1.txt):
KeyError                                  Traceback (most recent call last)
Cell In[28], line 21
     19     print(n+1)
     20     fc_file = filenames_fc[n] #file exported from Factiva
---> 21     fc_data += ne.factiva_extract(fc_file)
     23 data=ne.fix_fac_fieldnames(fc_data)
     24 dataframe=ne.news_export(data,to_pandas=True, master_fields = [], jacc_threshold=1.1)

File ~\anaconda3\lib\site-packages\news_extract\news_extract.py:176, in factiva_extract(article_fn)
    173 article_dict3 = dict(zip(field_names3,fields3))
    175 if 'LP' in article_dict2:
--> 176     article_dict3['TXT'] += article_dict2['LP']
    177     del article_dict2['LP']
    178 if 'TD' in article_dict2:

KeyError: 'TXT'
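The traceback points at an in-place `+=` on a dictionary key that does not exist yet. Here is a minimal sketch of that failure, using made-up stand-ins for `article_dict2` and `article_dict3`. The real dictionaries are built inside `factiva_extract` from the Factiva file, so the field values here are assumptions, and `append_field` is a hypothetical helper, not part of news_extract.

```python
# Hypothetical stand-ins for the dictionaries named in the traceback.
# The field values are invented for illustration only.
article_dict2 = {"LP": "Lead paragraph. ", "TD": "Rest of the text."}
article_dict3 = {"HD": "Some headline"}  # note: no 'TXT' key


def append_field(target, source, key, dest="TXT"):
    """Append source[key] to target[dest], creating target[dest] if missing."""
    if key in source:
        target[dest] = target.get(dest, "") + source[key]
        del source[key]


# The pattern from the traceback raises because 'TXT' was never created.
try:
    article_dict3["TXT"] += article_dict2["LP"]
    raised = False
except KeyError as err:
    raised = True
    missing_key = err.args[0]

# A defensive variant that tolerates the missing key.
append_field(article_dict3, article_dict2, "LP")
append_field(article_dict3, article_dict2, "TD")
```

This is not a patch for news_extract. It only illustrates why a file whose parsed fields lack `TXT` would produce this error. Whether that is what happens with the affected Factiva files is still a guess.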
This is the full code that I use for extracting news from a set of files in a folder:
import news_extract as ne
import os, glob
import pandas

directory = '.../FactivaDownloads/'
filenames_fc = []
for filename in glob.glob(os.path.join(directory, "*.txt")):
    print(filename)
    with open(os.path.join(os.getcwd(), filename), 'r') as f:
        filenames_fc += [filename]

fc_data = []
for n in range(0, len(filenames_fc)):
# for n in range(0,2):
    print(n+1)
    fc_file = filenames_fc[n] #file exported from Factiva
    fc_data += ne.factiva_extract(fc_file)

data=ne.fix_fac_fieldnames(fc_data)
dataframe=ne.news_export(data,to_pandas=True, master_fields = [], jacc_threshold=1.1)
dataframe.to_excel(directory + "News.xlsx")
I would appreciate your assistance in solving this issue.
Thank you,
Best,
Birgul
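Since the error reportedly shows up for some files and not others, it may help to find out exactly which files fail before looking at what differs in them. Below is a rough sketch of one way to do that. `extract_all` and `list_txt_files` are hypothetical helpers, not part of news_extract, and the extractor is passed in as a parameter on the assumption that it returns a list of records per file, as `ne.factiva_extract` appears to do in the loop above.

```python
import glob
import os


def list_txt_files(directory):
    """Return the .txt files in directory, sorted for a stable order."""
    return sorted(glob.glob(os.path.join(directory, "*.txt")))


def extract_all(filenames, extract_fn):
    """Run extract_fn on each file and return (records, failures).

    extract_fn is assumed to return a list of records for one file.
    Files that raise KeyError are skipped and reported in failures
    as (filename, missing_key) pairs.
    """
    records, failures = [], []
    for fn in filenames:
        try:
            records += extract_fn(fn)
        except KeyError as err:
            # Remember the file and the missing key, then carry on.
            failures.append((fn, err.args[0]))
    return records, failures
```

With the names from the post, this would be called as `fc_data, failed = extract_all(list_txt_files(directory), ne.factiva_extract)`. Printing `failed` then lists the files to compare against the ones that parse cleanly.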