ADAH-EviDENce / NewsReader

Docker build of full NewsReader pipeline in Dutch.
Apache License 2.0

Opinion_miner can't find token_id's #34

Open wmkouw opened 6 years ago

wmkouw commented 6 years ago

Goal: run the opinion miner on an input NAF file.

Input:

(cat "$fn-cor.naf" | python2 $OPI/tag_file.py -f $OPI/models/models_news_nl/ > "$fn-opi.naf" 2> "$fn-opi.log") 2> "$fn-opi.err"

Problem: the opinion miner can't find certain token IDs (the word IDs, e.g. 'w4747') in its token-to-term lookup.

Stack trace:

Traceback (most recent call last):
  File "/opinion_miner_deluxePP/tag_file.py", line 175, in <module>
    feature_file = expression_feature_extractor(kaf_naf_obj,'tag', model_folder, log=args.log)
  File "/opinion_miner_deluxePP/extract_features_expression.py", line 581, in main
    create_sequence(naf_obj, sentence_id, overall_parameters, list_opinions = [],output = output_fd, log=log)
  File "/opinion_miner_deluxePP/extract_features_expression.py", line 283, in create_sequence
    feature_labels = extract_terms_pos(naf_obj,token_ids, features)
  File "/opinion_miner_deluxePP/extract_features_expression.py", line 108, in extract_terms_pos
    term_id = naf_obj.termid_for_tokenid[token_id]
KeyError: 'w4747'

wmkouw commented 6 years ago

The same KeyError also occurs intermittently for alpinonaf.

wmkouw commented 6 years ago

The problem is most likely caused by missing XML files (see https://github.com/ADAH-EviDENce/NewsReader/issues/28). I've passed the list through with the missing files removed, but there are probably still dictionary keys that refer to the missing files.
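
One way to narrow this down is to check, before running the opinion miner, which token IDs in a NAF file are not covered by any term. The sketch below is not part of the opinion miner. It assumes the usual NAF layout, where tokens are `<wf id="...">` elements under `<text>` and each `<term>` under `<terms>` points to its tokens via `<span><target id="..."/></span>`. The function name `find_unmapped_tokens` and the sample document are made up for illustration. It is written for Python 3, whereas the command above uses python2.

```python
import xml.etree.ElementTree as ET


def find_unmapped_tokens(naf_root):
    """Return token ids from the text layer that no term span points to.

    Assumes NAF-style attributes: <wf id="..."> and <target id="...">.
    """
    # All token ids, in document order.
    token_ids = [wf.get("id") for wf in naf_root.iterfind("./text/wf")]
    # All token ids referenced by at least one term.
    covered = {
        target.get("id")
        for target in naf_root.iterfind("./terms/term/span/target")
    }
    return [tid for tid in token_ids if tid not in covered]


# Hypothetical miniature NAF document: token w2 has no term.
SAMPLE = """<NAF>
  <text>
    <wf id="w1" sent="1">Dit</wf>
    <wf id="w2" sent="1">is</wf>
    <wf id="w3" sent="1">goed</wf>
  </text>
  <terms>
    <term id="t1"><span><target id="w1"/></span></term>
    <term id="t2"><span><target id="w3"/></span></term>
  </terms>
</NAF>"""

missing = find_unmapped_tokens(ET.fromstring(SAMPLE))
print(missing)
```

For a real file, `ET.parse("some-file.naf").getroot()` can be passed in instead of the parsed sample string. If an ID such as 'w4747' shows up in the result, the token has no term in that file, which would be consistent with the KeyError in `extract_terms_pos`.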