Tagging of unseen files not working

lujea commented 7 years ago

Dear NeuroNER authors,

I have downloaded and installed the NeuroNER project as well as the necessary dependencies (tensorflow, python3, etc.). Note that I am using python 3.6.

When I try to apply an existing model to the provided sample documents, I get an error.

Command launched:

python3 main.py --parameters_filepath=./parameters.ini --train_model=False --use_pretrained_model=True --dataset_text_folder=../data/example_unannotated_texts --pretrained_model_folder=../trained_models/conll_2003_en

Error:

Traceback (most recent call last):
  File "main.py", line 446, in <module>
    main()
  File "main.py", line 268, in main
    parameters, conf_parameters = load_parameters(arguments['parameters_filepath'], arguments=arguments)
  File "main.py", line 119, in load_parameters
    pretraining_parameters = load_parameters(parameters_filepath=os.path.join(parameters['pretrained_model_folder'], 'parameters.ini'), verbose=False)[0]
  File "main.py", line 119, in load_parameters
    pretraining_parameters = load_parameters(parameters_filepath=os.path.join(parameters['pretrained_model_folder'], 'parameters.ini'), verbose=False)[0]
  File "main.py", line 119, in load_parameters
    pretraining_parameters = load_parameters(parameters_filepath=os.path.join(parameters['pretrained_model_folder'], 'parameters.ini'), verbose=False)[0]
  [Previous line repeated 980 more times]
  File "main.py", line 93, in load_parameters
    nested_parameters = utils.convert_configparser_to_dictionary(conf_parameters)
  File "/home/netmail/neuroNER/NeuroNER-master/src/utils.py", line 105, in convert_configparser_to_dictionary
    my_config_parser_dict = {s:dict(config.items(s)) for s in config.sections()}
  File "/home/netmail/neuroNER/NeuroNER-master/src/utils.py", line 105, in <dictcomp>
    my_config_parser_dict = {s:dict(config.items(s)) for s in config.sections()}
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 858, in items
    return [(option, value_getter(option)) for option in d.keys()]
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 858, in <listcomp>
    return [(option, value_getter(option)) for option in d.keys()]
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 855, in <lambda>
    section, option, d[option], d)
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 394, in before_get
    self._interpolate_some(parser, option, L, value, section, defaults, 1)
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 407, in _interpolate_some
    rawval = parser.get(section, option, raw=True, fallback=rest)
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 781, in get
    d = self._unify_values(section, vars)
  File "/home/netmail/.local/lib/python3.6/configparser.py", line 1149, in _unify_values
    return _ChainMap(vardict, sectiondict, self._defaults)
  File "/home/netmail/vens/tensorflow/lib/python3.6/collections/__init__.py", line 874, in __init__
    self.maps = list(maps) or [{}]          # always at least one map
RecursionError: maximum recursion depth exceeded while calling a Python object

It seems that it fails to load the parameters from the config file.

Franck-Dernoncourt commented 7 years ago

Have you changed any of the NeuroNER files?

lujea commented 7 years ago

I only changed the parameters.ini file to set train_model to False in the mode section:

train_model = False
use_pretrained_model = True
pretrained_model_folder = ../trained_models/conll_2003_en

Franck-Dernoncourt commented 7 years ago

Does running python3 main.py work?

lujea commented 7 years ago

It fails if I run it with the updated parameters.ini file, but it works if I run it with the default file.

lujea commented 7 years ago

These are the only two lines I changed in the file (the left-hand side is the default, the right-hand side is the new file):

train_model = True | train_model = False
use_pretrained_model = False | use_pretrained_model = True

Franck-Dernoncourt commented 7 years ago

Have you changed NeuroNER/trained_models/conll_2003_en/parameters.ini by any chance? It looks like you have use_pretrained_model = True there, whereas it should be use_pretrained_model = False.

lujea commented 7 years ago

Yes, I changed that file as well.

Franck-Dernoncourt commented 7 years ago

That explains the issue. This file shouldn't be changed. Thanks for reporting the issue; we'll add a more user-friendly error message. Please let me know if that fixes your issue.
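For readers hitting the same RecursionError: the traceback shows load_parameters calling itself on the parameters.ini inside pretrained_model_folder, so a pretrained model's file that itself sets use_pretrained_model = True never stops recursing. The following is only a sketch of what a friendlier check could look like, not NeuroNER's actual code; the function name load_parameters_checked, the is_pretrained_config flag, and the error wording are assumptions made for illustration.

```python
import configparser
import os


def _as_bool(value):
    """Interpret the usual .ini spellings of a true value."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def load_parameters_checked(parameters_filepath, is_pretrained_config=False):
    """Read an .ini file and follow pretrained_model_folder at most one level.

    Hypothetical helper: returns (parameters, pretrained_parameters or None).
    """
    config = configparser.ConfigParser()
    if not config.read(parameters_filepath, encoding="utf-8"):
        raise FileNotFoundError(parameters_filepath)

    # Flatten every section into a single dictionary of raw strings.
    parameters = {}
    for section in config.sections():
        parameters.update(config.items(section))

    if not _as_bool(parameters.get("use_pretrained_model", "False")):
        return parameters, None

    if is_pretrained_config:
        # The pretrained model's own file asks for a pretrained model again:
        # stop with a readable message instead of recursing.
        raise ValueError(
            f"{parameters_filepath} sets use_pretrained_model = True. "
            "The parameters.ini shipped with a trained model should not be "
            "edited; restore use_pretrained_model = False there."
        )

    pretrained_path = os.path.join(
        parameters["pretrained_model_folder"], "parameters.ini"
    )
    pretrained_parameters, _ = load_parameters_checked(
        pretrained_path, is_pretrained_config=True
    )
    return parameters, pretrained_parameters
```

With a check like this, the misconfiguration described above would surface as a single ValueError naming the offending file rather than as a recursion-depth traceback.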

lujea commented 7 years ago

Thanks for the reply. I will try again without changing the parameters.ini in the trained_models folder.

I got confused by this part of the documentation:

> The following parameters in the src/parameters.ini configuration file must also be set to the same values as in the configuration file located in the specified pretrained_model_folder

lujea commented 7 years ago

I have reverted the changes in NeuroNER/trained_models/conll_2003_en/parameters.ini so that it is now:

train_model = True
use_pretrained_model = False

The file in /src/parameters.ini contains:

train_model = False
use_pretrained_model = True

When I execute:

python3 main.py --train_model=False --use_pretrained_model=True --dataset_text_folder=../data/example_unannotated_texts --pretrained_model_folder=../trained_models/conll_2003_en

The error that I get now is:

NeuroNER version: 1.0-dev
TensorFlow version: 1.1.0
NeuroNER version: 1.0-dev
TensorFlow version: 1.1.0
{'character_embedding_dimension': 25,
 'character_lstm_hidden_state_dimension': 25,
 'check_for_digits_replaced_with_zeros': 1,
 'check_for_lowercase': 1,
 'dataset_text_folder': '../data/example_unannotated_texts',
 'debug': 0,
 'dropout_rate': 0.5,
 'experiment_name': 'test',
 'freeze_token_embeddings': 0,
 'gradient_clipping_value': 5.0,
 'learning_rate': 0.005,
 'load_only_pretrained_token_embeddings': 0,
 'main_evaluation_mode': 'conll',
 'maximum_number_of_epochs': 100,
 'number_of_cpu_threads': 8,
 'number_of_gpus': 0,
 'optimizer': 'sgd',
 'output_folder': '../output',
 'parameters_filepath': './parameters.ini',
 'patience': 10,
 'plot_format': 'pdf',
 'pretrained_model_folder': '../trained_models/conll_2003_en',
 'reload_character_embeddings': 1,
 'reload_character_lstm': 1,
 'reload_crf': 1,
 'reload_feedforward': 1,
 'reload_token_embeddings': 1,
 'reload_token_lstm': 1,
 'remap_unknown_tokens_to_unk': 1,
 'spacylanguage': 'en',
 'tagging_format': 'bioes',
 'token_embedding_dimension': 100,
 'token_lstm_hidden_state_dimension': 100,
 'token_pretrained_embedding_filepath': '../data/word_vectors/glove.6B.100d.txt',
 'tokenizer': 'spacy',
 'train_model': 0,
 'use_character_lstm': 1,
 'use_crf': 1,
 'use_pretrained_model': 1,
 'verbose': 0}
Checking compatibility between CONLL and BRAT for deploy_spacy set ... Done.
Checking validity of CONLL BIOES format... Done.
Load dataset...
Traceback (most recent call last):
  File "main.py", line 446, in <module>
    main()
  File "main.py", line 274, in main
    dataset.load_dataset(dataset_filepaths, parameters)
  File "/home/netmail/neuroNER/NeuroNER-master/src/dataset.py", line 291, in load_dataset
    self.alphabet_size = max(self.index_to_character.keys()) + 1
ValueError: max() arg is an empty sequence

heri commented 7 years ago

I had the same error as @lujea on a fresh install

spacy didn't have the module en installed and it created 0 bytes files: deploy_spacy.txt and deploy_spacy_bioes.txt in the example_unannotated_texts directory, plus another empty file, news.ann, in the deploy directory.

When I tried again after installing the en module, it still raised an error.

Deleting these empty files made it work.

I think this project needs to list all the necessary dependencies. It's not just tensorflow or Brat; you also need to install the following, among others, if you haven't already (otherwise running main.py fails):

spacy
spacy modules, especially English
sklearn
matplotlib

lujea commented 7 years ago

@heri Thanks very much for the feedback; deleting the empty files solved the issue.

I agree with you that there are missing dependencies that are not specified. I had already solved the dependency issue, which allowed me to train the model. The real problem was the empty generated files in the example_unannotated_texts folder and the empty news.ann.
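The manual cleanup described above (deleting the 0-byte files left behind by the failed run) can be sketched as a small script. This is an illustration only, not part of NeuroNER: the helper names find_empty_files and remove_empty_files are hypothetical, and the script treats every zero-byte file under the folder as stale, which may not hold if an input file is legitimately empty. It therefore defaults to a dry run.

```python
import os


def find_empty_files(folder):
    """Return the sorted paths of zero-byte regular files under folder."""
    empty = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


def remove_empty_files(folder, dry_run=True):
    """List empty files under folder; delete them only when dry_run is False."""
    empty = find_empty_files(folder)
    if not dry_run:
        for path in empty:
            os.remove(path)
    return empty


if __name__ == "__main__":
    # Folder taken from the command used in this thread; adjust as needed.
    for path in remove_empty_files("../data/example_unannotated_texts"):
        print("empty file:", path)
```

Running it once with the default dry_run=True shows what would be deleted; passing dry_run=False performs the deletion.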

Franck-Dernoncourt commented 7 years ago

Thanks for the feedback!

> spacy didn't have the module en installed and it created 0 bytes files

Interesting, I had erroneously assumed spacy would throw an error message instead of failing silently. I'll add a more user-friendly message there.

> I agree with you that there are missing dependencies that are not specified.

All the dependencies should be specified in https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/install_mac.md, https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/install_ubuntu.md, or https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/install_windows.md

We will add a requirements.txt at some point.
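Until a requirements.txt exists, a quick pre-flight check for the Python packages named in this thread could look like the sketch below. It is an assumption-laden illustration rather than anything shipped with NeuroNER: the helper name missing_modules is hypothetical, the module list is only what commenters mentioned here, and the check does not verify that spaCy's English model is installed.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names mentioned in this thread, not an official dependency list.
    required = ["tensorflow", "spacy", "sklearn", "matplotlib"]
    missing = missing_modules(required)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check uses importlib.util.find_spec so that nothing is actually imported, which keeps it fast even when tensorflow is installed.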

Franck-Dernoncourt / NeuroNER

Tagging of unseen files not working #18