Issue on predicting on new data

danilo-dessi commented 2 years ago

Hello, I get this exception when predicting on new data and also on the test data. Could you help me?

`Traceback (most recent call last):
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/vocabulary.py", line 687, in get_token_index
    return self._token_to_index[namespace][token]
KeyError: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/vocabulary.py", line 690, in get_token_index
    return self._token_to_index[namespace][self._oov_token]
KeyError: '@@UNKNOWN@@'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/danilodessi/anaconda3/envs/dygiepp/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/commands/predict.py", line 226, in _predict
    manager.run()
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/commands/predict.py", line 193, in run
    for model_input_instance, result in zip(batch, self._predict_instances(batch)):
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/commands/predict.py", line 152, in _predict_instances
    results = [self._predictor.predict_instance(batch_data[0])]
  File "/Users/danilodessi/Documents/research/CS-KG/src/extraction/dygiepp/dygie/predictors/dygie.py", line 54, in predict_instance
    dataset.index_instances(model.vocab)
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/batch.py", line 159, in index_instances
    instance.index_fields(vocab)
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/instance.py", line 75, in index_fields
    field.index(vocab)
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/fields/list_field.py", line 55, in index
    field.index(vocab)
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/fields/list_field.py", line 55, in index
    field.index(vocab)
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/fields/label_field.py", line 91, in index
    self.label, self._label_namespace  # type: ignore
  File "/Users/danilodessi/anaconda3/envs/dygiepp/lib/python3.7/site-packages/allennlp/data/vocabulary.py", line 695, in get_token_index
    f"'{token}' not found in vocab namespace '{namespace}', and namespace "
KeyError: "'' not found in vocab namespace 'None__ner_labels', and namespace does not contain the default OOV token ('@@UNKNOWN@@')"`

The command is: `allennlp predict pretrained/scierc.tar.gz ../../../outputs/dygiepp_input/AI_1.json --predictor dygie --include-package dygie --use-dataset-reader --output-file ../../../outputs/dygiepp_output/AI_1.json`

Example data contained in AI_1.json is: `{"clusters": [[], [], [], [], [], [], [], [], [], []], "sentences": [["We", "present", "the", "first", "algorithm", "for", "maintaining", "a", "maximal", "independent", "set", "(", "MIS", ")", "of", "a", "fully", "dynamic", "graph", "--", "-which", "undergoes", "both", "edge", "insertions", "and", "deletions", "--", "-in", "polylogarithmic", "time", "."], ["Our", "algorithm", "is", "randomized", "and", ",", "per", "update", ",", "takes", "O", "(", "log^2", "log^2", "n", ")", "expected", "time", "."], ["Furthermore", ",", "the", "algorithm", "can", "be", "adjusted", "to", "have", "O", "(", "log^2", "log^4", "n", ")", "worst-case", "update-time", "with", "high", "probability", "."], ["Here", ",", "n", "denotes", "the", "number", "of", "vertices", "and", "is", "the", "maximum", "degree", "in", "the", "graph", "."], ["The", "MIS", "problem", "in", "fully", "dynamic", "graphs", "has", "attracted", "significant", "attention", "after", "a", "breakthrough", "result", "of", "Assadi", ",", "Onak", ",", "Schieber", ",", "and", "Solomon", "[", "STOC'18", "]", "who", "presented", "an", "algorithm", "with", "O", "(", "m^3/4", ")", "update-time", "(", "and", "thus", "broke", "the", "natural", "(", "m", ")", "barrier", ")", "where", "m", "denotes", "the", "number", "of", "edges", "in", "the", "graph", "."], ["This", "result", "was", "improved", "in", "a", "series", "of", "subsequent", "papers", ",", "though", ",", "the", "update-time", "remained", "polynomial", "."], ["In", "particular", ",", "the", "fastest", "algorithm", "prior", "to", "our", "work", "had", "O", "(", "min", "{", "n", ",", "m^1/3", "}", ")", "update-time", "[", "Assadi", "et", "al", "."], ["Our", "algorithm", "maintains", "the", "lexicographically", "first", "MIS", "over", "a", "random", "order", "of", "the", "vertices", "."], ["As", "a", "result", ",", "the", "same", "algorithm", "also", "maintains", "a", "3-approximation", "of", "correlation", "clustering", "."], ["We", 
"also", "show", "that", "a", "simpler", "variant", "of", "our", "algorithm", "can", "be", "used", "to", "maintain", "a", "random-order", "lexicographically", "first", "maximal", "matching", "in", "the", "same", "update-time", "."]], "ner": [[], [], [], [], [], [], [], [], [], []], "relations": [[], [], [], [], [], [], [], [], [], []], "doc_key": "2972259791"}

{"clusters": [[], []], "sentences": [["A", "loop", "filter", "unit", "11", "carries", "out", "a", "class", "classification", "of", "a", "local", "decoded", "image", "generated", "by", "an", "adding", "unit", "9", "into", "one", "class", "for", "each", "coding", "block", "having", "a", "largest", "size", "determined", "by", "an", "encoding", "controlling", "unit", "2", "and", "also", "designs", "a", "filter", "that", "compensates", "for", "a", "distortion", "piggybacked", "for", "each", "local", "decoded", "image", "belonging", "to", "each", "class", ",", "and", "also", "carries", "out", "a", "filtering", "process", "on", "the", "above-mentioned", "local", "decoded", "image", "by", "using", "the", "filter", "."], ["A", "variable", "length", "encoding", "unit", "13", "encodes", ",", "as", "filter", "parameters", ",", "the", "filter", "designed", "by", "the", "loop", "filter", "unit", "11", "and", "used", "for", "the", "local", "decoded", "image", "belonging", "to", "each", "class", ",", "and", "a", "class", "number", "of", "each", "largest", "coding", "block", "."]], "ner": [[], []], "relations": [[], []], "doc_key": "2475240084"}

{"clusters": [[], [], [], [], [], [], [], []], "sentences": [["Linguistic", "intuitionistic", "fuzzy", "variables", "(", "LIFVs", ")", "can", "efficiently", "denote", "the", "qualitative", "preferred", "and", "non-preferred", "cognitions", "of", "decision", "makers", "."], ["This", "paper", "researches", "group", "decision", "making", "with", "linguistic", "intuitionistic", "fuzzy", "information", "."], ["To", "do", "this", ",", "several", "Hamacher", "operational", "laws", "on", "LIFVs", "are", "defined", "."], ["To", "derive", "the", "comprehensive", "evaluating", "values", "of", "alternatives", ",", "several", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "aggregation", "operators", "are", "proposed", ",", "including", "the", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "weighted", "average", "operator", ",", "the", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "weighted", "geometric", "mean", "operator", ",", "the", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "ordered", "weighted", "average", "operator", ",", "the", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "ordered", "weighted", "geometric", "mean", "operator", ",", "the", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "hybrid", "weighted", "average", "operator", ",", "and", "the", "linguistic", "intuitionistic", "fuzzy", "Hamacher", "hybrid", "weighted", "geometric", "mean", "operator", "."], ["Then", ",", "several", "of", "their", "desirable", "properties", "are", "researched", "to", "guarantee", "the", "rationality", "."], ["Methods", "for", "determining", "the", "weights", "of", "criteria", ",", "decision", "makers", "as", "well", "as", "the", "ordered", "positions", "are", "offered", ",", "respectively", "."], ["After", "that", ",", "a", "procedure", "for", "group", "decision", "making", "with", "linguistic", "intuitionistic", "fuzzy", "information", "is", "provided", "."], ["Finally", ",", "a", "group", "decision-making", "problem", "is", "offered", "to", 
"illustrate", "the", "application", "of", "the", "new", "results", "."]], "ner": [[], [], [], [], [], [], [], []], "relations": [[], [], [], [], [], [], [], []], "doc_key": "2796607341"}

{"clusters": [[], [], [], []], "sentences": [["Dose-response", "analysis", "can", "be", "carried", "out", "using", "multi-purpose", "commercial", "statistical", "software", ",", "but", "except", "for", "a", "few", "special", "cases", "the", "analysis", "easily", "becomes", "cumbersome", "as", "relevant", ",", "non-standard", "output", "requires", "manual", "programming", "."], ["The", "extension", "package", "drc", "for", "the", "statistical", "environment", "R", "provides", "a", "flexible", "and", "versatile", "infrastructure", "for", "dose-response", "analyses", "in", "general", "."], ["The", "present", "version", "of", "the", "package", ",", "reflecting", "extensions", "and", "modifications", "over", "the", "last", "decade", ",", "provides", "a", "user-friendly", "interface", "to", "specify", "the", "model", "assumptions", "about", "the", "dose-response", "relationship", "and", "comes", "with", "a", "number", "of", "extractors", "for", "summarizing", "fitted", "models", "and", "carrying", "out", "inference", "on", "derived", "parameters", "."], ["The", "aim", "of", "the", "present", "paper", "is", "to", "provide", "an", "overview", "of", "state-of-the-art", "dose-response", "analysis", ",", "both", "in", "terms", "of", "general", "concepts", "that", "have", "evolved", "and", "matured", "over", "the", "years", "and", "by", "means", "of", "concrete", "examples", "."]], "ner": [[], [], [], []], "relations": [[], [], [], []], "doc_key": "2212640525"} `

dwadden commented 2 years ago

Hi, I'll take a look over the weekend.

dwadden commented 2 years ago

I think it might be because the input instances don't have a `dataset` field. See here: https://github.com/dwadden/dygiepp#working-with-new-datasets.
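For anyone hitting the same error: the namespace `None__ner_labels` in the traceback is consistent with the dataset name being read as `None`. Below is a minimal sketch for adding the field to every document in a JSON-lines input file. The helper name `add_dataset_field` is made up for illustration, and the value `"scierc"` is an assumption (that the field should match the dataset the pretrained model was trained on); check the linked README section for the exact requirement.

```python
import json
from pathlib import Path


def add_dataset_field(src: Path, dst: Path, dataset: str) -> int:
    """Copy a JSON-lines file, setting a "dataset" key on every document.

    Hypothetical helper, not part of dygiepp. Returns the number of
    documents written.
    """
    count = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:  # skip blank lines between documents
                continue
            doc = json.loads(line)
            # Keep an existing value if the document already has one.
            doc.setdefault("dataset", dataset)
            fout.write(json.dumps(doc) + "\n")
            count += 1
    return count
```

Usage would look like `add_dataset_field(Path("AI_1.json"), Path("AI_1_with_dataset.json"), "scierc")`, with the output file then passed to `allennlp predict`. Both file names here are placeholders.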

danilo-dessi commented 2 years ago

That was it! The previous version did not require this field, so I had not been able to run it again. Thank you!

dwadden commented 2 years ago

Great, glad it worked!

dwadden / dygiepp

Issue on predicting on new data #91