allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Any python script examples on the use of constituency parser #1278

Closed csyhuang closed 6 years ago

csyhuang commented 6 years ago

Hi,

I am interested in incorporating the constituency parser in AllenNLP into a product. However, I cannot find any examples that show how one can use the constituency parser in a Python script to implement analyses like http://demo.allennlp.org/machine-comprehension. Could you create some simple sample scripts, or point me to some existing scripts that you know of?

Thanks in advance,

Clare

DeNeutoy commented 6 years ago

Hi Clare!

Hopefully this snippet helps. You'll need to install allennlp first by following the instructions in the README. Let me know if I can help more.

from allennlp.models.archival import load_archive
from allennlp.service.predictors import Predictor

archive = load_archive(
            "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo-constituency-parser-2018.03.14.tar.gz"
        )
predictor = Predictor.from_archive(archive, 'constituency-parser')

predictor.predict_json({"sentence": "This is a sentence to be predicted!"})
csyhuang commented 6 years ago

Hi Mark,

Thanks for your reply. I installed AllenNLP and tried the snippet you included. The following error occurs:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-db028ebf2091> in <module>()
----> 1 from allennlp.models.archival import load_archive
      2 from allennlp.service.predictors import Predictor
      3 
      4 archive = load_archive(
      5             "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo-constituency-parser-2018.03.14.tar.gz"

~/anaconda/envs/datascience/lib/python3.6/site-packages/allennlp/models/__init__.py in <module>()
      6 from allennlp.models.model import Model
      7 from allennlp.models.archival import archive_model, load_archive, Archive
----> 8 from allennlp.models.biattentive_classification_network import BiattentiveClassificationNetwork
      9 from allennlp.models.constituency_parser import SpanConstituencyParser
     10 from allennlp.models.coreference_resolution.coref import CoreferenceResolver

~/anaconda/envs/datascience/lib/python3.6/site-packages/allennlp/models/biattentive_classification_network.py in <module>()
     14 from allennlp.nn import InitializerApplicator, RegularizerApplicator
     15 from allennlp.nn import util
---> 16 from allennlp.training.metrics import CategoricalAccuracy
     17 
     18 

~/anaconda/envs/datascience/lib/python3.6/site-packages/allennlp/training/__init__.py in <module>()
----> 1 from allennlp.training.trainer import Trainer

~/anaconda/envs/datascience/lib/python3.6/site-packages/allennlp/training/trainer.py in <module>()
     21 from torch.nn.parallel import replicate, parallel_apply
     22 from torch.nn.parallel.scatter_gather import scatter_kwargs, gather
---> 23 from tensorboardX import SummaryWriter
     24 
     25 

~/anaconda/envs/datascience/lib/python3.6/site-packages/tensorboardX/__init__.py in <module>()
      2 """
      3 
----> 4 from .writer import FileWriter, SummaryWriter
      5 from .record_writer import RecordWriter

~/anaconda/envs/datascience/lib/python3.6/site-packages/tensorboardX/writer.py in <module>()
     22 import json
     23 import os
---> 24 from .src import event_pb2
     25 from .src import summary_pb2
     26 from .src import graph_pb2

~/anaconda/envs/datascience/lib/python3.6/site-packages/tensorboardX/src/event_pb2.py in <module>()
      8 from google.protobuf import reflection as _reflection
      9 from google.protobuf import symbol_database as _symbol_database
---> 10 from google.protobuf import descriptor_pb2
     11 # @@protoc_insertion_point(imports)
     12 

~/anaconda/envs/datascience/lib/python3.6/site-packages/google/protobuf/descriptor_pb2.py in <module>()
    407       message_type=None, enum_type=None, containing_type=None,
    408       is_extension=False, extension_scope=None,
--> 409       options=None),
    410   ],
    411   extensions=[

~/anaconda/envs/datascience/lib/python3.6/site-packages/google/protobuf/descriptor.py in __new__(cls, name, full_name, index, number, type, cpp_type, label, default_value, message_type, enum_type, containing_type, is_extension, extension_scope, options, has_default_value, containing_oneof, json_name, file)
    499         return _message.default_pool.FindExtensionByName(full_name)
    500       else:
--> 501         return _message.default_pool.FindFieldByName(full_name)
    502 
    503   def __init__(self, name, full_name, index, number, type, cpp_type, label,

KeyError: "Couldn't find field google.protobuf.DescriptorProto.ExtensionRange.options"

Do you have a clue what this is about? Is it related to the version of protobuf I have?

Thanks in advance,

Clare

nelson-liu commented 6 years ago

hmm, how did you install AllenNLP? Was the environment you installed in a fresh one, or did you use an existing one?

csyhuang commented 6 years ago

Hi Nelson and Mark,

I installed it in an existing environment, since my product also requires some other packages. The error went away after I installed the latest version of protobuf (3.5.2; what I had was 3.3.2).
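For anyone hitting the same `KeyError`: a minimal sketch of checking the installed protobuf version against the 3.5.x threshold discussed here. The helper name is mine, not from allennlp, and this is just a diagnostic sketch, not an official fix:

```python
def version_tuple(version: str) -> tuple:
    """Turn a dotted version string like '3.3.2' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

try:
    # protobuf releases in this era exposed __version__ on the package
    from google.protobuf import __version__ as pb_version
    if version_tuple(pb_version) < version_tuple("3.5.2"):
        print(f"protobuf {pb_version} may hit this bug; try: pip install -U protobuf")
except ImportError:
    print("protobuf not installed")
```

The comparison treats `"3.3.2" < "3.5.2"` numerically rather than lexically, which matters once minor versions reach two digits.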

The output from the snippet Mark gave above looks like:

{'tokens': ['This', 'is', 'a', 'sentence', 'to', 'be', 'predicted', '!'],
 'trees': '(S (NP (DT This)) (VP (VBZ is) (NP (NP (DT a) (NN sentence)) (SBAR (S (VP (TO to) (VP (VB be) (VP (VBN predicted)))))))) (. !))'}

It seems to be working fine?

Thanks for helping!

Clare

nelson-liu commented 6 years ago

Great, glad to hear you got it working. Closing this issue, feel free to reopen if you have further questions.

SwatiTiwarii commented 5 years ago

Hi @csyhuang, can you provide some help on how you split the different nodes present in the output? We have the output in text format; can we convert it to nltk.tree format so that I can access the various subtrees of the output?

csyhuang commented 5 years ago

@SwatiTiwarii You can look at this: https://www.nltk.org/_modules/nltk/tree.html

I guess what you want is Tree.fromstring there.

(Sorry that I missed the email notification about your message. I only see this now...)
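To expand on that pointer, here is a minimal sketch of feeding the `'trees'` string from the output above into `nltk.tree.Tree.fromstring` (assuming nltk is installed):

```python
from nltk.tree import Tree

# The 'trees' string produced by the constituency parser (copied from the
# example output earlier in this thread)
tree_str = (
    "(S (NP (DT This)) (VP (VBZ is) (NP (NP (DT a) (NN sentence)) "
    "(SBAR (S (VP (TO to) (VP (VB be) (VP (VBN predicted)))))))) (. !))"
)

tree = Tree.fromstring(tree_str)

# The leaves recover the original tokens
print(tree.leaves())

# Walk the subtrees, e.g. to pull out every noun phrase
np_subtrees = [st for st in tree.subtrees() if st.label() == "NP"]
for st in np_subtrees:
    print(" ".join(st.leaves()))
```

From there you get the usual `Tree` API (`label()`, indexing into children, `subtrees()` with a filter, etc.) for whatever downstream analysis you need.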

hafiz031 commented 2 years ago

Hi Clare!

Hopefully this snippet helps. You'll need to install allennlp first by following instructions on the readme. Let me know if I can help more.

from allennlp.models.archival import load_archive
from allennlp.service.predictors import Predictor

archive = load_archive(
            "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo-constituency-parser-2018.03.14.tar.gz"
        )
predictor = Predictor.from_archive(archive, 'constituency-parser')

predictor.predict_json({"sentence": "This is a sentence to be predicted!"})

Getting this:

ConfigurationError: ptb_trees not in acceptable choices for dataset_reader.type: ['babi', 'conll2003', 'interleaving', 'multitask', 'multitask_shim', 'sequence_tagging', 'sharded', 'text_classification_json']. You should either use the --include-package flag to make sure the correct module is loaded, or use a fully qualified class name in your config file like {"model": "my_module.models.MyModel"} to have it imported automatically.