charlesashby / entity-sentiment-analysis

Various ops for handling several entities in a document: anaphora resolution, clustering, etc.

TypeError on coreferences = output['corefs'] #1

Open diegoje opened 6 years ago

diegoje commented 6 years ago

Hey Charles!

I tried this script and I get the following error:

Traceback (most recent call last):
  File "/private/tmp/DeepLearning/parse_doc.py", line 175, in <module>
    get_sentiment(text, network)
  File "/private/tmp/DeepLearning/parse_doc.py", line 154, in get_sentiment
    contexts = parse_doc(document)
  File "/private/tmp/DeepLearning/parse_doc.py", line 127, in parse_doc
    tree = parse_sentence(coreference_resolution(sentence))
  File "/private/tmp/DeepLearning/parse_doc.py", line 57, in coreference_resolution
    coreferences = output['corefs']
TypeError: string indices must be integers

Process finished with exit code 1

I have pycorenlp==0.3.0 (don't know if that's the one you used), and I'm running the same StanfordNLP as you (http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip) on Java 9.0.1.
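
For reference, here's roughly what I think the failing call boils down to (a minimal sketch of standard pycorenlp usage, with the server URL assumed; not the repo's exact code). If the CoreNLP server throws, annotate() can't parse the response as JSON and hands back the raw text as a str, which would explain the "string indices must be integers" error:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')  # assumed server URL

def coreference_resolution(sentence):
    output = nlp.annotate(sentence, properties={
        'annotators': 'tokenize,ssplit,pos,lemma,ner,depparse,mention,coref',
        'outputFormat': 'json',
    })
    # On a server-side error the response body isn't valid JSON, so
    # pycorenlp returns it as a plain string instead of a dict.
    if isinstance(output, str):
        raise RuntimeError('CoreNLP server error: ' + output)
    return output['corefs']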

And here's what I get from StanfordNLP:

[pool-1-thread-1] INFO CoreNLP - [/0:0:0:0:0:0:0:1:50375] API call w/annotators tokenize,ssplit,pos,lemma,ner,depparse,mention,coref
Jean is really sad, but Adam is the happiest guy ever
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.8 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.3 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
        at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:40)
        at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(TimeExpressionExtractorFactory.java:57)
        at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(TimeExpressionExtractorFactory.java:38)
        at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.<init>(NumberSequenceClassifier.java:86)
        at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:136)
        at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)
        at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273)
        at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152)

...and a bunch more lines

Is it a problem with JollyDayHoliday? What do you think I could change to make it work?

Disclaimer: sorry, but I don't have any experience with StanfordNLP.

charlesashby commented 6 years ago

Hey Diego! I tested the script and it works fine for me. I added a requirements.txt file; maybe try checking that out!

diegoje commented 6 years ago

Thanks for this, Charles.

I did some research and apparently I'm not the only one with this issue.

It looks like it is due to the Java version. I'm using Java 9.0.1 - which one are you using?

UPDATE: issue fixed when using Java 1.8!

diegoje commented 6 years ago

Hi Charles,

After the above fix, I ran into an IndexError: list index out of range that drove me crazy. After some debugging, the following change made it work for me.

I changed predictions[0][i][0] to predictions[0][0][i][0]; adding that extra level of indexing so it could find the sentiment solved it.
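
Based on the printed output further down, predictions seems to be a tuple whose first element is a one-element list holding the probability array, which is why the extra [0] is needed:

probs_list, labeled = predictions   # ([array of shape (batch, 2)], [labeled sentences])
probs = probs_list[0]               # same as predictions[0][0]
score = probs[i][0] - probs[i][1]   # positive minus negative probability for sentence i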

Here's what my get_sentiment() looks like now:

def get_sentiment(document, network):
    """ Create a dict of every entities with their associated sentiment """
    print('Parsing Document...')
    contexts = parse_doc(document)
    print('Done!')
    entities = init_dict(contexts)
    sentences = [sentence.encode('utf-8') for _, sentence in contexts]
    predictions = network.predict_sentences(sentences)

    for i, c in enumerate(contexts):
        key = c[0]

        # =======
        # Note: had to change from predictions[0][i][0] to predictions[0][0][i][0]
        # to avoid IndexError: list index out of range
        # =======

        if entities[key] is not None:
            entities[key] += (predictions[0][0][i][0] - predictions[0][0][i][1])
            entities[key] /= 2
        else:
            entities[key] = (predictions[0][0][i][0] - predictions[0][0][i][1])

    for e in entities.keys():
        print('Entity: %s -- sentiment: %s' % (e, entities[e]))

And this is what predictions prints:

([array([[0.9401707 , 0.05982935],
       [0.12330811, 0.8766919 ],
       [0.05135493, 0.948645  ],
       [0.04372008, 0.95627993],
       [0.76382387, 0.23617609],
       [0.07036395, 0.92963606],
       [0.33202317, 0.66797686],
       [0.43130934, 0.56869066],
       [0.5940844 , 0.40591562],
       [0.02576003, 0.97424   ],
       [0.79468673, 0.20531328],
       [0.6695906 , 0.33040938],
       [0.8919715 , 0.10802842],
       [0.21454915, 0.7854509 ],
       [0.9765816 , 0.02341843],
       [0.85761005, 0.14238995],
       [0.97303224, 0.02696779],
       [0.7652719 , 0.23472811],
       [0.09908623, 0.9009138 ],
       [0.71454215, 0.2854578 ],
       [0.18902989, 0.81097007],
       [0.74710584, 0.2528942 ],
       [0.88460344, 0.11539648],
       [0.878756  , 0.12124399],
       [0.53546566, 0.46453434],
       [0.9586443 , 0.04135578],
       [0.7557437 , 0.24425633],
       [0.92181313, 0.07818691],
       [0.96484137, 0.03515861],
       [0.5171254 , 0.4828745 ],
       [0.5204062 , 0.47959384],
       [0.8298136 , 0.17018642],
       [0.7914332 , 0.20856674],
       [0.6923996 , 0.30760032],
       [0.04343254, 0.95656747],
       [0.02746156, 0.9725385 ],
       [0.85882574, 0.14117423],
       [0.92434436, 0.07565565],
       [0.14022514, 0.8597748 ],
       [0.03307388, 0.9669261 ],
       [0.9617045 , 0.03829547],
       [0.7466851 , 0.25331494],
       [0.70446235, 0.29553762],
       [0.92154443, 0.07845555],
       [0.8455439 , 0.15445615],
       [0.09782176, 0.90217817],
       [0.84305984, 0.15694015],
       [0.9081956 , 0.09180436],
       [0.4904544 , 0.5095456 ],
       [0.7562372 , 0.24376276],
       [0.74507135, 0.25492868],
       [0.32424155, 0.6757585 ],
       [0.71232206, 0.28767797],
       [0.9237479 , 0.07625207],
       [0.70901144, 0.2909886 ],
       [0.39673564, 0.60326433],
       [0.8038566 , 0.19614334],
       [0.05801026, 0.9419897 ],
       [0.26367864, 0.73632133],
       [0.36928973, 0.63071024],
       [0.9059966 , 0.09400345],
       [0.88516587, 0.11483412],
       [0.05036072, 0.94963926],
       [0.85224956, 0.14775042]], dtype=float32)], [['0,Bitcoin is looking very promising .', 'pos', 0.9401707], ['0,John is not going too well', 'neg', 0.8766919], ['0,Litecoin is crashing', 'neg', 0.948645], ['0,Adam is Sad .', 'neg', 0.95627993]])

My results:

text = "Bitcoin is looking very promising. John is not going too well, and Litecoin is crashing. Adam is Sad."

Entity:  Bitcoin -- sentiment: 0.88034135
Entity:  Litecoin -- sentiment: -0.89729005
Entity:  Adam -- sentiment: -0.91255987
Entity:  John -- sentiment: -0.75338376
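
Sanity check: each score looks like the first column minus the second column of the matching row in the dump above (the tiny differences are just float32 rounding):

0.9401707  - 0.05982935   # Bitcoin   ->  0.88034135
0.12330811 - 0.8766919    # John      -> -0.75338379
0.05135493 - 0.948645     # Litecoin  -> -0.89729007
0.04372008 - 0.95627993   # Adam      -> -0.91255985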

Is this correct, or did I do something wrong?

Thanks!

charlesashby commented 6 years ago

Hey Diego! Yeah, the output looks weird; this has to do with the predict_sentences method in CharLSTM. Basically, the network concatenates a bunch of sentences together so the batch size equals your BATCH_SIZE, then it outputs the sentiment for all of those sentences. I was too lazy to fix this, haha.
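
A rough sketch of what that implies for the dump above (assumed behaviour, not the actual CharLSTM code): the probability array has one row per batch slot (64 here), but only the first len(sentences) rows correspond to the sentences you actually passed in.

probs = predictions[0][0]            # shape (BATCH_SIZE, 2) -- 64 rows in the dump above
real_probs = probs[:len(sentences)]  # rows for your real sentences; the rest just pad the batch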

Also, here's my java version:

$ java -version

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)