dair-iitd / OpenIE-standalone

Seemingly Normal Phrases causing Errors #23

Open guilherme-salome opened 6 years ago

guilherme-salome commented 6 years ago

I am having issues with OpenIE reporting errors on a few phrases, some of which are regular phrases without weird punctuation. For example: "Mozambique steps up oil and gas search." leads to an Error on sentence (traceback below).

I am running the pre-compiled binaries with the options:

java -Xmx10g -XX:+UseConcMarkSweepGC -jar openie-assembly.jar --ignore-errors --binary "input.txt" "output.txt"

Here are some of the examples that have caused errors:

Error on sentence: U.S. Plains cattle markets quiet, no bids or sales.                 
Error on sentence: Indexes dip for the day, but surge in 2006.                         
Error on sentence: Factory data and Fed minutes on tap.                                
Error on sentence: Factory data and Fed minutes on tap.                                
Error on sentence: Spreads watch-Major European and U.S. M&A deals.                    
Error on sentence: LME copper ticks up, but inventories worry.                         
Error on sentence: Home Depot chairman and CEO resigns.                                
Error on sentence: CBOT wheat, corn and soy fall on fund liquidation.                  
Error on sentence: CBOT wheat slides 3 pct early, corn and soy follow.                 
Error on sentence: NY coffee and sugar sink, cocoa surges midsession.                  
Error on sentence: Ford seen still in crisis but pessimism wanes.                      
Error on sentence: NYBOT says president and chief executive dies.                      
Error on sentence: Amazon launches shoe and handbag Web site.                          
Error on sentence: Americas Mining and Metals to April 2008.                           
Error on sentence: Aon sets up threat management and security unit.                    
Error on sentence: Taiwan to tender for US or Brazil soy on Friday.                    
Error on sentence: Spreads watch-Major European and U.S. M&A deals.                    
Error on sentence: CBOT wheat, corn and soybeans close lower.                          
Error on sentence: Rabobank and Westpac win Reuters December forex poll.               
Error on sentence: Rabobank and Westpac win Reuters December forex poll.               
Error on sentence: US exporters sell corn to Mexico and Egypt - USDA.                  
Error on sentence: US exporters sell corn to Mexico and Egypt - USDA.                  
Error on sentence: CBOT corn dips 2 pct as funds sell and crude falls.                 
Error on sentence: CBOT wheat, corn and soy falling to day's lows.                     
Error on sentence: Jobs report and Fed chief top the bill.                             
Error on sentence: CME cattle end down but pare loss late on corn dip.                 
Error on sentence: Americas Mining and Metals to April 2008.                           
Error on sentence: Jobs report and Fed chief top the bill.                             
Error on sentence: Power confirms Putnam talks but says no deal yet.                   
Error on sentence: Power confirms Putnam talks but says no deal yet.                   
Error on sentence: Aussie slides vs USD and yen; bonds extend gains.                   
Error on sentence: French pay-TV Canal Plus seals TPS merger.                          
Error on sentence: Taiwan BSPA passes on tender for US or Brazil soy.                  
Error on sentence: Spreads watch-Major European and U.S. M&A deals.                    
Error on sentence: Make-or-break news awaited on new Astra heart drug.                 
Error on sentence: Make-or-break news awaited on new Astra heart drug.                 
Error on sentence: CME cattle and hogs end mixed, spreading active.                    
Error on sentence: US mortgage bonds may thrive in '07 but risks abound.               
Error on sentence: Mozambique steps up oil and gas search.                             
Error on sentence: U.S. oils and fats - Jan 5.                                         
Error on sentence: Americas Mining and Metals to April 2008.                           
Error on sentence: Audi sees record revenue and earnings in 2006.                      
Error on sentence: HK dollar falls on arbitrage and strong US data.                    
Error on sentence: Investors flee copper and oil, gold firm.                           
Error on sentence: CBOT wheat, corn and soybeans close lower.                          
Error on sentence: Americas Mining and Metals to October 2008.                         
Error on sentence: CIF Gulf Grain-Corn and soy steady, farmer selling slow.            
Error on sentence: Midwest Cash Grain PM - Corn and soy basis mixed; movement slow.    
Error on sentence: U.S. oils and fats - Jan 8.                                         
Error on sentence: Reuters Summit-U.S. FDA head sees change but no slowdown.           
Error on sentence: Amgen CEO says 2006 "great", but some risks remain.                 
Error on sentence: CME cattle end up on weather concern, but cut gain.                 
Error on sentence: Reuters Summit-US FDA head sees changes but no slowdown.            
Error on sentence: IMM specs increase bets against yen and loonie-CFTC.                
Error on sentence: IMM specs increase bets against yen and loonie-CFTC.                
Error on sentence: RadioShack improves, but same-store sales weigh.                    
Error on sentence: Hedge funds gain in '06 but lag broader market.                     
Error on sentence: Aussie dollar rises vs USD and yen on rate risk.                    
Error on sentence: BP's Q4 oil and gas output falls 5 pct, shares hit.                 
Error on sentence: Vietnam Dec auto sales +21 pct, but dip 1 pct in 2006.              
Error on sentence: Vietnam Dec auto sales +21 pct, but dip 1 pct in 2006.              

The errors all seem to be caused by the same issue:

java.lang.NoSuchMethodError: java.lang.String.join(Ljava/lang/CharSequence;Ljava/lang/Iterable;)Ljava/lang/String;
    at edu.iitd.cse.openieListExtractor.helper.ListExtractorLMHelpers.getNGramsOfSentence(ListExtractorLMHelpers.java:38)
    at edu.iitd.cse.openieListExtractor.extractors.ListExtractorLanguageModelBasedExtractor.getListOfNGramsOfAllConjuncts(ListExtractorLanguageModelBasedExtractor.java:436)
    at edu.iitd.cse.openieListExtractor.extractors.ListExtractorLanguageModelBasedExtractor.adjustFirstConjunctOfOneStructure(ListExtractorLanguageModelBasedExtractor.java:104)
    at edu.iitd.cse.openieListExtractor.extractors.ListExtractorLanguageModelBasedExtractor.adjustFirstConjunctOfAllStructures(ListExtractorLanguageModelBasedExtractor.java:293)
    at edu.iitd.cse.openieListExtractor.extractors.ListExtractorLanguageModelBasedExtractor.adjustConjunctsByLM(ListExtractorLanguageModelBasedExtractor.java:54)
    at edu.iitd.cse.openieListExtractor.extractors.ListExtractorLanguageModelBasedExtractor.fixConjunctStructures(ListExtractorLanguageModelBasedExtractor.java:39)
    at edu.iitd.cse.openieListExtractor.helper.ListExtractorMainHelpers.helperMainSentences(ListExtractorMainHelpers.java:49)
    at edu.knowitall.openie.OpenIE.extract(OpenIE.scala:159)
    at edu.knowitall.openie.OpenIE.extract(OpenIE.scala:61)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1$$anonfun$apply$mcV$sp$2$$anonfun$apply$4$$anonfun$apply$6.apply(OpenIECli.scala:227)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1$$anonfun$apply$mcV$sp$2$$anonfun$apply$4$$anonfun$apply$6.apply(OpenIECli.scala:222)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1$$anonfun$apply$mcV$sp$2$$anonfun$apply$4.apply(OpenIECli.scala:222)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1$$anonfun$apply$mcV$sp$2$$anonfun$apply$4.apply(OpenIECli.scala:214)
    at resource.AbstractManagedResource$$anonfun$5.apply(AbstractManagedResource.scala:86)
    at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:124)
    at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:124)
    at scala.util.control.Exception$Catch.apply(Exception.scala:102)
    at scala.util.control.Exception$Catch.either(Exception.scala:124)
    at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:86)
    at resource.ManagedResourceOperations$class.acquireAndGet(ManagedResourceOperations.scala:25)
    at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:48)
    at resource.ManagedResourceOperations$class.foreach(ManagedResourceOperations.scala:45)
    at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:48)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(OpenIECli.scala:214)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(OpenIECli.scala:213)
    at resource.AbstractManagedResource$$anonfun$5.apply(AbstractManagedResource.scala:86)
    at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:124)
    at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:124)
    at scala.util.control.Exception$Catch.apply(Exception.scala:102)
    at scala.util.control.Exception$Catch.either(Exception.scala:124)
    at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:86)
    at resource.ManagedResourceOperations$class.acquireAndGet(ManagedResourceOperations.scala:25)
    at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:48)
    at resource.ManagedResourceOperations$class.foreach(ManagedResourceOperations.scala:45)
    at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:48)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1.apply$mcV$sp(OpenIECli.scala:213)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1.apply(OpenIECli.scala:213)
    at edu.knowitall.openie.OpenIECli$$anonfun$run$1.apply(OpenIECli.scala:213)
    at edu.knowitall.common.Timing$.time(Timing.scala:50)
    at edu.knowitall.common.Timing$.timeThen(Timing.scala:72)
    at edu.knowitall.openie.OpenIECli$.run(OpenIECli.scala:241)
    at edu.knowitall.openie.OpenIECli$delayedInit$body.apply(OpenIECli.scala:176)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at edu.knowitall.openie.OpenIECli$.main(OpenIECli.scala:30)
    at edu.knowitall.openie.OpenIECli.main(OpenIECli.scala)
swarnaHub commented 6 years ago

[image]

They seem to be working fine for me. Here are the extractions from the first sentence you reported. Check that you have pulled the latest code from the master branch and that the language model downloaded correctly.

swarnaHub commented 6 years ago

[image]

Another one.

guilherme-salome commented 6 years ago

Hi @swarnaHub, thanks for double checking. I literally downloaded everything yesterday. I cloned the repo, got the pre-compiled binaries, created the lib folder and populated it with the downloads from the build section, and downloaded the data model from the link and stored it in the data folder. I did not try recompiling because I'm running everything on Linux and it seemed to work fine, except for these weird cases that fail.

guilherme-salome commented 6 years ago

I will try parsing the phrases that cause errors on my Mac after the current run finishes.

guilherme-salome commented 6 years ago

I just realized it could be an issue with the Java version.

guilherme-salome commented 6 years ago

I tried again from scratch on Ubuntu 18.04 in a DigitalOcean droplet using Java 8 (I couldn't install Java 7 for some reason). I still got the same errors.

swarnaHub commented 6 years ago

This is indeed a Java problem. See this Stack Overflow question: https://stackoverflow.com/questions/23413092/error-under-ubuntu-with-the-method-string-join
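
For context, `String.join(CharSequence, Iterable)` was added to the JDK in Java 8, so a `NoSuchMethodError` for it usually means the JVM that actually executes the jar is older than the one `java -version` reports in another shell. The following is a minimal sketch, not part of OpenIE, for checking this from inside the JVM. The class name `StringJoinCheck` is made up for illustration. It avoids Java 8 features so that it can be built for an older runtime if needed, and it should be run with the same `java` command used to launch `openie-assembly.jar`.

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic class, not part of OpenIE.
public class StringJoinCheck {

    /** Returns true if String.join(CharSequence, Iterable) exists in the running JVM. */
    static boolean hasStringJoin() {
        try {
            // Look the method up reflectively so this class still loads on older runtimes.
            Method m = String.class.getMethod("join", CharSequence.class, Iterable.class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Report which runtime is really executing this code.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("String.join(CharSequence, Iterable) available: " + hasStringJoin());
    }
}
```

If the last line prints `false`, the runtime predates Java 8 and the trace above is expected. If it prints `true`, the cause is likely elsewhere.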

guilherme-salome commented 6 years ago

Hmmm, interesting. On the droplet I am using, when I run java -version I get:

openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-0ubuntu0.18.04.1-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

which corresponds to Java 8. The stated requirement for OpenIE is Java 7.

Zoher15 commented 5 years ago

A temporary workaround is to add double quotes on both sides of the text.
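
A rough sketch of how that workaround could be applied to a whole input file before passing it to OpenIE. The thread does not explain why quoting helps, and this has not been verified against OpenIE itself. The class name `QuoteWrapper` and the default file names `input.txt` and `input_quoted.txt` are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenIE.
public class QuoteWrapper {

    /** Surrounds one sentence with double quotes; inner quotes are left untouched. */
    static String wrap(String sentence) {
        return "\"" + sentence + "\"";
    }

    /** Wraps every non-blank line, trimming surrounding whitespace and keeping the order. */
    static List<String> wrapAll(List<String> sentences) {
        List<String> out = new ArrayList<String>();
        for (String s : sentences) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                out.add(wrap(trimmed));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumed file names; override them on the command line.
        String in = args.length > 0 ? args[0] : "input.txt";
        String out = args.length > 1 ? args[1] : "input_quoted.txt";
        List<String> lines = Files.readAllLines(Paths.get(in), StandardCharsets.UTF_8);
        Files.write(Paths.get(out), wrapAll(lines), StandardCharsets.UTF_8);
    }
}
```

The quoted file would then be given to OpenIE in place of the original input. Note that the extractions may include the added quotes, so the output may need post-processing.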

Zoher15 commented 5 years ago

[image]

Really tired of this bug. I have been wrapping the text in double quotes to work around it, but I cannot understand what is causing the error. My Java version:

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)