brmson / yodaqa

A Question Answering system built on top of the Apache UIMA framework.
http://ailao.eu/yodaqa

questionDump is extremely slow #42

Closed iampkuhz closed 8 years ago

iampkuhz commented 8 years ago

Hi, I'm using YodaQA for question classification. However, dumping the question features to JSON is extremely slow on my server. I run it like this:

./gradlew questionDump -PexecArgs="my-question.tsv my-question-corrupt.json"

Only 138 questions were processed in 2 hours 20 minutes.

How can I speed up this feature generation? By the way, I have given the DBpedia fuseki-server 20 GB of memory and gradlew 15 GB.

pasky commented 8 years ago

Hi! It seems you use some custom, locally set up endpoints. What all have you customized? What if you try it with the default ailao endpoints? You may also try to enable verbose logging to try to guess in what stages it gets stuck.

If you are accessing our endpoints from China, that might be a part of the issue as the question analysis phase is not parallelized, so latency matters. But it sure shouldn't stretch the time from ~1s to ~1m!
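For scale, the figures reported at the top of this issue (138 questions in 2 hours 20 minutes) work out to roughly 61 seconds per question, which is where the "~1m" comes from. A minimal sketch of that arithmetic follows; the class and method names are illustrative only and are not part of YodaQA.

```java
// Back-of-the-envelope throughput check for the numbers reported in this issue.
// Class and method names are hypothetical; nothing here is part of YodaQA.
class ThroughputEstimate {
    /** Average wall-clock seconds spent per question. */
    static double secondsPerQuestion(int questions, int hours, int minutes) {
        long totalSeconds = hours * 3600L + minutes * 60L;
        return (double) totalSeconds / questions;
    }

    public static void main(String[] args) {
        // 138 questions in 2 h 20 min, as reported by the original poster.
        double perQuestion = secondsPerQuestion(138, 2, 20);
        System.out.printf("%.1f s per question%n", perQuestion);
    }
}
```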

iampkuhz commented 8 years ago

Glad to see this reply!

  1. I'm new to this field. What does "endpoints" mean here?
  2. Do you mean changing the DBpedia server from my local one to http://dbpedia.ailao.eu:3030/dbpedia/query ?
  3. How do I enable verbose logging?

Here is my log from ./gradlew:

nohup: ignoring input
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.7/8095d0b9f7e0a9cd79a663c740e0f8fb31d0e2c8/slf4j-simple-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-log4j12/1.7.7/58f588119ffd1702c77ccab6acb54bfb41bed8bd/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from [jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-sentence-en-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-sentence-en-maxent-20120616.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/sentence-en-maxent.bin] redirected from [jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-model-sentence-en-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-model-sentence-en-maxent-20120616.1.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/sentence-en-maxent.properties]
INFO ResourceObjectProviderBase - Producing resource took 50ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from [jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-token-en-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-token-en-maxent-20120616.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/token-en-maxent.bin] redirected from [jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-model-token-en-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-model-token-en-maxent-20120616.1.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/token-en-maxent.properties]
INFO ResourceObjectProviderBase - Producing resource took 86ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from [jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-upstream-parser-en-rnn/jars/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-upstream-parser-en-rnn-20140104.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/parser-en-rnn.ser.gz] redirected from [jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-model-parser-en-rnn/jars/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-model-parser-en-rnn-20140104.1.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/parser-en-rnn.properties]
Apr 04, 2016 7:05:24 PM de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordParser$StanfordParserModelProvider produceResource(511)
INFO: Loading parser from serialized file jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-upstream-parser-en-rnn/jars/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-upstream-parser-en-rnn-20140104.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/parser-en-rnn.ser.gz ...
INFO ResourceObjectProviderBase - Producing resource took 1699ms
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.api.lexmorph-asl/1.7.0/660e6a99e1d68595970d004988310949685ff957/de.tudarmstadt.ukp.dkpro.core.api.lexmorph-asl-1.7.0.jar!/de/tudarmstadt/ukp/dkpro/core/api/lexmorph/tagset/en-ptb-pos.map
INFO ResourceObjectProviderBase - Producing resource took 0ms
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.api.syntax-asl/1.7.0/5c91e9a53843bf951188e6f3c4fe67986bbf5a6e/de.tudarmstadt.ukp.dkpro.core.api.syntax-asl-1.7.0.jar!/de/tudarmstadt/ukp/dkpro/core/api/syntax/tagset/en-ptb-constituency.map
INFO ResourceObjectProviderBase - Producing resource took 0ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-date/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-date-20100907.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-date.bin
INFO ResourceObjectProviderBase - Producing resource took 563ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-location/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-location-20100907.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-location.bin
INFO ResourceObjectProviderBase - Producing resource took 672ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-money/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-money-20100907.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-money.bin
INFO ResourceObjectProviderBase - Producing resource took 521ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-organization/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-organization-20100907.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-organization.bin
INFO ResourceObjectProviderBase - Producing resource took 537ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-percentage/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-percentage-20100907.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-percentage.bin
INFO ResourceObjectProviderBase - Producing resource took 527ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-person/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-person-20130624.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-person.bin
INFO ResourceObjectProviderBase - Producing resource took 536ms
INFO ResourceObjectProviderBase - :: loading settings :: url = jar:file:/home/sfh/.gradle/caches/modules-2/files-2.1/org.apache.ivy/ivy/2.3.0/c5ebf1c253ad4959a29f4acfe696ee48cdd9f473/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
INFO ResourceObjectProviderBase - Producing resource from jar:file:/home/sfh/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-time/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-ner-en-time-20100907.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/ner-en-time.bin
INFO ResourceObjectProviderBase - Producing resource took 520ms
INFO LATByWordnet - ?! word highest number of POS NN not in Wordnet
INFO LATByWordnet - ?! word iphone 6 plus' screen size of POS NN not in Wordnet
INFO LATByWordnet - ?! word two people of POS NNS not in Wordnet
INFO LATByWordnet - ?! word android version of POS NN not in Wordnet
INFO LATByWordnet - ?! word 5 ks of POS NNS not in Wordnet
INFO LATByWordnet - ?! word total value of POS NN not in Wordnet
INFO LATByWordnet - ?! cannot expand LAT of POS WDT
INFO LATByWordnet - ?! cannot expand LAT of POS WDT
INFO LATByWordnet - ?! word total value of POS NN not in Wordnet
INFO LATByWordnet - ?! word total value of POS NN not in Wordnet
INFO FocusGenerator - ?! No focus in: Which insurance company was hacked in early 2015, affecting up to 80 million customers' accounts?

My local DBpedia log looks like this:

21:43:34 INFO  [2028] exec/select
21:43:34 INFO  [2028] 200 OK (32 ms)
21:43:34 INFO  [2029] GET http://localhost:3030/dbpedia/query?query=PREFIX++%3A+++++%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2F%3E%0APREFIX++dbo%3A++%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0APREFIX++owl%3A++%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0APREFIX++rdf%3A++%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX++xsd%3A++%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0APREFIX++skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0APREFIX++rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX++dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2F%3E%0APREFIX++dbpedia2%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0APREFIX++foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0APREFIX++dc%3A+++%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0A%0ASELECT++%3FpageID+%3Flabel+%3Fres+%3Fdescription%0AWHERE%0A++%7B+++%7B+BIND%28%3ADeaths_in_2008+AS+%3Fres%29+%7D%0A++++UNION%0A++++++%7B+BIND%28%3ADeaths_in_2008+AS+%3Fredir%29%0A++++++++%3Fredir+dbo%3AwikiPageRedirects+%3Fres%0A++++++%7D%0A++++UNION%0A++++++%7B+BIND%28%3ADeaths_in_2008+AS+%3Fdisamb%29%0A++++++++%3Fdisamb+dbo%3AwikiPageDisambiguates+%3Fres%0A++++++%7D%0A++++UNION%0A++++++%7B+BIND%28%3ADeaths_in_2008+AS+%3Fredir%29%0A++++++++%3Fredir+dbo%3AwikiPageRedirects+%3Fdisamb+.%0A++++++++%3Fdisamb+dbo%3AwikiPageDisambiguates+%3Fres%0A++++++%7D%0A++++OPTIONAL%0A++++++%7B+%3Fres+dbo%3AwikiPageRedirects+%3FredirTarget+%7D%0A++++OPTIONAL%0A++++++%7B+%3Fres+dbo%3AwikiPageDisambiguates+%3FdisambTarget+%7D%0A++++%3Fres+dbo%3AwikiPageID+%3FpageID+.%0A++++%3Fres+rdfs%3Alabel+%3Flabel%0A++++OPTIONAL%0A++++++%7B+%3Fres+rdfs%3Acomment+%3Fdescription%0A++++++++FILTER+%28+lang%28%3Fdescription%29+%3D+%22en%22+%29%0A++++++%7D%0A++++FILTER+%28+%21+bound%28%3FredirTarget%29+%29%0A++++FILTER+%28+%21+bound%28%3FdisambTarget%29+%29%0A++++FILTER+%28+lang%28%3Flabel%29+%3D+%22en%22+%29%0A++%7D%0A
21:43:34 INFO  [2029] Query = PREFIX  :     <http://dbpedia.org/resource/> PREFIX  dbo:  <http://dbpedia.org/ontology/> PREFIX  owl:  <http://www.w3.org/2002/07/owl#> PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#> PREFIX  skos: <http://www.w3.org/2004/02/skos/core#> PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX  dbpedia: <http://dbpedia.org/> PREFIX  dbpedia2: <http://dbpedia.org/property/> PREFIX  foaf: <http://xmlns.com/foaf/0.1/> PREFIX  dc:   <http://purl.org/dc/elements/1.1/>  SELECT  ?pageID ?label ?res ?description WHERE   {   { BIND(:Deaths_in_2008 AS ?res) }     UNION       { BIND(:Deaths_in_2008 AS ?redir)         ?redir dbo:wikiPageRedirects ?res       }     UNION       { BIND(:Deaths_in_2008 AS ?disamb)         ?disamb dbo:wikiPageDisambiguates ?res       }     UNION       { BIND(:Deaths_in_2008 AS ?redir)         ?redir dbo:wikiPageRedirects ?disamb .         ?disamb dbo:wikiPageDisambiguates ?res       }     OPTIONAL       { ?res dbo:wikiPageRedirects ?redirTarget }     OPTIONAL       { ?res dbo:wikiPageDisambiguates ?disambTarget }     ?res dbo:wikiPageID ?pageID .     ?res rdfs:label ?label     OPTIONAL       { ?res rdfs:comment ?description         FILTER ( lang(?description) = "en" )       }     FILTER ( ! bound(?redirTarget) )     FILTER ( ! bound(?disambTarget) )     FILTER ( lang(?label) = "en" )   }
21:43:34 INFO  [2029] exec/select
21:43:34 INFO  [2029] 200 OK (29 ms)

Any extra hints?

iampkuhz commented 8 years ago

Does questionDump depend on the Solr server? I haven't started Solr yet.

pasky commented 8 years ago

Yes, I meant that if you imported local DBpedia, do you use anything else locally too, or did you change any configuration wrt. which resources are used?

To enable verbose logging, README explains:

Alternatively, if things don't go well or you would like to watch YodaQA
think, try passing an extra command line parameter
``-Dorg.slf4j.simpleLogger.log.cz.brmlab.yodaqa=debug`` to gradle;
this is **highly recommended**!

INFO LATByWordnet - ?! word highest number of POS NN not in Wordnet
INFO LATByWordnet - ?! word iphone 6 plus' screen size of POS NN not in Wordnet
INFO LATByWordnet - ?! word two people of POS NNS not in Wordnet
INFO LATByWordnet - ?! word android version of POS NN not in Wordnet
INFO LATByWordnet - ?! word 5 ks of POS NNS not in Wordnet
INFO LATByWordnet - ?! word total value of POS NN not in Wordnet
INFO LATByWordnet - ?! cannot expand LAT of POS WDT
INFO LATByWordnet - ?! cannot expand LAT of POS WDT
INFO LATByWordnet - ?! word total value of POS NN not in Wordnet
INFO LATByWordnet - ?! word total value of POS NN not in Wordnet

These warnings above are highly unusual - are you sure you didn't change anything else?

pasky commented 8 years ago

No worries, question dump does not use Solr.

But I'd like you to first measure how fast a totally plain YodaQA, without any customizations, is for you.

iampkuhz commented 8 years ago

Sorry for pasting that messy log file. Here is my log for one question. It seems the CluesToConcepts part takes a really long time. Can you give me some advice? (The CluesMergeByText part is fast, though.)

DEBUG FocusGenerator - DET+W agreement
DEBUG SVGenerator - SV: signed
DEBUG LATByFocus - new LAT by Focus: <<agreement>>/0
DEBUG LATByWordnet - expanded LAT agreement to wn LATs:  | statement/6735202:-1.0 | message/6611268:-2.0 | communication/33319:-3.0 | compatibility/4720011:-1.0 | characteristic/4738737:-2.0 | quality/4731092:-3.0 | attribute/24444:-4.0 | harmony/13992690:-1.0 | order/13991994:-2.0 | state/24900:-3.0 | planning/5802702:-1.0 | thinking/5778923:-2.0 | higher cognitive process/5778661:-3.0 | process/5709328:-4.0 | grammatical relation/13818991:-1.0 | linguistic relation/13819354:-2.0 | relation/32220:-3.0 | speech act/7175534:-1.0 | act/30657:-2.0
DEBUG ClueByTokenConstituent - new by NP: agreement
DEBUG ClueByTokenConstituent - new by Token: agreement
DEBUG ClueByTokenConstituent - new by NP: Belfast
DEBUG ClueByTokenConstituent - new by Token: Belfast
DEBUG ClueByTokenConstituent - new by NP: April 10, 1998
DEBUG ClueByTokenConstituent - new by Token: April
DEBUG ClueByTokenConstituent - new by Token: 10
DEBUG ClueByTokenConstituent - new by Token: 1998
DEBUG ClueBySV - new by SV: signed
DEBUG ClueByNE - new by NamedEntity: Belfast
DEBUG ClueByNE - new by NamedEntity: April 10, 1998
DEBUG ClueByLAT - new by LAT: agreement
DEBUG ClueBySubject - new by Subject Token: agreement
DEBUG CluesToConcepts - fuzzy-lookup(agreement) returned: d0.0 ~Agreement [Agreement] Agreement 0
DEBUG CluesToConcepts - fuzzy-lookup(agreement) returned: d3.0 ~Agreeing [Agreement] Agreement 0
DEBUG CluesToConcepts - sqlite-lookup(agreement) returned: p0.423515 ~Agreement [Agreement] Agreement 0
DEBUG CluesToConcepts - sqlite-lookup(agreement) returned: p0.347661 ~Agreement [Operation_Agreement] Operation_Agreement 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult Agreement
DEBUG CluesToConcepts - merge: cwResult Operation_Agreement
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d0.0 ~Belfast [Belfast] Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d0.5 ~BELFAST [Belfast] Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d2.0 ~Belfanti [Serafino Belfanti] Serafino_Belfanti 0
DEBUG CluesToConcepts - sqlite-lookup(Belfast) returned: p0.858827 ~Belfast [Belfast] Belfast 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult Belfast
DEBUG CluesToConcepts - fuzzy-lookup(April 10, 1998) returned: d2.0 ~April 1, 1999 [April 1999] April_1999 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d0.0 ~Belfast [Belfast] Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d0.5 ~BELFAST [Belfast] Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d2.0 ~Belfanti [Serafino Belfanti] Serafino_Belfanti 0
DEBUG CluesToConcepts - sqlite-lookup(Belfast) returned: p0.858827 ~Belfast [Belfast] Belfast 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult Belfast
DEBUG CluesToConcepts - fuzzy-lookup(April 10, 1998) returned: d2.0 ~April 1, 1999 [April 1999] April_1999 0
DEBUG CluesToConcepts - fuzzy-lookup(Which agreement) returned: d3.0 ~Munich Agreement [Munich Agreement] Munich_Agreement 0
DEBUG CluesToConcepts - fuzzy-lookup(Which agreement) returned: d3.0 ~Munich agreement [Munich Agreement] Munich_Agreement 0
DEBUG CluesToConcepts - 2-gram clue <<Which agreement>> - no match
DEBUG CluesToConcepts - 2-gram clue <<agreement was>> - no match
DEBUG CluesToConcepts - 2-gram clue <<was signed>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(signed in) returned: d2.3 ~Signed [Sign (mathematics)] Sign_(mathematics) 0
DEBUG CluesToConcepts - fuzzy-lookup(signed in) returned: d3.0 ~Linked In [LinkedIn] LinkedIn 0
DEBUG CluesToConcepts - fuzzy-lookup(signed in) returned: d3.0 ~Logged in [Login] Login 0
DEBUG CluesToConcepts - 2-gram clue <<signed in>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(in Belfast) returned: d2.0 ~FM Belfast [FM Belfast] FM_Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(in Belfast) returned: d3.0 ~Hms belfast [HMS Belfast (C35)] HMS_Belfast_(C35) 0
DEBUG CluesToConcepts - 2-gram clue <<in Belfast>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(Belfast,) returned: d2.8 ~Belfast, ME [Belfast, Maine] Belfast,_Maine 0
DEBUG CluesToConcepts - sqlite-lookup(Belfast,) returned: p0.887755 ~Belfast, [Belfast] Belfast 0
DEBUG CluesToConcepts - sqlite-lookup(Belfast,) returned: p0.0969388 ~Belfast, [Belfast_Agreement] Belfast_Agreement 0
DEBUG CluesToConcepts - merge: fuzzyResult Belfast,_Maine
DEBUG CluesToConcepts - merge: cwResult Belfast
DEBUG CluesToConcepts - merge: cwResult Belfast_Agreement
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d0.0 ~Belfast [Belfast] Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d0.5 ~BELFAST [Belfast] Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(Belfast) returned: d2.0 ~Belfanti [Serafino Belfanti] Serafino_Belfanti 0
DEBUG CluesToConcepts - sqlite-lookup(Belfast) returned: p0.858827 ~Belfast [Belfast] Belfast 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult Belfast
DEBUG CluesToConcepts - 2-gram clue <<Belfast,>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(, on) returned: d2.0 ~On [On] On 0
DEBUG CluesToConcepts - fuzzy-lookup(, on) returned: d2.3 ~, [Comma] Comma 0
DEBUG CluesToConcepts - fuzzy-lookup(, on) returned: d2.5 ~ON [On] On 0
DEBUG CluesToConcepts - fuzzy-lookup(on) returned: d0.0 ~On [On] On 0
DEBUG CluesToConcepts - fuzzy-lookup(on) returned: d0.5 ~ON [On] On 0
DEBUG CluesToConcepts - fuzzy-lookup(on) returned: d2.5 ~TNN [TNN] TNN 0
DEBUG CluesToConcepts - sqlite-lookup(on) returned: p0.429551 ~On [On] On 0
DEBUG CluesToConcepts - sqlite-lookup(on) returned: p0.103958 ~On [Wednesday] Wednesday 0
DEBUG CluesToConcepts - sqlite-lookup(on) returned: p0.0627968 ~On [On_(EP)] On_(EP) 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult On
DEBUG CluesToConcepts - merge: cwResult Wednesday
DEBUG CluesToConcepts - merge: cwResult On_(EP)
DEBUG CluesToConcepts - 2-gram clue <<, on>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(on April) returned: d2.0 ~10 April [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(on April) returned: d3.0 ~5th April [April 5] April_5 0
DEBUG CluesToConcepts - 2-gram clue <<on April>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(April 10) returned: d0.0 ~April 10 [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(April 10) returned: d0.5 ~APRIL 10 [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(April 10) returned: d3.0 ~April 11th [April 11] April_11 0
DEBUG CluesToConcepts - sqlite-lookup(April 10) returned: p0.999394 ~April 10 [April_10] April_10 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult April_10
DEBUG CluesToConcepts - 2-gram clue <<April 10>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(10,) returned: d0.5 ~10's [10s] 10s 0
DEBUG CluesToConcepts - fuzzy-lookup(10,) returned: d1.0 ~1, [Onekama, Michigan] Onekama,_Michigan 0
DEBUG CluesToConcepts - fuzzy-lookup(10,) returned: d1.2 ~10+2 [10+2] 10+2 0
DEBUG CluesToConcepts - sqlite-lookup(10,) returned: p0.214286 ~10, [List_of_R-phrases] List_of_R-phrases 0
DEBUG CluesToConcepts - sqlite-lookup(10,) returned: p0.0892857 ~10, [10] 10 0
DEBUG CluesToConcepts - sqlite-lookup(10,) returned: p0.0892857 ~10, [August_10] August_10 0
DEBUG CluesToConcepts - merge: fuzzyResult 10s
DEBUG CluesToConcepts - merge: cwResult List_of_R-phrases
DEBUG CluesToConcepts - merge: cwResult August_10
DEBUG CluesToConcepts - merge: cwResult 10
DEBUG CluesToConcepts - fuzzy-lookup(10) returned: d0.0 ~10 [10] 10 0
DEBUG CluesToConcepts - fuzzy-lookup(10) returned: d1.2 ~1/8 [1/8] 1/8 0
DEBUG CluesToConcepts - fuzzy-lookup(10) returned: d1.2 ~1/x [Multiplicative inverse] Multiplicative_inverse 0
DEBUG CluesToConcepts - sqlite-lookup(10) returned: p0.121804 ~10 [ICD-10] ICD-10 0
DEBUG CluesToConcepts - merge: fuzzyResult 10
DEBUG CluesToConcepts - merge: cwResult ICD-10
DEBUG CluesToConcepts - 2-gram clue <<10,>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(, 1998) returned: d2.3 ~+998 [Telephone numbers in Uzbekistan] Telephone_numbers_in_Uzbekistan 0
DEBUG CluesToConcepts - sqlite-lookup(, 1998) returned: p1.0 ~, 1998 [1998] 1998 0
DEBUG CluesToConcepts - merge: fuzzyResult Telephone_numbers_in_Uzbekistan
DEBUG CluesToConcepts - merge: cwResult 1998
DEBUG CluesToConcepts - fuzzy-lookup(1998) returned: d0.0 ~1998 [1998] 1998 0
DEBUG CluesToConcepts - fuzzy-lookup(1998) returned: d1.0 ~+998 [Telephone numbers in Uzbekistan] Telephone_numbers_in_Uzbekistan 0
DEBUG CluesToConcepts - fuzzy-lookup(1998) returned: d3.0 ~\u0998 [Bengali alphabet] Bengali_alphabet 0
DEBUG CluesToConcepts - sqlite-lookup(1998) returned: p0.556283 ~1998 [1998] 1998 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult 1998
DEBUG CluesToConcepts - 2-gram clue <<, 1998>> - no match
DEBUG CluesToConcepts - 3-gram clue <<Which agreement was>> - no match
DEBUG CluesToConcepts - 3-gram clue <<agreement was signed>> - no match
DEBUG CluesToConcepts - 3-gram clue <<was signed in>> - no match
DEBUG CluesToConcepts - 3-gram clue <<signed in Belfast>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(in Belfast) returned: d2.0 ~FM Belfast [FM Belfast] FM_Belfast 0
DEBUG CluesToConcepts - fuzzy-lookup(in Belfast) returned: d3.0 ~Hms belfast [HMS Belfast (C35)] HMS_Belfast_(C35) 0
DEBUG CluesToConcepts - 3-gram clue <<in Belfast,>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(Belfast, on) returned: d2.5 ~Belfast, PA [Belfast, Pennsylvania] Belfast,_Pennsylvania 0
DEBUG CluesToConcepts - 3-gram clue <<Belfast, on>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(on April) returned: d2.0 ~10 April [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(on April) returned: d3.0 ~5th April [April 5] April_5 0
DEBUG CluesToConcepts - 3-gram clue <<, on April>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(on April 10) returned: d3.0 ~April 10 [April 10] April_10 0
DEBUG CluesToConcepts - 3-gram clue <<on April 10>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(April 10,) returned: d0.2 ~April 10 [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(April 10,) returned: d3.0 ~April 11th [April 11] April_11 0
DEBUG CluesToConcepts - fuzzy-lookup(April 10,) returned: d3.0 ~April 13th [April 13] April_13 0
DEBUG CluesToConcepts - sqlite-lookup(April 10,) returned: p1.0 ~April 10, [April_10] April_10 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult April_10
DEBUG CluesToConcepts - fuzzy-lookup(April 10) returned: d0.0 ~April 10 [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(April 10) returned: d0.5 ~APRIL 10 [April 10] April_10 0
DEBUG CluesToConcepts - fuzzy-lookup(April 10) returned: d3.0 ~April 11th [April 11] April_11 0
DEBUG CluesToConcepts - sqlite-lookup(April 10) returned: p0.999394 ~April 10 [April_10] April_10 0
DEBUG CluesToConcepts - merge: fuzzyResult+cwResult April_10
DEBUG CluesToConcepts - 3-gram clue <<April 10,>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(10, 1998) returned: d2.5 ~1098 [1098] 1098 0
DEBUG CluesToConcepts - fuzzy-lookup(10, 1998) returned: d2.5 ~1018 [1018] 1018 0
DEBUG CluesToConcepts - 3-gram clue <<10, 1998>> - no match
DEBUG CluesToConcepts - 4-gram clue <<Which agreement was signed>> - no match
DEBUG CluesToConcepts - 4-gram clue <<agreement was signed in>> - no match
DEBUG CluesToConcepts - 4-gram clue <<was signed in Belfast>> - no match
DEBUG CluesToConcepts - 4-gram clue <<signed in Belfast,>> - no match
DEBUG CluesToConcepts - 4-gram clue <<in Belfast, on>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(Belfast, on April) returned: d3.0 ~Belfast, Ontario [Ashfield\u2013Colborne\u2013Wawanosh] Ashfield%E2%80%93Colborne%E2%80%93Wawanosh 0
DEBUG CluesToConcepts - 4-gram clue <<Belfast, on April>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(on April 10) returned: d3.0 ~April 10 [April 10] April_10 0
DEBUG CluesToConcepts - 4-gram clue <<, on April 10>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(on April 10) returned: d3.0 ~April 10 [April 10] April_10 0
DEBUG CluesToConcepts - 4-gram clue <<on April 10,>> - no match
DEBUG CluesToConcepts - fuzzy-lookup(April 10, 1998) returned: d2.0 ~April 1, 1999 [April 1999] April_1999 0
DEBUG CluesToConcepts - 4-gram clue <<April 10, 1998>> - no match
DEBUG CluesMergeByText - subduing agreement(CluePhrase:0.99,false) <| agreement(ClueSubjectToken:2.5,true)
DEBUG CluesMergeByText - subduing agreement(ClueToken:1.0,true) <| agreement(ClueSubjectToken:2.5,true)
DEBUG CluesMergeByText - subduing agreement(ClueLAT:1.5,false) <| agreement(ClueSubjectToken:2.5,true)
DEBUG CluesMergeByText - subduing Belfast(CluePhrase:0.99,false) <| Belfast(ClueNE:2.0,true)
DEBUG CluesMergeByText - subduing Belfast(ClueToken:1.0,true) <| Belfast(ClueNE:2.0,true)
DEBUG CluesMergeByText - subduing April 10, 1998(CluePhrase:0.99,false) <| April 10, 1998(ClueNE:2.0,true)
c ROOT null [Which agreement was signed in Belfast, on April 10, 1998?]
 c SBARQ null [Which agreement was signed in Belfast, on April 10, 1998?]
  c WHNP null [Which agreement]
   t WDT which [Which]
   c NP null [agreement]
    t NN agreement [agreement]
  c SQ null [was signed in Belfast, on April 10, 1998]
   t VBD be [was]
   c VP null [signed in Belfast, on April 10, 1998]
    t VBN sign [signed]
    c PP null [in Belfast, on April 10, 1998]
     c PP null [in Belfast]
      t IN in [in]
      c NP null [Belfast]
       t NNP Belfast [Belfast]
     t , , [,]
     c PP null [on April 10, 1998]
      t IN on [on]
      c NP null [April 10, 1998]
       t NNP April [April]
       t CD 10 [10]
       t , , [,]
       t CD 1998 [1998]
  t . ? [?]

iampkuhz commented 8 years ago

I have cloned a brand new YodaQA and started from scratch in another location, and questionDump seems to run at the same speed.

The only change in the new project is switching the DBpedia server from online to local in src/main/java/cz/brmlab/yodaqa/provider/rdf/DBpediaLookup.java. (Otherwise the build gets stuck at "Building 83% > :test", and gradlew questionDump hits a Connection reset exception: *** http://dbpedia.ailao.eu:3030/dbpedia/query SPARQL Query (temporarily?) failed, retrying in a moment...)

Any suggestions on how to fix this?

Thanks!

pasky commented 8 years ago

One idea - could your IPv6 connectivity be broken? In other words, if you type

http://qa.ailao.eu:4567/

in your browser, does it open immediately or take a long time?

Could the Chinese firewall be interfering in communication?

CluesToConcepts contacts our local servers for label lookup - a named entity linking step.
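A browser check mixes name resolution, address-family selection, and page loading. A more direct probe, sketched here only as an illustration, times a plain TCP connect to every address the host name resolves to, so a slow or failing IPv6 route shows up separately from IPv4. The host and port come from the URL above; the class name, method names, and the 5-second timeout are assumptions made for this example.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: time a TCP connect to each address a host name resolves to.
// Class name, method names, and timeout are illustrative choices only.
class ConnectProbe {
    /** Labels an address by its IP family. */
    static String family(InetAddress addr) {
        return (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    /** Returns the connect time in milliseconds, or -1 if the connect failed. */
    static long connectMillis(InetAddress addr, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, unreachable routes, and timeouts.
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "qa.ailao.eu";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4567;
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            long ms = connectMillis(addr, port, 5000);
            System.out.println(family(addr) + " " + addr.getHostAddress()
                    + " -> " + (ms < 0 ? "failed" : ms + " ms"));
        }
    }
}
```

If only the IPv6 line fails or is much slower, that would point toward the broken-IPv6 theory; if both families connect quickly, the delay is more likely elsewhere.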

iampkuhz commented 8 years ago

I don't think I have a global IPv6 address, because ifconfig reports inet6 addr:XXXX Scope:Link rather than Scope:Global. (I'm running this on a remote server in the US, so I'm not quite sure.) However, wget http://qa.ailao.eu:4567/ is very fast: ((105 MB/s) - “index.html” saved)

pasky commented 8 years ago

Sorry, I don't have any more ideas right now. I'd try to add some debug prints to

src/main/java/cz/brmlab/yodaqa/provider/rdf/DBpediaTitles.java

query() method, before and after each call it makes to another method (queryFuzzyLookup(), queryCrossWikiLookup(), results.addAll(queryArticle(a, logger))), to help pinpoint where it is spending so much time...
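The before/after prints could take roughly the following shape. This is a hedged sketch, not the actual DBpediaTitles code: the `timed` helper and the `fake...` methods are hypothetical stand-ins, since the real signatures of queryFuzzyLookup() and queryCrossWikiLookup() are not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Sketch of the "debug prints before/after each call" idea.
// The fake* methods are placeholders for the real lookups in
// DBpediaTitles.java, whose actual signatures are not shown here.
class LookupTimer {
    /** Runs the step, prints how long it took, and returns its result. */
    static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.err.println("TIMING " + label + ": " + elapsedMs + " ms");
        }
    }

    // Hypothetical stand-in for the fuzzy label lookup.
    static List<String> fakeFuzzyLookup(String text) {
        return Arrays.asList(text);
    }

    // Hypothetical stand-in for the CrossWikis (sqlite) label lookup.
    static List<String> fakeCrossWikiLookup(String text) {
        return Arrays.asList(text);
    }

    public static void main(String[] args) {
        String clue = "Belfast";
        List<String> fuzzy = timed("fuzzy-lookup(" + clue + ")",
                () -> fakeFuzzyLookup(clue));
        List<String> crossWiki = timed("crosswiki-lookup(" + clue + ")",
                () -> fakeCrossWikiLookup(clue));
        System.out.println((fuzzy.size() + crossWiki.size()) + " results");
    }
}
```

If the real methods throw checked exceptions, the lambda wrapper will not compile as is; in that case a plain `System.nanoTime()` reading before each call and a log line after it does the same job.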

pasky commented 8 years ago

Also, I'm curious - the first log you posted in this issue is maybe cut off at the beginning? It should start with something like "Preferring IPv6 connections" or a similar message about IPv4.

iampkuhz commented 8 years ago

  1. I have reinstalled the system on my school's desktop (with IPv6) but haven't found anything like "Preferring IPv6 connections" in the log. (The speed is the same.)
  2. As far as I can see:

These steps take a long time:

DEBUG CluesToConcepts - 2-gram clue XXXX
DEBUG CluesToConcepts - 3-gram clue XXXX
DEBUG CluesToConcepts - 4-gram clue XXXX

while the debug log lines for this step print very fast:

DEBUG CluesToConcepts - fuzzy-lookup XXXX

pasky commented 8 years ago

We never figured this out. :( But it'd probably require some network-level monitoring, so closing for the time being.
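Short of network-level monitoring, one low-effort way to see which log lines are preceded by a stall (such as the slow n-gram clue lines reported above) is to stamp each line with the delay since the previous one. The following is only a sketch of that idea; the class name and output format are arbitrary, and it assumes the debug log can be piped into the program's standard input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Locale;

// Sketch: prefix every log line read from standard input with the number of
// milliseconds that passed since the previous line arrived, so stalls stand out.
// The class name and output format are arbitrary choices for illustration.
class LineDelayStamper {
    /** Formats one output line with a right-aligned delay in milliseconds. */
    static String stamp(long delayMs, String line) {
        return String.format(Locale.ROOT, "[%6d ms] %s", delayMs, line);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        long previous = System.nanoTime();
        String line;
        while ((line = in.readLine()) != null) {
            long now = System.nanoTime();
            System.out.println(stamp((now - previous) / 1_000_000, line));
            previous = now;
        }
    }
}
```

The delays are only as accurate as the pipe feeding the program: if the producing process buffers its output, several lines can arrive at once and the stall is attributed to the first line of the batch.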