foursquare / fsqio

A monorepo that holds all of Foursquare's opensource projects
Apache License 2.0

Twofishes: ConnectionError building index #40

Closed gyscos closed 6 years ago

gyscos commented 7 years ago

Freshly checked out tags/fsqio-2017-02-16-1638, I consistently get this error when building the index:

% ./src/jvm/io/fsq/twofishes/scripts/parse.py -w /opt/twofishes-output
outputting index to /opt/twofishes-output
Are you suuuuuure you want to drop your mongo data? Type "yes" to continue: yes
./pants run src/jvm/io/fsq/twofishes/indexer/importers/geonames:geonames-parser --jvm-run-jvm-options=-Dlogback.configurationFile=src/jvm/io/fsq/twofishes/indexer/data/logback.xml --jvm-run-jvm-program-args=--parse_world --jvm-run-jvm-program-args=true --jvm-run-jvm-program-args=--output_revgeo_index --jvm-run-jvm-program-args=false --jvm-run-jvm-program-args=--output_s2_covering_index --jvm-run-jvm-program-args=false --jvm-run-jvm-program-args=--output_s2_interior_index --jvm-run-jvm-program-args=false --jvm-run-jvm-program-args=--output_prefix_index --jvm-run-jvm-program-args=true --jvm-run-jvm-program-args=--reload_data --jvm-run-jvm-program-args=true --jvm-run-jvm-program-args=--hfile_basepath --jvm-run-jvm-program-args=/opt/twofishes-output

11:28:32 00:00 [main]
               (To run a reporting server: ./pants server)
11:28:32 00:00   [setup]
11:28:32 00:00     [parse]
               Executing tasks in goals: tag -> bootstrap -> imports -> unpack-jars -> validate -> build-spindle -> jvm-platform-validate -> deferred-sources -> gen -> webpack -> pom-resolve -> resources -> compile -> run
11:28:32 00:00   [tag]
11:28:32 00:00     [tag]
11:28:32 00:00   [bootstrap]
11:28:32 00:00     [substitute-aliased-targets]
11:28:32 00:00     [jar-dependency-management]
11:28:32 00:00     [bootstrap-jvm-tools]
11:28:32 00:00     [provide-tools-jar]
11:28:32 00:00   [imports]
11:28:32 00:00     [ivy-imports]
11:28:32 00:00   [unpack-jars]
11:28:32 00:00     [unpack-jars]
11:28:32 00:00   [validate]
11:28:32 00:00     [validate]
11:28:32 00:00   [build-spindle]
11:28:32 00:00     [build-spindle]
11:28:32 00:00       [cache] 
                   No cached artifacts for 1 target.
                   Invalidated 1 target.
11:28:32 00:00       [spindle-build]

11:28:33 00:00 [main]
               (To run a reporting server: ./pants server)
11:28:33 00:00   [setup]
11:28:33 00:00     [parse]
               Executing tasks in goals: tag -> bootstrap -> imports -> unpack-jars -> validate -> build-spindle -> deferred-sources -> jvm-platform-validate -> gen -> webpack -> pom-resolve -> resources -> compile -> bundle
11:28:33 00:00   [tag]
11:28:33 00:00     [tag]
11:28:33 00:00   [bootstrap]
11:28:33 00:00     [substitute-aliased-targets]
11:28:33 00:00     [jar-dependency-management]
11:28:33 00:00     [bootstrap-jvm-tools]
11:28:33 00:00     [provide-tools-jar]
11:28:33 00:00   [imports]
11:28:33 00:00     [ivy-imports]
11:28:33 00:00   [unpack-jars]
11:28:33 00:00     [unpack-jars]
11:28:33 00:00   [validate]
11:28:33 00:00     [validate]
11:28:33 00:00   [build-spindle]
11:28:33 00:00     [build-spindle]
11:28:33 00:00   [deferred-sources]
11:28:33 00:00     [deferred-sources]
11:28:33 00:00   [jvm-platform-validate]
11:28:33 00:00     [jvm-platform-validate]
11:28:33 00:00   [gen]
11:28:33 00:00     [thrift]
11:28:33 00:00     [protoc]
11:28:33 00:00     [antlr]
11:28:33 00:00     [ragel]
11:28:33 00:00     [jaxb]
11:28:33 00:00     [wire]
11:28:33 00:00     [validate-graph]
11:28:33 00:00     [spindle]
11:28:33 00:00   [webpack]
11:28:33 00:00     [webpack-resolve]
11:28:33 00:00     [webpack-gen]
11:28:33 00:00   [pom-resolve]
11:28:33 00:00     [pom-resolve]
                   Invalidated 86 targets.
11:28:33 00:00       [traverse-pom-graph]Exception caught: (<class 'requests.exceptions.ConnectionError'>)
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/1.2.1rc0/bin/pants", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/pants_exe.py", line 44, in main
    PantsRunner(exiter).run()
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/pants_runner.py", line 57, in run
    options_bootstrapper=options_bootstrapper)
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/pants_runner.py", line 46, in _run
    return LocalPantsRunner(exiter, args, env, options_bootstrapper=options_bootstrapper).run()
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/local_pants_runner.py", line 53, in run
    self._maybe_profiled(self._run)
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/local_pants_runner.py", line 50, in _maybe_profiled
    runner()
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/local_pants_runner.py", line 95, in _run
    goal_runner_result = goal_runner.run()
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/goal_runner.py", line 268, in run
    result = self._execute_engine()
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/bin/goal_runner.py", line 257, in _execute_engine
    result = engine.execute(self._context, self._goals)
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/engine/legacy_engine.py", line 26, in execute
    self.attempt(context, goals)
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/engine/round_engine.py", line 224, in attempt
    goal_executor.attempt(explain)
  File "/home/ubuntu/.cache/fsqio/setup/bootstrap/pants.nkHwgM/install/local/lib/python2.7/site-packages/pants/engine/round_engine.py", line 47, in attempt
    task.execute()
  File "/opt/fsqio/src/python/fsqio/pants/pom/pom_resolve.py", line 566, in execute
    global_pinned_versions,
  File "/opt/fsqio/src/python/fsqio/pants/pom/pom_resolve.py", line 415, in resolve_dependency_graphs
    for jar_lib, target_dep_graph in izip(all_jar_libs, dep_graph_iterator):
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 668, in next
    raise value

Exception message: None: Max retries exceeded with url: /webdav/geotools/com/cybozu/labs/langdetect/1.1-20120112/langdetect-1.1-20120112.jar (Caused by redirect)

11:29:13 00:40   [complete]
               FAILURE

FAILURE

11:29:14 00:42   [complete]
               FAILURE

Sometimes the problematic url changes; I have also seen so far:

/webdav/geotools/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.pom
/webdav/geotools/joda-time/joda-time/2.9.7/joda-time-2.9.7.jar
/webdav/geotools/aopalliance/aopalliance/1.0/aopalliance-1.0.pom
/webdav/geotools/org/apache/hadoop/hadoop-mapreduce-client-core/2.6.0/hadoop-mapreduce-client-core-2.6.0.jar
/maven2/com/github/salat/salat-util_2.11/1.10.0/salat-util_2.11-1.10.0.pom

mateor commented 7 years ago

Hi. Sorry that you are having trouble with this. Unfortunately, I do not have an answer that I know will fix it for you. We have seen scattered instances of this, and have yet to determine a concrete fix.

The issue you are having comes from the jar resolver, pom-resolve, and the Maven repo that hosts the geotools artifacts.

I have a couple of things that should get you unblocked, though. The first and easiest would be to try these adjustments to your hosts file.

If that doesn't work, then you can try an experiment I just threw together when I saw your issue.

Try doing initial sync with Ivy

This branch allows temporarily running Fsq.io using the Ivy resolver. You can check out the fork on my Github page.

The idea is to have Ivy do the initial sync to the pom directory and then let pom-resolve take over from there. They will not pull down the exact same set of jars, but the webdav libraries should be identical.

You will need to run it with --no-verify-config.

    rm -rf .pants.d .local_artifact_cache
    ./pants --no-verify-config compile src:: test::

If it succeeds, try checking out the Fsq.io release again and see how it goes.

gyscos commented 7 years ago

Thank you for the help!

I don't seem to be going anywhere though:

mateor commented 7 years ago

Ah, yeah, we pin streamz in pom-resolve world. Okay. So I tried for a while to see if we could work around the hosting issues from that repo, but I could not; it seems to be the central hub for geo stuff.

So I pushed an update to that fork that uses a couple other geo repos to try and spread the load, as well as adding an Ivy pin for the streams library.

I also perhaps saw a reason why pom-resolve did not reuse the Ivy download, which was a mistake on my part. But it is possible that the failure is even at ping time, at which point I do not have a good answer. Hard to debug without a repro, although I do not doubt that the issue exists.

I updated the fork in the meantime, perhaps try that again and you may be able to run the indexer under Ivy while we consider our long term options.

gyscos commented 7 years ago

For now I stayed on the mateo.ivy_resolve branch and added --no-verify-config to the pants command that's usually started by parse.py, and it seems to be indexing all right! I'll try the updated fork when it completes.

mateor commented 7 years ago

Good luck!

gyscos commented 7 years ago

I was able to build the index, but then I tried to serve the result with the following command:

./pants run src/jvm/io/fsq/twofishes/server:server-bin --jvm-run-jvm-program-args=--preload --jvm-run-jvm-program-args=False --jvm-run-jvm-program-args=--warmup --jvm-run-jvm-program-args=False --jvm-run-jvm-program-args=--enable_private_endpoints --jvm-run-jvm-program-args=False --jvm-run-jvm-program-args=--host --jvm-run-jvm-program-args=0.0.0.0 --jvm-run-jvm-program-args=--port --jvm-run-jvm-program-args=8080 --jvm-run-jvm-program-args=--hfile_basepath --jvm-run-jvm-program-args=/opt/twofishes-output --no-verify-config
10:00:20 00:00 [main]
               (To run a reporting server: ./pants server)
10:00:20 00:00   [setup]
10:00:20 00:00     [parse]
               Executing tasks in goals: tag -> bootstrap -> imports -> unpack-jars -> validate -> build-spindle -> deferred-sources -> jvm-platform-validate -> gen -> webpack -> resolve -> compile -> resources -> run
10:00:20 00:00   [tag]
10:00:20 00:00     [tag]
10:00:20 00:00   [bootstrap]
10:00:20 00:00     [substitute-aliased-targets]
10:00:20 00:00     [jar-dependency-management]
10:00:20 00:00     [bootstrap-jvm-tools]
10:00:20 00:00     [provide-tools-jar]
10:00:20 00:00   [imports]
10:00:20 00:00     [ivy-imports]
10:00:20 00:00   [unpack-jars]
10:00:20 00:00     [unpack-jars]
10:00:20 00:00   [validate]
10:00:20 00:00     [validate]
                   Invalidated 6 targets.
10:00:21 00:01   [build-spindle]
10:00:21 00:01     [build-spindle]
10:00:21 00:01   [deferred-sources]
10:00:21 00:01     [deferred-sources]
10:00:21 00:01   [jvm-platform-validate]
10:00:21 00:01     [jvm-platform-validate]
                   Invalidated 2 targets.
10:00:21 00:01   [gen]
10:00:21 00:01     [thrift]
10:00:21 00:01     [protoc]
10:00:21 00:01     [antlr]
10:00:21 00:01     [ragel]
10:00:21 00:01     [jaxb]
10:00:21 00:01     [wire]
10:00:21 00:01     [validate-graph]
10:00:21 00:01     [spindle]
10:00:21 00:01   [webpack]
10:00:21 00:01     [webpack-resolve]
10:00:21 00:01     [webpack-gen]
10:00:21 00:01   [resolve]
10:00:21 00:01     [ivy]
                   Invalidated 1 target.
10:00:21 00:01       [ivy-resolve]
10:00:26 00:06   [compile]
10:00:26 00:06     [compile-jvm-prep-command]
10:00:26 00:06       [jvm_prep_command]
10:00:26 00:06     [scalafmt]
10:00:26 00:06     [compile-prep-command]
10:00:26 00:06     [compile]
10:00:26 00:06     [zinc]
10:00:26 00:06       [cache]    
                   No cached artifacts for 4 targets.
                   Invalidated 4 targets.
10:00:27 00:07       [isolation-zinc-pool-bootstrap] 
                   [1/4] Compiling 16 zinc sources in 1 target (.pants.d/gen/spindle/scala_record:src.thrift.io.fsq.twofishes.twofishes-scala). 
                   [2/4] Compiling 15 zinc sources in 1 target (src/jvm/io/fsq/twofishes/util:util). 
                   [3/4] Compiling 5 zinc sources in 1 target (src/jvm/io/fsq/twofishes/core:core). 
                   [4/4] Compiling 26 zinc sources in 1 target (src/jvm/io/fsq/twofishes/server:server).
10:00:45 00:25   [resources]
10:00:45 00:25     [prepare]
                   Invalidated 2 targets.
10:00:45 00:25     [services]
10:00:45 00:25   [run]
10:00:45 00:25     [py]
10:00:45 00:25     [jvm]
10:00:45 00:25       [run]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/.pom2/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/.pom2/org/slf4j/slf4j-jdk14/1.7.7/slf4j-jdk14-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/io/NullOutputStream
        at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
        at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
        at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636)
        at io.fsq.twofishes.server.HFileInput.<init>(HFileStorageService.scala:140)
        at io.fsq.twofishes.server.NameIndexHFileInput.<init>(HFileStorageService.scala:227)
        at io.fsq.twofishes.server.HFileStorageService.<init>(HFileStorageService.scala:25)
        at io.fsq.twofishes.server.ServerStore$.getStore(GeocodeServer.scala:518)
        at io.fsq.twofishes.server.ServerStore$.getStore(GeocodeServer.scala:514)
        at io.fsq.twofishes.server.GeocodeFinagleServer$.main(GeocodeServer.scala:544)
        at io.fsq.twofishes.server.GeocodeFinagleServer.main(GeocodeServer.scala)
Caused by: java.lang.ClassNotFoundException: com.google.common.io.NullOutputStream
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 10 more

FAILURE: java io.fsq.twofishes.server.GeocodeFinagleServer ... exited non-zero (1)

INFO] killing nailgun server pid=161101
INFO] killing nailgun server pid=162479

10:00:47 00:27   [complete]
               FAILURE

mateor commented 7 years ago

Hi - yeah, that is unfortunate but not unexpected, I guess. The hope I had for the Ivy patch was just to get you past the bootstrapping.
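
The `NoClassDefFoundError` for `com.google.common.io.NullOutputStream` in the trace above looks like what happens when the Guava on the runtime classpath is newer than the one HBase was compiled against, since that class was removed from later Guava releases. That reading is an assumption and is not confirmed anywhere in this thread. A minimal Python sketch for checking it: scan a directory of jars and report which ones actually ship the class. The helper names `find_jars` and `jars_containing` are hypothetical, and nothing here is part of pants or Fsq.io.

```python
import os
import zipfile


def find_jars(root):
    """Collect every .jar file under root, sorted for stable output."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".jar"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def jars_containing(class_name, jar_paths):
    """Return the jars that contain the given fully qualified class name."""
    # A class a.b.C is stored in a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        if not (os.path.isfile(path) and zipfile.is_zipfile(path)):
            continue  # skip directories, missing files, and non-zip files
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

For example, `jars_containing("com.google.common.io.NullOutputStream", find_jars(os.path.expanduser("~/.pom2")))` would search the `~/.pom2` cache that shows up in the SLF4J lines of the log. An empty result would only say that no cached jar provides the class; it does not show what the server's classpath actually contained.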

Looking at the original output, it occurs to me that this could be an error in the fetching logic. But hard to know when it is being run in the multi-processing pool. If you wanted to continue to work towards a solution, I added another branch to my github page that threads through a debug and logging pipeline.

The debug output can be generated with:

    ./pants -ldebug pom-resolve --pom-resolve-single-process

It will be quite verbose, but may give us a hint.

https://github.com/mateor/fsqio/tree/mateo.pom_logging
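
To illustrate why a failure inside a worker pool is hard to attribute, and what a single-process mode buys: in the original traceback the parent only re-raises the worker's exception (`raise value` in `multiprocessing/pool.py`), so the artifact being fetched is not visible in the stack. The sketch below is not the pom-resolve code or the code on the branch above. It is a generic pattern, with hypothetical names `capture_failures` and `run_all`, that returns each failure together with its input and the worker-side traceback. It uses `ThreadPool` so that the closure does not need to be picklable; a process pool would need a module-level wrapper instead.

```python
import traceback
from multiprocessing.pool import ThreadPool


def capture_failures(worker):
    """Wrap worker so errors come back as data instead of being raised."""
    def wrapped(item):
        try:
            return (item, worker(item), None)
        except Exception:
            # Format the traceback where the error happened, so the
            # worker-side frames survive the trip back to the caller.
            return (item, None, traceback.format_exc())
    return wrapped


def run_all(worker, items, single_process=False, pool_size=4):
    """Run worker over items; return (item, result, error_text) tuples in order."""
    wrapped = capture_failures(worker)
    if single_process:
        # Serial path: simplest to debug, one item at a time.
        return [wrapped(item) for item in items]
    pool = ThreadPool(pool_size)
    try:
        return list(pool.imap(wrapped, items))
    finally:
        pool.close()
        pool.join()
```

Filtering the returned tuples for a non-`None` third element then gives the list of failing inputs along with the stack for each one.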

mateor commented 7 years ago

(this branch is very lightly tested) - I unfortunately do not have a lot of extra time at the moment, so forgive me if there is anything careless in there :)

mateor commented 6 years ago

An update - this issue ended up spurring me to convert our entire codebase back to Ivy. The next Fsq.io update will no longer use pom-resolve and this issue will be accordingly solved.

Thanks for the spur to make that happen : )