chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Unable to install library on Azure Databricks cluster #283

Closed sanshan20 closed 4 years ago

sanshan20 commented 4 years ago

Installing textacy version 0.7.0 with the command dbutils.library.installPyPI('textacy','0.7.0') fails. The full stack trace is below. I have only hit this since last week; the same command worked before.


Py4JJavaError Traceback (most recent call last)
in ()
     10 dbutils.library.installPyPI('tqdm','4.32.2')
     11 dbutils.library.installPyPI('nltk','3.4.4')
---> 12 dbutils.library.installPyPI('textacy','0.7.0')
     13 dbutils.library.installPyPI('wordcloud','1.5.0')
     14 dbutils.library.installPyPI('numpy','1.16.3')

/local_disk0/tmp/1573462096240-0/dbutils.py in installPyPI(self, project, version, repo, extras)
    237     def installPyPI(self, project, version = "", repo = "", extras = ""):
    238         return self.print_and_return(self.entry_point.getSharedDriverContext() \
--> 239             .addIsolatedPyPILibrary(project, version, repo, extras))
    240
    241     def restartPython(self):

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o232.addIsolatedPyPILibrary.
: org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnvDirs/virtualEnv-d7a2b38f-6ed0-4027-8a85-d27e96fc822b/bin/python, /local_disk0/pythonVirtualEnvDirs/virtualEnv-d7a2b38f-6ed0-4027-8a85-d27e96fc822b/bin/pip, install, textacy==0.7.0, --disable-pip-version-check) exited with code 1.
Error: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-uh7xw9hg/cytoolz/
    at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1373)
    at org.apache.spark.util.Utils$.installLibrary(Utils.scala:836)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1700)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1632)
    at com.databricks.backend.daemon.driver.SharedDriverContext$$anonfun$addIsolatedPyPILibrary$1.apply$mcV$sp(SharedDriverContext.scala:547)
    at com.databricks.backend.daemon.driver.SharedDriverContext$$anonfun$addIsolatedPyPILibrary$1.apply(SharedDriverContext.scala:547)
    at com.databricks.backend.daemon.driver.SharedDriverContext$$anonfun$addIsolatedPyPILibrary$1.apply(SharedDriverContext.scala:547)
    at com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:369)
    at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
    at com.databricks.backend.daemon.driver.SharedDriverContext.withAttributionContext(SharedDriverContext.scala:57)
    at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:271)
    at com.databricks.backend.daemon.driver.SharedDriverContext.withAttributionTags(SharedDriverContext.scala:57)
    at com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:350)
    at com.databricks.backend.daemon.driver.SharedDriverContext.recordOperation(SharedDriverContext.scala:57)
    at com.databricks.backend.daemon.driver.SharedDriverContext.addIsolatedPyPILibrary(SharedDriverContext.scala:546)
    at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

bdewilde commented 4 years ago

Hi @sanshan20, since textacy==0.7.0 has been unchanged for several months, I'm guessing the recent error reflects a change in Databricks. The frames in the stack trace are in pyspark and Java, neither of which textacy interfaces with, so I have no way to debug this.
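
For anyone reading the trace later: the pip message wrapped inside the SparkException names the directory where "python setup.py egg_info" failed, and in the trace above that directory is cytoolz. Below is a minimal sketch of pulling that package name out of such a trace. The regular expression and the function name failing_package are illustrative assumptions, not part of pip, Databricks, or textacy, and the pattern only covers the message wording quoted in this issue.

```python
import re

# Matches the pip message quoted in the trace above. The pattern is an
# assumption based on that single message, not a stable pip format.
_EGG_INFO_FAILURE = re.compile(
    r'Command "python setup\.py egg_info" failed with error code (\d+) '
    r'in \S*/pip-install-[^/\s]+/([^/\s]+)'
)


def failing_package(trace):
    """Return (package, exit_code) for the first egg_info failure, or None."""
    match = _EGG_INFO_FAILURE.search(trace)
    if match is None:
        return None
    return match.group(2), int(match.group(1))


# Excerpt copied from the stack trace in this issue.
trace = (
    'Error: Command "python setup.py egg_info" failed with error code 1 '
    'in /tmp/pip-install-uh7xw9hg/cytoolz/ '
    'at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1373)'
)

print(failing_package(trace))
```

The result only identifies which package's build step failed; it does not say why it failed, which would need the pip output from the cluster itself.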