DEIB-GECO / PyGMQL

Python Library for data analysis based on GMQL
Apache License 2.0

Can't run even the example notebooks #28

Closed diveu closed 4 years ago

diveu commented 5 years ago

Hi! I've been trying to get into this API, but I keep getting stuck here :(

I keep getting errors in the Docker image you provided in the README.

An error occurred while calling z:it.polimi.genomics.pythonapi.PythonManager.take. : java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.

And I can't find any way to fix it :(
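(For reference, the two numbers in that message sit just on either side of Spark's hard floor for driver memory: 471859200 bytes is exactly 450 MiB, while the JVM here was only given about 444.5 MiB. A quick sanity check in plain Python, with names invented purely for illustration:)

```python
# Decode the numbers from the Spark error message above.
SPARK_MIN_DRIVER_BYTES = 471859200   # the "must be at least" value
AVAILABLE_BYTES = 466092032          # the "System memory" value

# Spark's floor is exactly 450 MiB:
assert SPARK_MIN_DRIVER_BYTES == 450 * 1024 * 1024

# The driver JVM only got 444.5 MiB, just under the floor:
print(f"{AVAILABLE_BYTES / 2**20:.1f} MiB")
```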

And when I try to run things locally, with all dependencies resolved, I keep getting this error in 03a_GWAS_Local :(


Py4JJavaError                             Traceback (most recent call last)
<ipython-input> in <module>
----> 1 gwas.head().regs

~/anaconda3/lib/python3.7/site-packages/gmql/dataset/GMQLDataset.py in head(self, n)
   1400         current_mode = get_mode()
   1401         new_index = self.__modify_dag(current_mode)
-> 1402         collected = self.pmg.take(new_index, n)
   1403         regs = MemoryLoader.load_regions(collected)
   1404         meta = MemoryLoader.load_metadata(collected)

~/anaconda3/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1284         answer = self.gateway_client.send_command(command)
   1285         return_value = get_return_value(
-> 1286             answer, self.gateway_client, self.target_id, self.name)
   1287
   1288         for temp_arg in temp_args:

~/anaconda3/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326             raise Py4JJavaError(
    327                 "An error occurred while calling {0}{1}{2}.\n".
--> 328                 format(target_id, ".", name), value)
    329         else:
    330             raise Py4JError(

Py4JJavaError: An error occurred while calling z:it.polimi.genomics.pythonapi.PythonManager.take.
: java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at it.polimi.genomics.pythonapi.PythonManager$.startSparkContext(PythonManager.scala:394)
    at it.polimi.genomics.pythonapi.PythonManager$.checkSparkContext(PythonManager.scala:387)
    at it.polimi.genomics.pythonapi.PythonManager$.take(PythonManager.scala:340)
    at it.polimi.genomics.pythonapi.PythonManager.take(PythonManager.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.base/java.lang.Thread.run(Thread.java:835)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
    at java.base/java.lang.String.substring(String.java:1883)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:50)
    ... 31 more
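(Side note on this trace, as an assumption rather than a confirmed diagnosis: `StringIndexOutOfBoundsException: begin 0, end 3, length 2` thrown from Hadoop's `Shell` static initializer looks like an older Hadoop parsing the `java.version` system property with `substring(0, 3)`, which breaks when the version string is just "11", i.e. when the notebook runs under Java 11 instead of Java 8. A small Python sketch of why, using a hypothetical helper that mimics Java's substring bounds check:)

```python
def java_substring(s: str, begin: int, end: int) -> str:
    """Mimic java.lang.String.substring: unlike Python slicing,
    Java raises StringIndexOutOfBoundsException when end > length."""
    if begin < 0 or end > len(s) or begin > end:
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]

print(java_substring("1.8.0_231", 0, 3))  # a Java 8 version string parses fine
try:
    java_substring("11", 0, 3)            # Java 11 may report just "11": length 2
except IndexError as e:
    print(e)                              # same shape as the error in the trace
```

If that is the cause, running the notebook under a Java 8 JVM would be the workaround, but again this is inferred from the trace, not confirmed in the thread.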

Could anybody help me, please?

lucananni93 commented 5 years ago

Hi @diveu ! Thanks for your interest in our project.

I am failing to reproduce your problem, both with my local Jupyter and with the Docker image. The error says that you are not allocating enough heap memory to the Spark process.

Can you please clarify:

  1. on which OS you are executing the notebooks
  2. the available RAM on your system

Can you also try the following at the beginning of your notebook (you will need to restart the kernel first):

import gmql as gl
# this sets the Java heap size to 8 GB; if your system has less RAM, try a smaller value
gl.set_local_java_options(["-Xmx8g"])
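(If 8 GB is more than your machine can spare, a common rule of thumb is to give the JVM about half of the physical RAM. A throwaway helper to compute the flag; this is a heuristic for illustration, not part of the PyGMQL API:)

```python
def xmx_option(total_ram_gb: int, fraction: float = 0.5) -> str:
    """Heuristic: size the Java heap to a fraction of system RAM,
    never going below 1 GB."""
    heap_gb = max(1, int(total_ram_gb * fraction))
    return f"-Xmx{heap_gb}g"

print(xmx_option(16))  # -> -Xmx8g, the value suggested above, for a 16 GB machine
print(xmx_option(4))   # -> -Xmx2g
```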
diveu commented 5 years ago

@lucananni93 Thanks a lot, it worked with Docker! I'm running it on Mac OS X Mojave 10.14.6 with 16 GB of memory. Locally I guess there's something wrong with permissions, since I can't run any function on loaded data, like head(), etc. :( I'll stick with Docker for now :)