DEIB-GECO / PyGMQL

Python Library for data analysis based on GMQL
Apache License 2.0

Datasets in GTF format make the query fail in remote mode #16

Closed: lucananni93 closed this issue 6 years ago

lucananni93 commented 6 years ago

When a dataset in GTF format is used in a remote query, the following error occurs.

Example query:

import gmql as gl

# Point the library at the public GMQL REST service and authenticate
gl.set_remote_address("http://gmql.eu/gmql-rest/")
gl.login()

# Execute the query on the server rather than locally
gl.set_mode("remote")

# Load a public dataset stored in GTF format and materialize it
d1 = gl.load_from_remote("Example_Dataset_1", owner="public")
r = d1.materialize()

The following exception is raised on the client:

Traceback (most recent call last):
  File "C:/Users/lucan/Documents/progetti_phd/PyGMQL/test/test_map.py", line 8, in <module>
    r = d1.materialize()
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\dataset\GMQLDataset.py", line 1191, in materialize
    return Materializations.materialize_remote(new_index, output_name, output_path, all_load)
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\dataset\loaders\Materializations.py", line 83, in materialize_remote
    result = remote_manager.execute_remote_all(output_path=download_path)
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\RemoteConnection\RemoteManager.py", line 520, in execute_remote_all
    return self._execute_dag(serialized_dag, output, output_path)
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\RemoteConnection\RemoteManager.py", line 553, in _execute_dag
    self.download_dataset(dataset_name=name, local_path=path)
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\RemoteConnection\RemoteManager.py", line 379, in download_dataset
    return self.download_as_stream(dataset_name, local_path)
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\RemoteConnection\RemoteManager.py", line 402, in download_as_stream
    samples = self.get_dataset_samples(dataset_name)
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\RemoteConnection\RemoteManager.py", line 226, in get_dataset_samples
    return self.process_info_list(res, "info")
  File "C:\Users\lucan\Documents\progetti_phd\PyGMQL\gmql\RemoteConnection\RemoteManager.py", line 188, in process_info_list
    res = pd.concat([res, pd.DataFrame.from_dict(res[info_column].map(extract_infos).tolist())], axis=1)\
  File "C:\Users\lucan\Anaconda3\envs\bio\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\lucan\Anaconda3\envs\bio\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\lucan\Anaconda3\envs\bio\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\lucan\Anaconda3\envs\bio\lib\site-packages\pandas\core\internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "C:\Users\lucan\Anaconda3\envs\bio\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'info'

Process finished with exit code 1
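
The client-side KeyError is only a secondary symptom. Because the server job fails, the sample listing returned by the REST service evidently carries no "info" field, so the DataFrame built in process_info_list is missing the "info" column that the code then indexes unconditionally. A minimal sketch of that failure mode (the response shape here is an assumption; only the column name "info" comes from the traceback):

import pandas as pd

# Hypothetical sample listing as returned after a failed server job:
# each entry carries an id and a name, but no "info" field.
res = pd.DataFrame([
    {"id": 1, "name": "S_00000"},
    {"id": 2, "name": "S_00001"},
])

res["info"]  # raises KeyError: 'info', exactly as in the traceback above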

This is the exception raised by the GMQL server (from the Netty log):

2018-03-16 11:20:38,557 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 - 18/03/16 11:20:38 ERROR GMQLSparkExecutor: empty.reduceLeft
2018-03-16 11:20:38,557 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 - java.lang.UnsupportedOperationException: empty.reduceLeft
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:180)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at scala.collection.AbstractTraversable.reduceLeft(Traversable.scala:104)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at scala.collection.TraversableOnce$class.reduce(TraversableOnce.scala:208)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at scala.collection.AbstractTraversable.reduce(Traversable.scala:104)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.profiling.Profilers.Profiler$.profile(Profiler.scala:147)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.spark.implementation.GMQLSparkExecutor$$anonfun$implementation$1.apply(GMQLSparkExecutor.scala:144)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.spark.implementation.GMQLSparkExecutor$$anonfun$implementation$1.apply(GMQLSparkExecutor.scala:112)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:73)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at scala.collection.mutable.MutableList.foreach(MutableList.scala:30)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.spark.implementation.GMQLSparkExecutor.implementation(GMQLSparkExecutor.scala:112)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.spark.implementation.GMQLSparkExecutor.go(GMQLSparkExecutor.scala:59)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.GMQLServer.GmqlServer.run(GmqlServer.scala:23)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.cli.GMQLExecuteCommand$.main(GMQLExecuteCommand.scala:265)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at it.polimi.genomics.cli.GMQLExecuteCommand.main(GMQLExecuteCommand.scala)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2018-03-16 11:20:38,558 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at java.lang.reflect.Method.invoke(Method.java:498)
2018-03-16 11:20:38,559 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
2018-03-16 11:20:38,559 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
2018-03-16 11:20:38,559 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
2018-03-16 11:20:38,559 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
2018-03-16 11:20:38,559 [INFO] from org.apache.spark.launcher.app.GMQLExecuteCommand in launcher-proc-17 -      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
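
The relevant line is java.lang.UnsupportedOperationException: empty.reduceLeft at Profiler.scala:147: for a GTF dataset the profiler ends up calling reduce on an empty collection, and Scala's reduceLeft has no result to return in that case. Python's functools.reduce fails the same way when given an empty sequence and no initial value; a one-line analogue of the failure mode, for illustration only:

from functools import reduce

# Reducing an empty sequence with no initializer fails, just like
# Scala's reduceLeft on the profiler's empty collection.
reduce(lambda a, b: a + b, [])  # TypeError: empty iterable with no initial value
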
lucananni93 commented 6 years ago

@acanakoglu The problem is in the BedParser implementation: the method

calculateMapParameters(namePosition: Option[Seq[String]] = None)

should be removed and the schema attribute used instead.
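
Independently of the server-side fix, the client could fail with a clearer message when the response lacks the expected field. A hedged sketch of such a guard (expand_info_column is a hypothetical stand-in for the expansion step inside process_info_list; only the "info" column name comes from the traceback):

import pandas as pd

def expand_info_column(res, info_column="info"):
    # If the remote job failed, the sample listing may come back without
    # the expected info field; report that instead of an opaque KeyError.
    if info_column not in res.columns:
        raise RuntimeError(
            "server response has no %r field; the remote query likely failed"
            % info_column
        )
    # Expand each sample's info dict into its own columns.
    infos = pd.DataFrame(res[info_column].tolist())
    return pd.concat([res.drop(columns=[info_column]), infos], axis=1)

# A well-formed listing expands cleanly; one without "info" now raises a
# descriptive RuntimeError instead of the bare KeyError seen above.
ok = pd.DataFrame({"name": ["S_00000"], "info": [{"size": "1.2MB"}]})
print(expand_info_column(ok))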