DATA_SET_VAR = SELECT() HG19_ENCODE_NARROW;
PROJECTED = PROJECT(pvalue; region_update: start AS start + 1) DATA_SET_VAR;
MATERIALIZE PROJECTED INTO RESULT_DS;
The query fails and gives the following exception:
2017-06-21 15:07:32,809 INFO [SelectIMDWithNoIndex$] hg_narrowPeaks Selected: 4
2017-06-21 15:07:32,816 INFO [ProjectRD$] ----------------ProjectRD executing..
2017-06-21 15:07:32,818 INFO [SelectIRD$] ----------------SelectIRD
2017-06-21 15:07:33,692 WARN [TaskSetManager] Stage 1 contains a task of very large size (444 KB). The maximum recommended task size is 100 KB.
2017-06-21 15:07:35,261 WARN [TaskSetManager] Stage 3 contains a task of very large size (444 KB). The maximum recommended task size is 100 KB.
2017-06-21 15:07:40,301 WARN [TaskSetManager] Lost task 0.0 in stage 7.0 (TID 6, genomic.elet.polimi.it, executor 2): java.lang.ArrayIndexOutOfBoundsException: 6
at it.polimi.genomics.spark.implementation.RegionsOperators.ProjectRD$$anonfun$apply$2$$anonfun$apply$3.apply(ProjectRD.scala:49)
at it.polimi.genomics.spark.implementation.RegionsOperators.ProjectRD$$anonfun$apply$2$$anonfun$apply$3.apply(ProjectRD.scala:49)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at it.polimi.genomics.spark.implementation.RegionsOperators.ProjectRD$$anonfun$apply$2.apply(ProjectRD.scala:49)
at it.polimi.genomics.spark.implementation.RegionsOperators.ProjectRD$$anonfun$apply$2.apply(ProjectRD.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:150)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
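The crash at ProjectRD.scala:49 is the classic pattern of folding over a list of projection indices into a region's attribute array while one index lies past the end. A minimal, self-contained reproduction of that failure mode (hypothetical data, not actual GMQL code; the array size and indices are illustrative):

```scala
// A region with six attribute values, valid indices 0..5.
val attributes: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

// A projection index list that wrongly includes index 6,
// mirroring the "ArrayIndexOutOfBoundsException: 6" in the trace.
val badIndices: List[Int] = List(0, 6)

// foldLeft over the indices, as ProjectRD does, collecting the values.
val thrown: Option[ArrayIndexOutOfBoundsException] =
  try {
    badIndices.foldLeft(List.empty[Double])((acc, i) => acc :+ attributes(i))
    None
  } catch {
    case e: ArrayIndexOutOfBoundsException => Some(e)
  }
```

This matches the `List.foldLeft` frames visible in the stack trace.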
This is due to the fact that in it.polimi.genomics.core.DataStructures.IRVariable.PROJECT, in the following fragment,
val all_proj_values : Option[List[Int]] =
  if (new_projected_values.isDefined) {
    val list = new_projected_values.get
    val new_list =
      if (extended_values.isDefined) {
        list ++ ((this.schema.size) to (this.schema.size + extended_values.get.size - 1)).toList
      }
      else {
        list
      }
    Some(new_list)
  } else {
    None
  }
the list variable is extended with the wrong number of schema fields when only coordinates are modified. We must also consider the case in which extended_values contains coordinate attributes such as start, stop, strand, etc.: these do not append new fields to the schema, so counting them in the index range produces out-of-bounds indices.
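One possible shape of the fix, sketched as standalone code rather than a patch to IRVariable.PROJECT (the ExtTarget type and the distinction it encodes are assumptions about how extension targets could be classified; the real GMQL data structures differ):

```scala
// Hypothetical classification of what an extension writes to:
// coordinate targets (start, stop, strand, ...) modify existing region
// coordinates and add no schema field; attribute targets add one field each.
sealed trait ExtTarget
case object CoordinateTarget extends ExtTarget
case object AttributeTarget extends ExtTarget

// Build the projected-index list, counting only extensions that actually
// append a field to the schema, so no index can exceed the schema bounds.
def allProjValues(schemaSize: Int,
                  projected: Option[List[Int]],
                  extended: Option[List[ExtTarget]]): Option[List[Int]] =
  projected.map { list =>
    val newFields = extended.map(_.count(_ == AttributeTarget)).getOrElse(0)
    list ++ (schemaSize until (schemaSize + newFields)).toList
  }
```

With this counting, a query that only rewrites start (as in the example above) contributes zero new indices, and the foldLeft in ProjectRD never reads past the region's attribute array.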