I'm creating a dataset directly using a URL (relying on identity-based access). This prompts my browser to start a login process.
While dataset.to_pandas_dataframe() works fine, when I try dataset.to_spark_dataframe() I get the following Java traceback:
: java.util.NoSuchElementException: key not found: ADLSGen2
at scala.collection.MapLike.default(MapLike.scala:235)
at scala.collection.MapLike.default$(MapLike.scala:234)
at scala.collection.AbstractMap.default(Map.scala:63)
at scala.collection.MapLike.apply(MapLike.scala:144)
at scala.collection.MapLike.apply$(MapLike.scala:143)
at scala.collection.AbstractMap.apply(Map.scala:63)
at com.microsoft.dprep.io.StreamInfoFileSystem$.toFileSystemPath(StreamInfoFileSystem.scala:68)
at com.microsoft.dprep.execution.Storage$.expandHdfsPath(Storage.scala:37)
at com.microsoft.dprep.execution.executors.GetFilesExecutor$.$anonfun$getFiles$1(GetFilesExecutor.scala:18)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at com.microsoft.dprep.execution.executors.GetFilesExecutor$.getFiles(GetFilesExecutor.scala:12)
at com.microsoft.dprep.execution.LariatDataset$.getFiles(LariatDataset.scala:32)
at com.microsoft.dprep.execution.PySparkExecutor.getFiles(PySparkExecutor.scala:225)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
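My reading of the top frames is that toFileSystemPath does a plain map lookup keyed by the storage handler name, and that the map has no entry for "ADLSGen2". The following is only a minimal Python sketch of that failure mode, not the actual dprep code: the table contents, the function names and the path format are all hypothetical, and only the "ADLSGen2" key and the "key not found" message come from the trace.

```python
# Hypothetical stand-in for the handler-to-scheme table that the trace
# suggests toFileSystemPath consults. The real entries are not known here.
HANDLER_TO_SCHEME = {
    "Local": "file",
    "AzureBlobStorage": "wasbs",
}


def to_file_system_path(handler: str, resource: str) -> str:
    """Build a Hadoop-style path, failing the way Scala's Map.apply does."""
    try:
        scheme = HANDLER_TO_SCHEME[handler]
    except KeyError:
        # Mirrors Scala's NoSuchElementException("key not found: ...").
        raise LookupError(f"key not found: {handler}") from None
    return f"{scheme}://{resource}"


def describe(handler: str, resource: str) -> str:
    """Return the resolved path, or the error text if the handler is unknown."""
    try:
        return to_file_system_path(handler, resource)
    except LookupError as err:
        return f"error: {err}"


if __name__ == "__main__":
    print(describe("Local", "tmp/data.csv"))
    print(describe("ADLSGen2", "container/data.csv"))
```

If that reading is right, the pandas path would work because it never goes through this lookup, while the Spark path fails before any data is read.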
This is using "com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-62-25d40cff-SNAPSHOT" and PySpark 3.1.2.
What might cause this error?
The Java code is called from a generated Python module which shows where the "ADLSGen2" key comes from: