aviks opened this issue 6 years ago
[Renamed the issue]
So this is due to the fact that Azure uses a separate `wasb://` protocol layered over `hdfs://`, which uses Azure Blob Storage as the underlying storage. This will probably need to be supported explicitly within Elly.

Some background: https://blogs.msdn.microsoft.com/cindygross/2015/02/04/understanding-wasb-and-hadoop-storage-in-azure/
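For context, a `wasb://` URI encodes the storage account, the blob container, and the blob path in its authority and path components. A minimal sketch of that decomposition (the account and container names below are made up for illustration):

```python
from urllib.parse import urlparse

def parse_wasb(uri):
    """Split a wasb URI into (account, container, path).

    wasb URIs take the form
        wasb://<container>@<account>.blob.core.windows.net/<path>
    (wasbs:// for the TLS variant)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb URI: " + uri)
    container, _, host = parsed.netloc.partition("@")
    account = host.split(".")[0]
    return account, container, parsed.path.lstrip("/")

# hypothetical account and container names
print(parse_wasb("wasb://data@myaccount.blob.core.windows.net/logs/2018/01.csv"))
# -> ('myaccount', 'data', 'logs/2018/01.csv')
```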
Similarly, HDInsight supports the `adl://` protocol, which uses Azure Data Lake Store as the underlying storage engine for Hadoop. It would be good to support that as well.
Looks like this `wasb` support came in with Hadoop v2.9: https://hadoop.apache.org/docs/r2.9.0/hadoop-azure/index.html#Introduction
What is not yet clear to me is whether the server transparently wraps `wasb` and presents an `hdfs` interface. If that is true, then we should be able to access `wasb` just by upgrading Elly to the v2.9 protobuf APIs. But I am still unsure how/why that would work. Will dig a bit deeper.
This looks to be implemented entirely as a client library; see the org/apache/hadoop/fs/azure/NativeAzureFileSystem source. It reads the HDFS config, but it interacts with Azure services directly. The HDFS namenode and datanodes do not seem to be aware of this at all.
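To illustrate what "interacts with Azure services directly" means here: a client-side `wasb` layer would issue plain Azure Blob REST calls, such as the List Blobs operation (`?restype=container&comp=list`), against the storage account's endpoint, with no namenode in the loop. A sketch, with hypothetical account and container names:

```python
def list_blobs_url(account, container, prefix=""):
    """Build the URL for the Azure Blob storage 'List Blobs' REST
    operation. A client-side wasb implementation talks to this
    endpoint directly, never to an HDFS namenode or datanode."""
    url = ("https://{account}.blob.core.windows.net/{container}"
           "?restype=container&comp=list").format(account=account,
                                                  container=container)
    if prefix:
        url += "&prefix=" + prefix
    return url

# hypothetical account and container names
print(list_blobs_url("myaccount", "data", "logs/"))
```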
So, the `HDFSFile` implementation in Elly.jl can cater only to the `hdfs://` filesystem, and we probably need to look at the Azure APIs to do an implementation of `NativeAzureFile` along similar lines in Julia. Also, there doesn't seem to be any direct Azure API for this (`wasb`) filesystem protocol, only APIs for the blob store. We will need to implement the filesystem metadata management in Julia as well.
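For example, that metadata layer has to synthesize directory listings from a flat blob namespace, where a "directory" is just a shared name prefix. A rough sketch of the prefix/delimiter logic involved (Python here purely for illustration; a Julia implementation would do the same thing):

```python
def readdir(blobs, path):
    """Directory listing over a flat blob namespace: keep blobs under
    the given prefix, and truncate each name at the next '/' so that
    deeper blobs collapse into a single subdirectory entry."""
    prefix = path.rstrip("/") + "/" if path else ""
    entries = set()
    for name in blobs:
        if not name.startswith(prefix):
            continue
        head, sep, _ = name[len(prefix):].partition("/")
        entries.add(head + ("/" if sep else ""))
    return sorted(entries)

# made-up blob names for illustration
blobs = ["logs/2018/01.csv", "logs/2018/02.csv", "logs/readme.txt", "tmp/x"]
print(readdir(blobs, "logs"))  # -> ['2018/', 'readme.txt']
```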
I can see the files using `hadoop fs -ls`, but not using `readdir`. Trying to create a reference to a file I know to exist using `HDFSFile`, and then calling `stat` on it, shows `Elly.HDFSException("Path not found")`.