elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support for storing indices on HDFS #9072

Closed AndreasHoermandinger closed 9 years ago

AndreasHoermandinger commented 9 years ago

It would be nice to store/read indices to/from HDFS, so I want to implement this feature. I already had a look, and it should be easy to implement by writing an HdfsDirectory like the one in Solr (or, even easier, reusing the existing one, since it is simply an implementation of BaseDirectory). Since Lucene 5's Directory API keeps file operations isolated inside the Directory implementation, adding this feature should not be a big problem: it comes down to implementing classes like those in org.elasticsearch.index.store.fs.

So I also have a few questions: Is it possible to use Solr's implementation of HdfsDirectory (legally, technically, ...), and why is this implemented in Solr rather than in Lucene itself?

Also I would like to discuss in what way my contribution would be best implemented in order to be accepted into the project.

costin commented 9 years ago

Hi,

First of all thanks for wanting to contribute.

Second, one can use HDFS right now as storage for Elasticsearch/Lucene indices by mounting HDFS through its official NFS gateway and exposing it to the local OS as just another partition/fs. NFS mounting has been around for quite some time; it is well understood and supported by Hadoop. Note that one needs to pay attention to HDFS's missing delete-on-close semantics, which Lucene needs. Some Hadoop vendors offer additional ways to expose HDFS transparently to the OS besides NFS, so that applications can use it without depending on the (vanilla) Hadoop/HDFS API.
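As a rough sketch of the NFS-gateway approach described above (hostnames and paths here are placeholders, and this assumes the Hadoop NFS gateway service is already running):

```shell
# Mount HDFS via the Hadoop NFS gateway. Per the gateway's constraints,
# only NFSv3 over TCP is supported, and local file locking is unavailable.
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync \
    nfs-gateway.example.com:/ /mnt/hdfs

# Elasticsearch could then be pointed at the mount via elasticsearch.yml:
#   path.data: /mnt/hdfs/elasticsearch/data
```

The `nolock` option is required because the gateway does not implement NLM locking, which is one of the semantic gaps mentioned above.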

Third, when it comes to HDFS (or any other storage for that matter), one needs to take into account its performance characteristics. HDFS is, by design, a big (aka plenty of space), distributed file-system; some might call it an archiving store, given its aim of storing large volumes of data across multiple machines at the expense of performance. Each call to the file-system can easily translate into one or multiple calls across the network (to wherever the content is stored), and random-access performance is significantly slower than local storage (again due to remote calls), so using HDFS as the backing storage for a real-time/fast engine is, to say the least, challenging. There are also some issues regarding the fs guarantees (think fsync, and see HBase's work/fork of HDFS) that bring in a lot of 'baggage' with little (if any) benefit: Elasticsearch already does replication, and the performance and reliability of the local fs are significantly better (orders of magnitude, if you consider the OS page cache) than HDFS. There are ways to tweak the performance of HDFS (adding some caching in front, tweaking data locality, adjusting replication, etc.), and for those cases NFS is a good candidate.

That's not to say this is impossible, but rather that there are some challenges that affect not only performance but also the consistency and durability of the data. And we always prefer to hold off on a feature (even if it's really cool) unless we're positive it is reliable.

However, that shouldn't prevent you from experimenting or extending/modifying Elasticsearch. As long as it offers you and others value, go for it! Whether it's part of Elasticsearch proper or not becomes secondary at that point.

P.S. By Hadoop/HDFS I'm referring to vanilla Apache Hadoop.

P.P.S. Elasticsearch supports HDFS as a snapshot/restore target through the HDFS repository.
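For reference, the snapshot/restore route mentioned in the P.P.S. looks roughly like this once the repository-hdfs plugin is installed on every node (the host, repository name, and paths below are illustrative placeholders):

```shell
# Register an HDFS-backed snapshot repository on the cluster.
curl -XPUT 'localhost:9200/_snapshot/my_hdfs_repo' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode.example.com:8020/",
    "path": "elasticsearch/snapshots"
  }
}'

# Take a snapshot of the whole cluster into that repository.
curl -XPUT 'localhost:9200/_snapshot/my_hdfs_repo/snapshot_1?wait_for_completion=true'
```

This keeps the live index on local disk and only uses HDFS for backups, which sidesteps the performance and consistency concerns discussed above.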

AndreasHoermandinger commented 9 years ago

Hi,

Thank you for your answer.

I've tried NFS mounting, but it turned out to be quite unstable. (Especially when Elasticsearch started moving shards.)

In fact, right now I'm not trying to build a high-performance (in terms of latency) storage for Elasticsearch, but rather an additional storage tier that can be used. Using a4e2230ebd5d424a78c266dc98f0521f897e580b for example, one could use HDFS as storage for cold indices that are not frequently queried, but frequently enough that one does not want to re-index them every time just to query them. So, as you say, it would be used as an archive.

Concerning replicas: in theory Elasticsearch no longer needs to move shards once they're on HDFS. So it would be quite convenient to have shards lie on HDFS (where HDFS handles all the replication); if they need to be moved, they just need to be released and then found and locked again by the new node.

I've already experimented a bit with the existing directory implementation and got indices stored on HDFS and queried from HDFS. The remaining problems are storing/reading translogs on HDFS, and the fact that indices are deleted in NodeEnvironment: if I don't want to add long if/else chains, with one branch doing the normal (standard) execution and the other just for HDFS, I need to plug the implementation in some other way (I've already implemented that as well, though).

Thank you also for pointing out the problems with using HDFS as a storage. Of course I will take care of them as soon as I have a properly running version. I also never expected it to be accepted without being stable and properly tested.

clintongormley commented 9 years ago

Concerning replicas: in theory Elasticsearch no longer needs to move shards once they're on HDFS. So it would be quite convenient to have shards lie on HDFS (where HDFS handles all the replication); if they need to be moved, they just need to be released and then found and locked again by the new node.

You may be interested in https://github.com/elasticsearch/elasticsearch/issues/8976 which is being designed for just such a purpose.

AndreasHoermandinger commented 9 years ago

Oh, great. Thank you!

jpountz commented 9 years ago

So I also have a few questions: Is it possible to use Solr's implementation of HdfsDirectory (legally, technically, ...), and why is this implemented in Solr rather than in Lucene itself?

The way I see it: if you want your indices stored in HDFS at all costs, then this is your only option. On the other hand, we are already seeing performance issues with some (supposedly) good network-attached storage, so I don't think storing indices in HDFS would be a better story. From an Elasticsearch perspective, the best you can do is use a good local drive. If you use anything else, you need to be prepared to keep wondering whether the performance issues you encounter are due to Elasticsearch itself or to the storage.

clintongormley commented 9 years ago

@jpountz agreed - this issue seems to have lost momentum, and there's certainly no interest in supporting this from our side, so I'm going to close this issue.

chenryn commented 8 years ago

@AndreasHoermandinger Did you implement this already? I just found a slide saying that "HDFS Directory in Solr only 25% slower than local SSD": http://www.slideshare.net/lucidworks/webinar-solr-fusion-for-big-data-55059041 If so, performance wouldn't be as big a problem as with an NFS mount...

trixpan commented 8 years ago

@clintongormley If I understand correctly, this issue seems to be progressing well here:

https://www.elastic.co/guide/en/elasticsearch/plugins/master/repository-hdfs.html

clintongormley commented 8 years ago

@trixpan that is for snapshotting indices to hdfs, not running a live index on hdfs

tdunning commented 8 years ago

Relative to the use of NFS for storing indexes: this won't work well with NFS over HDFS, but it works great with NFS over MapR FS. The issue is that you have to have first-class NFS support. The problem with NFS over HDFS is that it is really NFS over the local file system, with copy-back to HDFS after inactivity. This leads to some pretty bizarre semantics and seriously impairs performance. There are also serious issues for many applications with the immutability of files on HDFS. Since Lucene actually only writes files once anyway, this is less of an issue.

The HDFS repository that is available as of Lucene 4.7 and which is exposed in Solr from that time might work with Elastic, but the way that Elastic handles updates to indexes is significantly different than the way Solr handles things.

s1monw commented 8 years ago

The HDFS repository that is available as of Lucene 4.7 and which is exposed in Solr from that time might work with Elastic, but the way that Elastic handles updates to indexes is significantly different than the way Solr handles things.

If you're alluding to indexing documents on all replicas, that applies only if you are not using shadow replicas. If shadow replicas are used, writes happen only on the primary. Yet the entire story of HDFS vs. some other filesystem underneath is pointless IMO: ES and Lucene are designed to run on top of a local FS to perform well. ES adds a distribution layer to Lucene and doesn't need another complex distributed component like HDFS. We have a way to snapshot into HDFS for backups etc., but a live index is expected to be local or behind a read/write FS local to the node.
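For context, the shadow-replicas feature of that Elasticsearch era was driven by index settings along these lines (the paths and index name here are placeholders, and the exact node settings varied between versions, so treat this as an illustrative sketch rather than a definitive recipe):

```shell
# Create an index whose replicas are "shadow" replicas: all shards read
# from the same shared-filesystem path, and only the primary writes.
# The shared path must be visible to every node (e.g. an NFS mount).
curl -XPUT 'localhost:9200/my_index' -d '{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 2,
    "data_path": "/mnt/shared/my_index",
    "shadow_replicas": true
  }
}'
```

This is the mechanism behind "writes happen only on the primary" above: replicas refresh against the shared segments instead of indexing documents themselves.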

tdunning commented 8 years ago

I understand the bias towards local file systems, but the argument about performance being higher on a local file system is not factually correct. You are almost certainly correct about HDFS, but advanced file-systems like MapR FS can out-perform local file systems, surprisingly enough. That is why Vertica recommends running on MapR FS. That is why Sybase IQ and Hana are qualified by SAP to run on MapR.

The reason this happens is largely that local filesystems are mostly clones of various Unix filesystems designed in the late 80's, when design parameters were very different. Only a very few file systems have been designed from zero in the last decade to take good advantage of modern memory, disk and networking systems. A secondary reason is that a good distributed file system controls the spindles directly and schedules data motion and linearization in a globally optimized fashion. The multi-tenant nature of most Linux file systems prevents proper scheduling and data linearization.

Keep in mind that an advanced distributed file system can keep the data local if desired, and that dual 10Gb/s networks will outrun the disks on many, if not most, of the systems you will see if you do decide to replicate the data non-locally.

Aside from performance, one key advantage of running on a distributed file system is the dramatically lowered risk of data corruption due to disk-full errors. Local instances are generally relegated to a single partition of fixed and limited size.

chenryn commented 8 years ago

Is MapR-FS open source? Do you have a performance testing report for ES on MapR-FS vs HDFS vs local FS? @tdunning

trixpan commented 8 years ago

@chenryn MapR-FS is closed source and part of MapR's commercial offering. There are multiple generic performance tests, but it is known that at least one user had success using ES together with Mesos+Myriad and MapR-FS:

https://discuss.elastic.co/t/elasticsearch-and-yarn-my-experience-with-mesos/24724

s1monw commented 8 years ago

You are almost certainly correct about HDFS, but advanced file-systems like MapR FS can out-perform local file systems, surprisingly enough.

Oh man, this is great. I was pretty sure we are all largely limited by the speed of light, but apparently that is not true? Nobel prize? :) Sorry Ted, I think statements like this only confuse people and are far from the technical aspects we are discussing here. There is no FS that will be faster fetching stuff over the wire and then going to disk than just reading it from local disks. You have to compare apples to apples, so please don't put statements like this here; I would really appreciate that.

I understand the bias towards local file systems, but the argument about performance being higher on...

speaking of bias, I would have appreciated full disclosure on a public issue before advertising closed source software solutions. It's just not fair to others.

A last word on distributed filesystems: they add massive complexity to every system that relies on them. Every installation of a distributed FS I have seen requires a dedicated team to keep it running and maintain it. The benefit is very limited compared to the added complexity, and the potential sink of uncertainty when it comes to performance debugging. I recommend making your choices wisely.

tdunning commented 8 years ago

Sarcasm aside, I am happy to sit down over a beer and explain how to get the results I am talking about and why it works the way it does. My comments all stem from personal experience; no faster-than-light technology is required to substantiate them. A more careful reading of what I wrote before making a knee-jerk response might help as well.

But you are correct that this is wandering from the point of the original enhancement request. If there are questions specifically about how and why HDFS APIs might be useful for Elastic, this is a good place. Discussion about the generic capabilities and properties of modern file systems can continue by pinging me on Twitter or by email; both are easy to find.