NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

REPL 'cat' command is not lazy #293

Closed blever closed 11 years ago

blever commented 11 years ago

The Scoobi REPL includes the function cat that returns an Iterable[String] for a given file path. A typical usage is:

scoobi> cat("hdfs://path/to/file").take(20) foreach { println }

The implementation of cat is currently strict. If the file is large, cat attempts to load its entire contents into memory, which exhausts the heap and fails with an out-of-memory error.

Reimplement cat (as well as avrocat) to be lazy. Note that support for glob patterns should remain. Scoobi's implementation of BridgeStoreIterator should be instructive and is potentially a source of code reuse and refactoring.
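To illustrate the kind of laziness being asked for, here is a minimal sketch. It is not Scoobi's code and does not use BridgeStoreIterator: it reads local files through java.nio rather than HDFS, it assumes any glob pattern has already been expanded into a list of paths, and the names LazyCatSketch, LazyLines and lazyCat are hypothetical. The idea is to read one line ahead at a time and to open each file only when its first line is requested.

```scala
import java.io.{BufferedReader, Closeable}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical sketch, not Scoobi's API: a lazy replacement for a strict `cat`.
object LazyCatSketch {

  // Iterates the lines of several files in order. A file is opened only when
  // one of its lines is first requested, and closed once it is exhausted.
  final class LazyLines(paths: Seq[Path]) extends Iterator[String] with Closeable {
    private val remaining = paths.iterator
    private var reader: Option[BufferedReader] = None
    private var nextLine: Option[String] = None

    // Buffer at most one line, moving to the next file when the current one ends.
    private def fill(): Unit = {
      while (nextLine.isEmpty && (reader.isDefined || remaining.hasNext)) {
        if (reader.isEmpty)
          reader = Some(Files.newBufferedReader(remaining.next(), StandardCharsets.UTF_8))
        val line = reader.get.readLine()
        if (line == null) close() else nextLine = Some(line)
      }
    }

    def hasNext: Boolean = { fill(); nextLine.isDefined }

    def next(): String = {
      if (!hasNext) throw new NoSuchElementException("no more lines")
      val line = nextLine.get
      nextLine = None
      line
    }

    // Closes the currently open file, if any.
    def close(): Unit = { reader.foreach(_.close()); reader = None }
  }

  // Same return type as the REPL's cat, but each traversal reads on demand.
  def lazyCat(paths: Seq[Path]): Iterable[String] = new Iterable[String] {
    def iterator: Iterator[String] = new LazyLines(paths)
  }
}
```

With this shape, something like LazyCatSketch.lazyCat(paths).iterator.take(20) foreach { println } reads only as far as the twentieth line. One caveat of the sketch: if the caller stops early, the file that is currently open stays open until close() is called on the iterator, so a real implementation would need a policy for releasing it.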

raronson commented 11 years ago

Implementation so far (not tested): https://github.com/raronson/scoobi/compare/lazy_cat