ellchow closed 11 years ago
I'm not sure what changed, but this seems to have stopped working since the API for sequence IO was modified. For this code:
import com.nicta.scoobi.Scoobi._
object HelloWorld extends ScoobiApp {
  def run() = {
    val x = DList((1L, 2L), (2L, 3L), (3L, 4L)).toSequenceFile("checkpoint").checkpoint
    val y = x.map(_._2)
    persist(y.toTextFile("second"))
  }
}
[WARN] LocalJobRunner - job_local_0001
Also, this fails:
import com.nicta.scoobi.Scoobi._
import java.io._
import collection.mutable
import com.nicta.scoobi.core.WireFormat
import org.apache.hadoop.io.BytesWritable
object HelloWorld extends ScoobiApp {
  implicit def anyWFSeqSchema[A : WireFormat]: SeqSchema[A] = new SeqSchema[A] {
    type SeqType = BytesWritable
    val b = mutable.ArrayBuffer[Byte]().mapResult(_.toArray)
    def toWritable(a: A) = {
      val bs = new ByteArrayOutputStream
      implicitly[WireFormat[A]].toWire(a, new DataOutputStream(bs))
      new BytesWritable(bs.toByteArray)
    }
    def fromWritable(xs: BytesWritable): A = {
      b.clear()
      xs.getBytes.take(xs.getLength).foreach { x => b += x }
      val bArr = b.result()
      val bais = new ByteArrayInputStream(bArr)
      implicitly[WireFormat[A]].fromWire(new DataInputStream(bais))
    }
    val mf: Manifest[SeqType] = implicitly
  }
  def run() = {
    case class Foo(val value: Int)
    implicit val FooFmt = mkCaseWireFormat(Foo, Foo unapply _)
    val x = DList(Foo(1), Foo(2)).valueToSequenceFile("checkpoint").checkpoint
    val y = x.map(e => Foo(e.value + 1))
    persist(y.toTextFile("plusone"))
  }
}
[WARN] LocalJobRunner - job_local_0001
All of these examples now work on 0.7.0-SNAPSHOT, so I'm closing the pull request.
Note, however, that the syntax has changed a bit: checkpointing is now done like this:
val x = DList(Foo(1), Foo(2)).valueToSequenceFile("checkpoint", checkpoint = true)
I added an implicit SeqSchema for anything that has a WireFormat. Persisting anything other than one of the standard types as a sequence file was a bit of a hassle. I also wanted to use it for checkpointing, and writing the Avro schema for each checkpoint is a bit prohibitive. Of course, this may change when/if the plugin is updated for Scala 2.10.
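For anyone following along without a Scoobi setup, here is a minimal sketch of the byte-array round trip that such a schema relies on. It is not Scoobi's code: `SimpleWireFormat`, `toBytes` and `fromBytes` are hypothetical stand-ins I made up for illustration, and `Foo` just mirrors the case class from the example above. The one detail worth noting is that Hadoop's `BytesWritable.getBytes` returns a backing array that is only valid up to `getLength`, so the reader has to be limited to that length.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInput, DataInputStream, DataOutput, DataOutputStream}

object WireFormatSketch {
  // Hypothetical stand-in for a WireFormat-style type class (not Scoobi's definition).
  trait SimpleWireFormat[A] {
    def toWire(a: A, out: DataOutput): Unit
    def fromWire(in: DataInput): A
  }

  case class Foo(value: Int)

  // Hand-written format for Foo: a single 4-byte Int.
  implicit val fooFormat: SimpleWireFormat[Foo] = new SimpleWireFormat[Foo] {
    def toWire(a: Foo, out: DataOutput): Unit = out.writeInt(a.value)
    def fromWire(in: DataInput): Foo = Foo(in.readInt())
  }

  // Serialize any value that has a format into a fresh byte array.
  def toBytes[A](a: A)(implicit wf: SimpleWireFormat[A]): Array[Byte] = {
    val bs = new ByteArrayOutputStream
    val out = new DataOutputStream(bs)
    wf.toWire(a, out)
    out.flush()
    bs.toByteArray
  }

  // Deserialize from a backing array that may be longer than the valid data,
  // reading only the first `length` bytes and avoiding an intermediate copy.
  def fromBytes[A](backing: Array[Byte], length: Int)(implicit wf: SimpleWireFormat[A]): A =
    wf.fromWire(new DataInputStream(new ByteArrayInputStream(backing, 0, length)))
}
```

With this sketch, `fromBytes[Foo](toBytes(Foo(41)), 4)` should give back `Foo(41)`. Bounding the `ByteArrayInputStream` by offset and length does the same job as the `take(xs.getLength)` plus shared buffer in `fromWritable`, without keeping mutable state in the schema.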
Is there a better way to support this?