go-sif / sif

Sif is a framework for fast, predictable, general-purpose distributed computing in the map/reduce paradigm.
Apache License 2.0

What's different from https://github.com/chrislusf/gleam? #39

Open: nodelrd opened this issue 2 years ago

nodelrd commented 2 years ago

https://github.com/chrislusf/gleam has stopped receiving updates, and I like Sif. What's different between Sif and https://github.com/chrislusf/gleam?

Ghnuberath commented 1 year ago

Sorry for the late reply @nodelrd, and thanks for the question. I'm not intimately familiar with https://github.com/chrislusf/gleam, so please treat this as a surface-level comparison after taking a quick look at the repo.

Some key differences that jump out at me:

  1. Sif is seemingly more concerned with emulating a Dataframe-style API, with columns and column types, and provides ways of fetching columnar data from particular places (such as files, or a cloud bucket), parsing it from particular formats (e.g. CSV, JSON, Parquet), and manipulating it using Row/Column primitives when interacting with the Map/Reduce API. The intention is for the API to feel comfortable for folks who have previously used Apache Spark or similar tools.
  2. Sif is more focused on predictable memory use than on raw performance, though I do think it's reasonably snappy. My rationale for this focus is summarized in the README.
  3. Sif uses Go generics to keep things strongly typed and to avoid casting when working with values from Rows.
  4. Sif is still in pretty early-stage development. I had a kid recently, so things have slowed down temporarily, and you'll find functional gaps compared with Gleam at the moment (for example, Sif doesn't support Joins or distributed sorting yet).

In a lot of ways, Sif and Gleam are similar as well. I think the goal of crafting tools like this in Go is a noble one and I'm happy to see others are thinking along similar lines!