Open cboettig opened 8 months ago
That does sound intriguing and promising! I don't know polars and ibis well enough to judge whether this would be an improvement, nor do I have the resources to take on this major shift, but it does sound like the sort of change that might be a big improvement. It would be great to be able to run on existing standard libraries rather than relying on the datascience package, if the existing libraries are easy enough to learn and meet the pedagogical goals.
@davidwagner very cool! @jegonzal and @fperez were discussing this a bit in the context of data-100 too and may have more insight. From what I understand, it sounds like Wes created ibis to address these issues they had in pandas in the first place [1].
Yes! I haven't had time to dig into the details of polars vs ibis, and I'm not even sure if they occupy quite the same space. But polars is definitely rapidly rising as a viable alternative to pandas, and I think we'd gain a ton from exploring this.
I also think that a combination of one/two GSIs + AI-assisted translation could make the porting of at least the base material a reasonable lift, with the faculty/textbook authors having to only do a final review of the resulting product.
It's not trivial, but it could be done in parallel over a semester if DSUS assigns one or two GSIs to the job.
For lots of good reasons, data-8 has, as I understand it, always relied on Berkeley's own `datascience` package, which offers far more intuitive/pythonic syntax than `pandas`. While I agree with all the pedagogical justification there relative to pandas, as you all probably know there are now much more performant and pythonic alternatives displacing pandas's dominance, specifically `polars` and `ibis`. I think these provide a syntax that is closer to `datascience` than pandas, and is more nicely aligned with and informed by database theory (and indeed can be translated directly to SQL). I know this wouldn't be a small overhaul, but I think it could be a substantial improvement.

Maybe it would make more sense to migrate data100 from pandas to polars first?