online-talk-proposal: Accelerate your pandas workload using FireDucks at zero manual effort

qsourav commented 3 months ago

Title of the talk/workshop: Accelerate your pandas workload using FireDucks at zero manual effort

Abstract of the talk/workshop: Data scientists typically spend significant effort transforming raw data into a more digestible format before training an AI model or creating visualizations. Traditional tools such as pandas have long been the linchpin of this process, offering powerful capabilities but not without limitations. Because pandas offers many ways to write the same operation, users often end up choosing an inefficient one, and the computational cost then grows with the data size. We introduce a couple of frequently occurring, subtle performance issues in pandas, along with FireDucks, a compiler-accelerated high-performance library that automatically detects and optimizes those issues without any manual effort. We will also demonstrate how FireDucks can outperform existing high-performance pandas alternatives.
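To make the point about equivalent but unequally efficient pandas code concrete, here is a minimal sketch. It is an illustrative example and is not taken from the talk material; the DataFrame and its column names (`price`, `qty`) are made up. Both patterns compute the same result, but the first calls a Python function once per row while the second performs a single vectorized operation.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Slower pattern: a Python-level function is invoked once per row.
slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Faster pattern: one vectorized multiplication over whole columns.
fast = df["price"] * df["qty"]

# Both approaches produce the same values.
assert slow.tolist() == fast.tolist()
```

On a three-row frame the difference is negligible; the gap typically widens as the number of rows grows.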

Growing data sizes and rising computational costs create demand for high-performance DataFrame libraries. However, some existing pandas alternatives require users to learn completely new APIs (incurring migration cost), whereas others demand more powerful hardware (incurring hardware cost). To address this, we at NEC R&D Lab, Japan, have developed FireDucks, a solution crafted for data professionals who like the flexible pandas API and want to improve the performance of their applications without extra hardware cost when regularly dealing with large and complex data. It is released on pypi.org under the 3-Clause BSD License and can be installed using pip.
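As a rough illustration of the drop-in usage described above, the sketch below assumes that FireDucks has been installed with `pip install fireducks` and that its pandas-compatible module is importable as `fireducks.pandas`. It falls back to plain pandas if that import fails. The tiny DataFrame and its column names are hypothetical.

```python
try:
    # Assumption: FireDucks is installed and provides a pandas-compatible module.
    import fireducks.pandas as pd
except ImportError:
    # Fallback so the example still runs with plain pandas.
    import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame(
    {"city": ["Tokyo", "Hyderabad", "Tokyo"], "sales": [100, 250, 50]}
)

# Ordinary pandas-style code; nothing below is FireDucks-specific.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Only the import line differs from a plain pandas script, which is the sense in which the migration is meant to require no manual rewriting.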

Category of the talk/workshop: Data Science, Machine Learning, and AI

Duration (including Q&A): talk (30 mins) + Q&A (10 mins)

Here is the detailed outline:

Current challenges of large-scale data analysis using pandas and other libraries. (5 mins)
Introduction to FireDucks (developed at NEC R&D Lab) and its offerings. (10 mins)
Demo on automatic acceleration of pandas workload using FireDucks. (5 mins)
Tricks and Tips on writing better code related to large-scale data analysis. (5 mins)
Performance comparison of FireDucks and other data analysis Python libraries. (5 mins)
Q/A (10 mins)

Level of Audience: Beginner to Intermediate. This session is helpful for anyone with basic knowledge of pandas or a similar data manipulation library in Python.

Speaker Bio: Sourav has 11+ years of professional experience at NEC Corporation in the diverse fields of High-Performance Computing, Distributed Programming, Compiler Design, and Data Science. Currently, his team at NEC R&D Lab, Japan, is researching various data-processing algorithms. Combining niche technologies from compiler frameworks, high-performance computing, and multi-threaded programming, they have developed a Python library named FireDucks with highly pandas-compatible APIs for DataFrame operations. Previously, he worked on research and development of performance-critical AI and Big Data solutions and on the optimization of several legacy applications, written in C++ and Fortran, for weather prediction, earthquake simulation, etc.

Company: NEC Corporation (Japan)
Email: sourav-saha@nec.com
Years of Exp: 11+

Prerequisites (if any): Basic understanding of pandas or a similar data manipulation library in Python.

qsourav commented 3 months ago

Notes to the organizer: If accepted, I will be able to present my talk at one of your online/hybrid events (since I am located in Tokyo, Japan).

kalyan678 commented 3 months ago

@qsourav - Thanks for submitting the proposal. At the moment we are not doing any online meetups, but we will definitely reach out to you in case we plan one.

qsourav commented 3 months ago

@kalyan678, thank you for your kind response. Understood.

bpkapkar commented 1 month ago

Hello @qsourav,

Thank you for submitting your talk proposal. Since we're currently focusing on in-person meetups, we are closing this request for now. If we plan any online events in the future, we will reach out to you. Alternatively, if you're in Hyderabad and interested in presenting in person, please let us know, and we would be happy to consider it.

Thank you for your understanding, and we look forward to staying in touch!

HydPy / HydPy-meetups

online-talk-proposal: Accelerate your pandas workload using FireDucks at zero manual effort #80