ankane / ruby-polars

Blazingly fast DataFrames for Ruby
MIT License

Feature Request: Gracefully Handle Non-Existent Columns in DataFrame [] Method #65

Status: Closed (mezbahalam closed 1 month ago)

mezbahalam commented 6 months ago

When accessing a non-existent column in a Polars::DataFrame with the [] method, the library currently raises a RuntimeError. This can disrupt application flow and forces users to add error handling around every such access. It would be beneficial if the [] method handled this case more gracefully, for example by returning nil or by providing a more informative error message, so that the application can continue running.

Current Behavior: Accessing a non-existent column results in a runtime error. Here's an example of the error output when trying to access a column called "missing_column" in a DataFrame:

require "polars-df"

df = Polars::DataFrame.new({"foo" => [1, 2, 3], "bar" => [6, 7, 8]})
df["missing_column"]
# Output: RuntimeError: not found: missing_column

Suggested Improvement: Modify the behavior of the [] method so that it returns nil or logs a warning when a non-existent column is accessed. This change would prevent the method from raising a runtime exception and improve the robustness of applications using this library.

Benefits:
- Improved resilience: applications will be more robust by gracefully handling missing data without crashing.
- Better user experience: users will face fewer interruptions due to unhandled exceptions, especially in scenarios involving dynamic data access.
- Simplified error handling: reduces the need for extensive error handling around column access, making code cleaner and easier to maintain.

Use Case: This feature would be particularly useful in data analysis scenarios where scripts dynamically access columns from datasets that may vary in structure, enabling smoother operation and easier debugging.
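
For readers who want the requested behavior today, one possible workaround is a small wrapper around the [] call. The sketch below is only an illustration: column_or_nil is a hypothetical helper, not part of ruby-polars, and StubFrame is a stand-in that mimics only the RuntimeError reported above so the example runs without the gem. The exception class raised by a real Polars::DataFrame may differ between versions.

```ruby
# Stand-in for Polars::DataFrame so the sketch runs without the gem.
# It mimics only the behavior reported in this issue (assumption).
class StubFrame
  def initialize(data)
    @data = data
  end

  # Raises RuntimeError ("not found: <name>") for unknown columns.
  def [](name)
    @data.fetch(name) { raise "not found: #{name}" }
  end
end

# Hypothetical helper (not part of ruby-polars): returns nil and prints
# a warning to stderr instead of raising when the column is missing.
def column_or_nil(df, name)
  df[name]
rescue RuntimeError => e
  warn "column #{name.inspect} unavailable: #{e.message}"
  nil
end

df = StubFrame.new({"foo" => [1, 2, 3], "bar" => [6, 7, 8]})
found = column_or_nil(df, "foo")
missing = column_or_nil(df, "missing_column")
```

Here found holds the data for "foo" and missing is nil. Rescuing RuntimeError follows the error shown in this issue; a narrower exception class would be preferable if the installed version provides one.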

topofocus commented 6 months ago

If an application needs dynamic data access, it's very easy to implement that kind of error handling. If, on the other hand, an application relies on strict data structures, one would have to actively implement error-detection mechanisms once [] stopped raising. That would be unfortunate.
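
As a rough sketch of the opt-in handling described in this comment, a caller can check the column list before indexing, while strict code keeps the raising behavior of []. fetch_column is a hypothetical helper, and LooseFrame is a stand-in used only so the example runs without the gem; it assumes the frame exposes columns (an array of column names, as Polars::DataFrame does) and raises on unknown names.

```ruby
# Stand-in for Polars::DataFrame (assumption: it exposes #columns and
# raises on unknown column names, as reported in this issue).
class LooseFrame
  def initialize(data)
    @data = data
  end

  def columns
    @data.keys
  end

  def [](name)
    @data.fetch(name) { raise "not found: #{name}" }
  end
end

# Hypothetical helper: looks the name up in the column list first and
# returns a default instead of raising when it is absent.
def fetch_column(df, name, default = nil)
  df.columns.include?(name) ? df[name] : default
end

df = LooseFrame.new({"foo" => [1, 2, 3], "bar" => [6, 7, 8]})

# Dynamic code opts in to the lenient lookup.
present = fetch_column(df, "bar")
absent = fetch_column(df, "missing_column")
fallback = fetch_column(df, "missing_column", [])

# Strict code keeps using [] and still fails loudly on a typo.
strict_failed = begin
  df["missing_column"]
  false
rescue RuntimeError
  true
end
```

With this split, only code that expects varying columns pays for the extra check, and a misspelled column name in strict code is still reported immediately.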

ankane commented 1 month ago

Hi @mezbahalam, thanks for the suggestion, and sorry for the delay. I think returning nil like hashes do would be more Ruby-like (this is what my other data frame library does). However, I'd like to keep this consistent with the Python library for now.
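
For reference, the Hash behavior mentioned here looks like this in plain Ruby. Hash#[] returns nil for a missing key, while Hash#fetch raises KeyError unless a default is given. The variable names are illustrative only.

```ruby
row = {"foo" => 1, "bar" => 6}

# Hash#[] is lenient: a missing key yields nil.
lenient = row["missing_column"]

# Hash#fetch is strict: a missing key raises KeyError.
strict_error = begin
  row.fetch("missing_column")
  nil
rescue KeyError => e
  e.class
end

# Hash#fetch with a default never raises.
with_default = row.fetch("missing_column", 0)
```

Ruby's Hash thus offers both styles side by side, which is the trade-off this thread is weighing for DataFrame#[].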