daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

[DAPHNE-#629] efficient processing StringData in DenseMatrix #797

Closed saminbassiri closed 1 month ago

saminbassiri commented 3 months ago

[DAPHNE-#629] Efficient Processing of String Data Sets in DAPHNE with FixedStr16 Class and std::string Class for DenseMatrix

Summary

This PR addresses issue #629 by enhancing string support in DAPHNE, making it practical to process string data sets. The main addition is generalizing or specializing the current template structures for the FixedStr16 and std::string classes. While significant progress has been made, additional features related to element-wise comparisons will be added in the coming days.

Key Features Implemented

Testing

Upcoming Features

The following features will be added in the next few days:

saminbassiri commented 3 months ago

Upcoming Tasks Completed

The additional features mentioned before have been successfully implemented.

Changes:

The following features, as outlined in the PR message, have now been added:

Test

Initial tests for DenseMatrix<std::string> and DenseMatrix<FixedStr16> were implemented, verifying functionality for newly added features and data types.

saminbassiri commented 2 months ago

Thank you for the thorough review and detailed feedback, @pdamme. I have considered the points you raised, and here is a summary of the changes:

saminbassiri commented 1 month ago

PR Update:

I have applied several changes related to this PR:

  1. Handling Unsupported Result Types During String Casting:

    • This change makes string casting more robust by handling unsupported result types explicitly. An unsupported type triggers a compile-time warning via the C++ [[deprecated]] attribute, and a runtime error is thrown if the CastSca function is called with an invalid result type. This makes the behavior safer and more predictable.
  2. FixedStr16 Buffer Size Update:

    • The FixedStr16 constructor has been updated to support 16-character strings without requiring a null terminator. Additionally, I have updated the test cases in CastObjTest.cpp to reflect this change.
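
To make the two changes above concrete, here is a minimal, self-contained sketch of both ideas. It is not the code from this PR: FixedStr16Sketch, castFromString, and unsupportedResultType are hypothetical names used only for illustration, and the actual FixedStr16 and CastSca implementations in DAPHNE may differ. The sketch assumes C++17 (for if constexpr).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-in for FixedStr16: a 16-byte buffer that may be
// completely filled, i.e. no byte is reserved for a null terminator.
struct FixedStr16Sketch {
    static constexpr std::size_t N = 16;
    char buffer[N];

    explicit FixedStr16Sketch(const std::string &s) {
        if (s.size() > N)
            throw std::length_error("string exceeds 16 characters");
        std::memset(buffer, 0, N);               // zero padding for short strings
        std::memcpy(buffer, s.data(), s.size()); // a 16-character string fills all bytes
    }

    // The length is found by a scan bounded by N, so a missing terminator
    // is fine. (Strings with embedded '\0' are not handled by this sketch.)
    std::size_t size() const {
        const void *p = std::memchr(buffer, '\0', N);
        return p ? static_cast<std::size_t>(static_cast<const char *>(p) - buffer) : N;
    }

    std::string to_string() const { return std::string(buffer, size()); }
};

// Hypothetical fallback for unsupported result types. The [[deprecated]]
// attribute is meant to make the compiler warn wherever this function ends
// up being called; at run time it always throws.
template <typename VTRes>
[[deprecated("unsupported result type when casting from string")]]
VTRes unsupportedResultType() {
    throw std::runtime_error("cast from string: unsupported result type");
}

// Hypothetical string-to-scalar cast. Supported result types are resolved at
// compile time; every other type is routed to the deprecated fallback.
template <typename VTRes>
VTRes castFromString(const std::string &s) {
    if constexpr (std::is_integral<VTRes>::value)
        return static_cast<VTRes>(std::stoll(s));
    else if constexpr (std::is_floating_point<VTRes>::value)
        return static_cast<VTRes>(std::stod(s));
    else
        return unsupportedResultType<VTRes>();
}
```

In this sketch, castFromString<int64_t>("42") and castFromString<double>("1.5") take the supported branches. Instantiating castFromString with some other result type (for example a user-defined struct) is expected to produce a deprecation warning on typical compilers and to throw std::runtime_error when called.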