KovtunV / NoStringEvaluating

Fast low memory consuming mathematical evaluation without endless string parsing! Parses string formula once and uses its object sequence in each evaluation. Moreover, provides user defined functions and variables.
MIT License
28 stars 10 forks source link

How to calculate an expression for each row of a large data table without affecting performance #13

Closed prakashboston closed 9 months ago

prakashboston commented 10 months ago

I want to calculate an expression for each row of a data table and add a new column to the data table based on the result got.

KovtunV commented 10 months ago

To improve performance I suggest follow these recommendations:

  1. Use variables dictionary instead of concatenation IDictionary<string, EvaluatorValue> variables
  2. Tune your custom functions: dont allocate extra memory, etc
  3. If possible store FormulaNodes somewhere and pass to evaluator it instead of string formula.

This is a general recommendarions, for more specific I need more details, if possible - examples

prakashboston commented 10 months ago

In a data table, a formula column allows you to perform calculations based on values from other columns. Let's say you have a data table with columns "Quantity" and "Price," and you want to add a formula column for "Total" (Quantity multiplied by Price). The formula in the "Total" column multiplies the corresponding values in the "Quantity" and "Price" columns. How to perform this operation using our engine

KovtunV commented 10 months ago

If we work with numbers only so I can say these recommendations: 1) use single evaluator object 2) Use variables dictionary IDictionary<string, EvaluatorValue> variables 3) If possible to store FormulaNodes somewhere and pass to evaluator it instead of string formula

3 is a really optional you can go with defaults (string formula).

So design can be like: loop by rows from the row take two variables Quantity and Price send to formula store in a Total column

snippet:

    static void Main()
    {
        var ns = CreateNoString();

        // store a single instance (or one per thread)
        var variables = new Dictionary<string, EvaluatorValue>();

        // fetch data from the row
        var quantity = 14;
        var price = 123.675;

        // update dictionary
        variables["Quantity"] = quantity;
        variables["Price"] = price;

        // calculate total
        var total = ns.CalcNumber("Quantity * Price", variables); // 1731,45
    }
prakashboston commented 10 months ago

In a data table, a formula column allows you to perform calculations based on values from other columns. Let's say you have a data table with columns "Quantity" and "Price," and you want to add a formula column for "Total" (Quantity multiplied by Price). image In this example, the formula in the "Total" column multiplies the corresponding values in the "Quantity" and "Price" columns.

This I need to achieve without looping by rows

KovtunV commented 10 months ago

So you have variables like A2 B2, they should be in a dictionary. Also you have to store all cell formulas with linking to cells somewhere to understand what formula should be reruned.

For example formula: "A2*B2" I store this CellFormula After changing A2 or B2 you should say "Hey, I updated A2 has anybody listen to this cell?" our Total cell with formula (A2*B2) will take this notification

take values from A2 and B2 put into dictionary send to evaluator update Total cell value

OR, if this is too difficult just run for every formula and update all formulas after any changing ¯\_(ツ)_/¯

prakashboston commented 10 months ago

var ns = CreateNoString(); // Iterate through each row and calculate the expression foreach (DataRow row in dtInput.Rows) {

    // store a single instance (or one per thread)
    var variables = new Dictionary<string, EvaluatorValue>();

    // update dictionary
    variables["Quantity"] =  row["quantity "];
    variables["Price"] = row["price"];

    // calculate total
    var total = ns.CalcNumber("Quantity * Price", variables); 
    row["total"] =total ;

} I can able to perform using the above code but Is that possible any other way without looping through the data row

KovtunV commented 10 months ago

Sure, if I were you I would add Interface into the row IVariablesContainer (if you can extend DataRow) and use row as is


    // calculate total
    row["total"] = ns.CalcNumber("quantity * price", row);

without iterating you should implement some notification service, or maybe MediatR can be usefull here

prakashboston commented 10 months ago

Thanks for your response.MediatR is commonly used in the context of application development, it may not be directly applicable to working with DataTables. Data tables are more commonly associated with tabular data structures. Can I get a sample code for MediatR with DataTable?

KovtunV commented 10 months ago

I meant we can use it for notifications when some cell is changes we need to notify all cells with formula that uses it. Perhaps that's a bad solution, yes. But with notifications we don't need to run through the all rows.

I don't have a sample because I just imagine what can we do