Athari / YaLinqo

Yet Another LINQ to Objects for PHP [Simplified BSD]
https://athari.github.io/YaLinqo
BSD 2-Clause "Simplified" License
439 stars 39 forks source link

Joining multiple arrays #17

Closed anvarulugov closed 8 years ago

anvarulugov commented 8 years ago

Hello thank you for such awesome script! I want to join 4 arrays. But honestly I coudn't undersand the documentation. Right now I have an sql query:

$ZoneID = '1';
$TillageID = '2';
$rs = mySQLquery("SELECT
        CR.CropID AS CropID,
        CP.OperID AS OperID,
        CP.ParentOperID AS ParentOperID,
        A.ActionID AS ActionID,
        A.WorkloadUnit AS WorkloadUnit,
        A.WorkpayUnit AS WorkpayUnit,
        AT.ActionTypeID AS ActionTypeID,
        A.Labor_prod AS LaborProd,
        A.Labor_rate AS LaborRate,
        A.Mech_rate AS ServiceRate,
        AT.Category AS WorkCategory,
        CP.Feature AS Feature,
        CP.StartYear AS StartYear,
        CP.StartMonth AS StartMonth,
        CP.StartDay AS StartDay,
        CP.EndYear AS EndYear,
        CP.EndMonth AS EndMonth,
        CP.EndDay AS EndDay
        FROM
        ((mlp_cropplan CP
        INNER JOIN mlp_crops CR ON CP.CropID = CR.CropID)
        INNER JOIN mlp_actions A ON CP.ActionID = A.ActionID)
        INNER JOIN mlp_actiontypes AT ON A.ActionTypeID = AT.ActionTypeID
        WHERE (CP.TillageID = 0 OR CP.TillageID = " . $TillageID . ")
        AND (CP.ZoneID = 0 OR CP.ZoneID = " . $ZoneID . ")          
        ORDER BY CR.CropID ASC, CP.OperID ASC");

But I want to use yalinqo instead of mysql. I dumped all the date from these table and made php arrays. Here is the code:

$rs2 = from( $mlp_cropplan )
        ->where(function( $mlp_cropplan ) { 
            global $TillageID, $ZoneID;
            return ( ( $mlp_cropplan['TillageID'] == 0 || $mlp_cropplan['TillageID'] == $TillageID ) && ( $mlp_cropplan['ZoneID'] == 0 || $mlp_cropplan['ZoneID'] == $ZoneID ) ); 
        } )
        ->orderBy( function( $mlp_cropplan ) { return $mlp_cropplan['CropID']; } )
        ->thenBy( function( $mlp_cropplan ) { return $mlp_cropplan['CropID']; } )
        ->join( from( $mlp_crops ), 
            function( $mlp_cropplan ) { return $mlp_cropplan['CropID']; }, 
            function( $mlp_crops ) { return $mlp_crops['CropID']; }, 
            function( $mlp_cropplan, $mlp_crops ) {
                return array(
                    'CropID' => $mlp_cropplan['CropID'],
                    'OperID' => $mlp_cropplan['OperID'],
                    'ParentOperID' => $mlp_cropplan['ParentOperID'],
                    'Feature' => $mlp_cropplan['Feature'],
                    'StartYear' => $mlp_cropplan['StartYear'],
                    'StartMonth' => $mlp_cropplan['StartMonth'],
                    'StartDay' => $mlp_cropplan['StartDay'],
                    'EndYear' => $mlp_cropplan['EndYear'],
                    'EndMonth' => $mlp_cropplan['EndMonth'],
                    'EndDay' => $mlp_cropplan['EndDay'],
                    'ActionID' => $mlp_cropplan['ActionID'],
                );
            } )
        ->join( from( $mlp_actions ), 
            function( $mlp_cropplan ) { return $mlp_cropplan['ActionID']; }, 
            function( $mlp_actions ) { return $mlp_actions['ActionID']; }, 
            function( $mlp_cropplan, $mlp_actions ) {
                return array(
                    'CropID' => $mlp_cropplan['CropID'],
                    'OperID' => $mlp_cropplan['OperID'],
                    'ParentOperID' => $mlp_cropplan['ParentOperID'],
                    'ActionID' => $mlp_cropplan['ActionID'],
                    'WorkloadUnit' => $mlp_actions['WorkloadUnit'],
                    'WorkpayUnit' => $mlp_actions['WorkpayUnit'],
                    'LaborProd' => $mlp_actions['Labor_prod'],
                    'LaborRate' => $mlp_actions['Labor_rate'],
                    'ServiceRate' => $mlp_actions['Mech_rate'],
                    'Feature' => $mlp_cropplan['Feature'],
                    'StartYear' => $mlp_cropplan['StartYear'],
                    'StartMonth' => $mlp_cropplan['StartMonth'],
                    'StartDay' => $mlp_cropplan['StartDay'],
                    'EndYear' => $mlp_cropplan['EndYear'],
                    'EndMonth' => $mlp_cropplan['EndMonth'],
                    'EndDay' => $mlp_cropplan['EndDay'],
                    'ActionTypeID' => $mlp_actions['ActionTypeID'],  
                );
            } )
        ->join( from( $mlp_actiontypes ), 
            function( $mlp_cropplan ) { return $mlp_cropplan['ActionTypeID']; }, 
            function( $mlp_actiontypes ) { return $mlp_actiontypes['ActionTypeID']; }, 
            function( $mlp_cropplan, $mlp_actiontypes ) {
                return array(
                    'CropID' => $mlp_cropplan['CropID'],
                    'OperID' => $mlp_cropplan['OperID'],
                    'ParentOperID' => $mlp_cropplan['ParentOperID'],
                    'ActionID' => $mlp_cropplan['ActionID'],
                    'WorkloadUnit' => $mlp_cropplan['WorkloadUnit'],
                    'WorkpayUnit' => $mlp_cropplan['WorkpayUnit'],
                    'ActionTypeID' => $mlp_actiontypes['ActionTypeID'],
                    'LaborProd' => $mlp_cropplan['Labor_prod'],
                    'LaborRate' => $mlp_cropplan['Labor_rate'],
                    'ServiceRate' => $mlp_cropplan['Mech_rate'],
                    'WorkCategory' => $mlp_actiontypes['Category'],
                    'Feature' => $mlp_cropplan['Feature'],
                    'StartYear' => $mlp_cropplan['StartYear'],
                    'StartMonth' => $mlp_cropplan['StartMonth'],
                    'StartDay' => $mlp_cropplan['StartDay'],
                    'EndYear' => $mlp_cropplan['EndYear'],
                    'EndMonth' => $mlp_cropplan['EndMonth'],
                    'EndDay' => $mlp_cropplan['EndDay'],
                );
            } );

The code is working, bu after second join the data of the first join is disappearing, please help me :-(

Athari commented 8 years ago

If amount of data you're working with isn't small, I'd advise to use RDBMS like MySQL. YaLinqo is supposed to be used when a database isn't avalaible or adequate, not to replace databases. You'll just lose performance, write more code and gain nothing in return.

The problem with joining is that functional syntax of LINQ supports joining only two arrays. In order to join more arrays, you'll need to create temporary arrays or objects between joins which is quite verbose and, well, annoying. In .NET, query syntax can be used, which performs all these transformations for you, but it isn't available in PHP.

You also don't need to wrap arrays in "from" if you don't perform any operations on them, YaLinqo accepts arrays as arguments in all methods.


If you don't understand how some method works, you can just look at the code, in most cases it's pretty simple. Let's look at join (it's actually one of the most complex methods). Without boilerplate, its code looks like this:

$lookup = $inner->toLookup($innerKeySelector);
foreach ($this as $ok => $ov) {
    $key = $outerKeySelector($ov, $ok);
    if (isset($lookup[$key]))
        foreach ($lookup[$key] as $iv)
            yield $resultSelectorKey($ov, $iv, $key) => $resultSelectorValue($ov, $iv, $key);
}

So, it creates a lookup (array matching keys to many values), then iterates over outer sequence matching keys.

One bonus of PHP is that merging arrays is simple. So, if you have two arrays with indexes which are either unique or equal, you can use array_merge to join the results. If some keys are common, one more array can be added as an argument to "override" collapsed keys. So the general pattern is this:

$join = from($as)
    ->join($bs,
        '$v["keya"]', '$v["keya"]',
        function ($a, $b) { return array_merge($a, $b); })
    ->join($cs,
        '$v["keyb"]', '$v["keyb"]',
        function ($ab, $c) { return array_merge($ab, $c); })
    ->join($ds,
        '$v["keyc"]', '$v["keyc"]',
        function ($abc, $d) { return array_merge($abc, $d, [
            'v1' => $abc['v1'],
            'v2' => $d['v2'],
        ]); });

One more reason of losing values is calling toArray instead of toList, as multiple join method calls can result in equal keys.

P. S. Not sure whether this is what you're doing — there's too much code. I hope my advice can simplify your code considerably.

anvarulugov commented 8 years ago

Thank you @Athari . The response was quite fast! I have a loop that cycles about 1000 times. And each time it makes request to database. I was trying to optimze it. I think I went by wrong way :-(

anvarulugov commented 8 years ago

Thank you @Athari . The response was quite fast! I separated joines, and changed toArray to toList, now everything is ok! Thank you!

Athari commented 8 years ago

@anvarulugov Getting rid of RDBMS may result in performance improvements if you're working with a high-performance language like C++ and you know how to perform your queries optimally on a low level. Reimplementing RDBMS in PHP will never make your program faster, and even if it does, there's likely something which can be done on RDBMS level which can do much better, like adding indexes and giving hints to optimizer.

In your case, the obvious solution is to not perform the same query 1000 times. If all you're chaning is two arguments in WHERE statement, just query all data you need with one query, then perform filtering in your program. This is where YaLinqo can actually be useful.

Unless you have enormous volume of data, one query with filtering on app level will be faster than thousand queries — you'll lose more time on inter-process communication than on actual queries.

anvarulugov commented 8 years ago

Yes, you are right, that's why I got all the data in to arrays, from 4 tables into 4 arrays, and used YaLinqo. @Athari, you are genius. I replaced my code with yours now the code looks like this:

$rs = from( $mlp_cropplan )
        ->where(function( $mlp_cropplan ) { 
            global $TillageID, $ZoneID;
            return ( ( $mlp_cropplan['TillageID'] == 0 || $mlp_cropplan['TillageID'] == $TillageID ) && ( $mlp_cropplan['ZoneID'] == 0 || $mlp_cropplan['ZoneID'] == $ZoneID ) ); 
        } )
        ->orderBy( function( $mlp_cropplan ) { return $mlp_cropplan['CropID']; } )
        ->thenBy( function( $mlp_cropplan ) { return $mlp_cropplan['CropID']; } )
        ->join( $mlp_actions, 
            '$v["ActionID"]', '$v["ActionID"]',
            function ($a, $b) { return array_merge($a, $b); })
        ->join( $mlp_actiontypes, 
            '$v["ActionTypeID"]', '$v["ActionTypeID"]',
            function ($a, $b) { return array(
                'CropID' => $a['CropID'],
                'OperID' => $a['OperID'],
                'ParentOperID' => $a['ParentOperID'],
                'ActionID' => $a['ActionID'],
                'WorkloadUnit' => $a['WorkloadUnit'],
                'WorkpayUnit' => $a['WorkpayUnit'],
                'ActionTypeID' => $b['ActionTypeID'],
                'LaborProd' => $a['Labor_prod'],
                'LaborRate' => $a['Labor_rate'],
                'ServiceRate' => $a['Mech_rate'],
                'WorkCategory' => $b['Category'],
                'Feature' => $a['Feature'],
                'StartYear' => $a['StartYear'],
                'StartMonth' => $a['StartMonth'],
                'StartDay' => $a['StartDay'],
                'EndYear' => $a['EndYear'],
                'EndMonth' => $a['EndMonth'],
                'EndDay' => $a['EndDay'],
            ); });

$mlp_cropplan = $rs2->toList();
print_r($mlp_cropplan);