nataliaratnikova closed this issue 8 years ago.
A few examples for the T2_Test_Buffer node are included below. A small "artificial" example:
```perl
$VAR1 = {
  'PHEDEX' => {
    'NODES' => [
      {
        'TIMEBINS' => [
          {
            'LEVELS' => [
              {
                'DATA' => [ { 'SIZE' => 12, 'DIR' => '/store' } ],
                'LEVEL' => 1
              },
              {
                'DATA' => [
                  { 'SIZE' => 2, 'DIR' => '/store/dir1' },
                  { 'SIZE' => 6, 'DIR' => '/store' },
                  { 'SIZE' => 2, 'DIR' => '/store/dir2' },
                  { 'SIZE' => 2, 'DIR' => '/store/dir3' }
                ],
                'LEVEL' => 2
              },
              {
                'DATA' => [
                  { 'SIZE' => 2, 'DIR' => '/store/dir1' },
                  { 'SIZE' => 6, 'DIR' => '/store' },
                  { 'SIZE' => 2, 'DIR' => '/store/dir2' },
                  { 'SIZE' => 2, 'DIR' => '/store/dir3' }
                ],
                'LEVEL' => 3
              },
              {
                'DATA' => [
                  { 'SIZE' => 2, 'DIR' => '/store/dir1' },
                  { 'SIZE' => 6, 'DIR' => '/store' },
                  { 'SIZE' => 2, 'DIR' => '/store/dir2' },
                  { 'SIZE' => 2, 'DIR' => '/store/dir3' }
                ],
                'LEVEL' => 4
              }
            ],
            'TIMESTAMP' => '1437585025'
          }
        ],
        'SUBDIR' => '/',
        'NODE' => 'T2_Test_Buffer'
      }
    ],
    'REQUEST_DATE' => '2015-07-24 17:01:47 UTC',
    'REQUEST_CALL' => 'storageusage',
    'REQUEST_URL' => 'http://cmsweb.cern.ch:8280/dmwmmon/datasvc/perl/storageusage',
    'CALL_TIME' => '0.00655',
    'REQUEST_VERSION' => '1.0.2-comp8',
    'REQUEST_TIMESTAMP' => '1437757307.45115',
    'INSTANCE' => 'read'
  }
};
```
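One way to spot inconsistencies is to check that, at every level, the entries summed per top-level directory reproduce the level-1 totals. That convention is my reading of the structure above, not a documented contract, so treat this as a sketch; the data below mirrors the small example:

```python
# Hypothetical consistency check (not part of PHEDEX): at every LEVEL,
# summing SIZE per top-level directory should give the LEVEL-1 value.

def level_totals(levels):
    """Sum SIZE per top-level directory for each LEVEL entry."""
    totals = {}
    for level in levels:
        per_top = {}
        for entry in level['DATA']:
            top = '/' + entry['DIR'].strip('/').split('/')[0]
            per_top[top] = per_top.get(top, 0) + int(entry['SIZE'])
        totals[level['LEVEL']] = per_top
    return totals

# Data mirroring the small "artificial" example above.
levels = [
    {'LEVEL': 1, 'DATA': [{'DIR': '/store', 'SIZE': 12}]},
    {'LEVEL': 2, 'DATA': [{'DIR': '/store/dir1', 'SIZE': 2},
                          {'DIR': '/store', 'SIZE': 6},
                          {'DIR': '/store/dir2', 'SIZE': 2},
                          {'DIR': '/store/dir3', 'SIZE': 2}]},
]

totals = level_totals(levels)
assert totals[1]['/store'] == totals[2]['/store'] == 12
```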
A larger, inconsistent example:
```perl
$VAR1 = {
  'PHEDEX' => {
    'NODES' => [
      {
        'TIMEBINS' => [
          {
            'LEVELS' => [
              {
                'DATA' => [
                  { 'SIZE' => '3505857763410', 'DIR' => '/store' },
                  { 'SIZE' => '10814697080', 'DIR' => '/hadoop' }
                ],
                'LEVEL' => 1
              },
              {
                'DATA' => [
                  { 'SIZE' => '584309627235', 'DIR' => '/store' },
                  { 'SIZE' => '2879273228190', 'DIR' => '/store/mc' },
                  { 'SIZE' => '42274907985', 'DIR' => '/store/relval' },
                  { 'SIZE' => 2162939416, 'DIR' => '/hadoop' },
                  { 'SIZE' => 8651757664, 'DIR' => '/hadoop/cms' }
                ],
                'LEVEL' => 2
              },
              {
                'DATA' => [
                  { 'SIZE' => '17707539684', 'DIR' => '/store/relval/CMSSW_4_2_3' },
                  { 'SIZE' => '16112386704', 'DIR' => '/store/relval/CMSSW_4_2_0_pre4' },
                  { 'SIZE' => '575854645638', 'DIR' => '/store/mc' },
                  { 'SIZE' => 2162939416, 'DIR' => '/hadoop' },
                  { 'SIZE' => 6488750664, 'DIR' => '/hadoop/cms/store' },
                  { 'SIZE' => '584309627235', 'DIR' => '/store' },
                  { 'SIZE' => 8454981597, 'DIR' => '/store/relval' },
                  { 'SIZE' => '1806753045524', 'DIR' => '/store/mc/Winter09' },
                  { 'SIZE' => '496665537028', 'DIR' => '/store/mc/Summer09' },
                  { 'SIZE' => 67584, 'DIR' => '/hadoop/cms/phedex' },
                  { 'SIZE' => 2162939416, 'DIR' => '/hadoop/cms' }
                ],
                'LEVEL' => 3
              },
              {
                'DATA' => [
                  { 'SIZE' => '358306810467', 'DIR' => '/store/mc/Summer09/QCD_EMEnriched_Pt20to30' },
                  { 'SIZE' => '1355064784143', 'DIR' => '/store/mc/Winter09/CosmicMC_BON_100GeV_AllCMS' },
                  { 'SIZE' => '584309627235', 'DIR' => '/store' },
                  { 'SIZE' => 4325833776, 'DIR' => '/hadoop/cms/store/group' },
                  { 'SIZE' => '124166384257', 'DIR' => '/store/mc/Summer09' },
                  { 'SIZE' => '13280654763', 'DIR' => '/store/relval/CMSSW_4_2_3/RelValMinBias' },
                  { 'SIZE' => 4426884921, 'DIR' => '/store/relval/CMSSW_4_2_3' },
                  { 'SIZE' => '12084290028', 'DIR' => '/store/relval/CMSSW_4_2_0_pre4/RelValMinBias' },
                  { 'SIZE' => 4028096676, 'DIR' => '/store/relval/CMSSW_4_2_0_pre4' },
                  { 'SIZE' => '575854645638', 'DIR' => '/store/mc' },
                  { 'SIZE' => '14192342304', 'DIR' => '/store/mc/Summer09/Zmumu' },
                  { 'SIZE' => 2162939416, 'DIR' => '/hadoop' },
                  { 'SIZE' => 2162916888, 'DIR' => '/hadoop/cms/store' },
                  { 'SIZE' => 45056, 'DIR' => '/hadoop/cms/phedex/store' },
                  { 'SIZE' => 8454981597, 'DIR' => '/store/relval' },
                  { 'SIZE' => '451688261381', 'DIR' => '/store/mc/Winter09' },
                  { 'SIZE' => 2162939416, 'DIR' => '/hadoop/cms' },
                  { 'SIZE' => 22528, 'DIR' => '/hadoop/cms/phedex' }
                ],
                'LEVEL' => 4
              }
            ],
            'TIMESTAMP' => '1329994112'
          }
        ],
        'SUBDIR' => '/',
        'NODE' => 'T2_Test_Buffer'
      }
    ],
    'REQUEST_DATE' => '2015-07-24 17:01:24 UTC',
    'REQUEST_CALL' => 'storageusage',
    'REQUEST_URL' => 'http://cmsweb.cern.ch:8280/dmwmmon/datasvc/perl/storageusage',
    'CALL_TIME' => '0.06904',
    'REQUEST_VERSION' => '1.0.2-comp8',
    'REQUEST_TIMESTAMP' => '1437757284.06879',
    'INSTANCE' => 'read'
  }
};
```
Unfortunately the real uploaded data are also affected. Several issues may have contributed: A) a bug in the aggregation on the client side; B) multiple uploads with the same timestamp partially overriding the data; C) double counting of subdirectories.
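Issue C can be illustrated with a toy calculation (directory names and sizes invented): if a parent directory's reported size is already cumulative, adding its subdirectory entries on top of it inflates the total.

```python
# Invented example of subdirectory double counting.
# '/store' is reported as a cumulative size: it already includes dir1..dir3.
reported = {'/store': 12, '/store/dir1': 2, '/store/dir2': 2, '/store/dir3': 2}

# Correct top-level figure: just the cumulative parent.
level1_correct = reported['/store']      # 12

# Double-counting bug: summing the parent and its children together.
level1_buggy = sum(reported.values())    # 18

assert level1_buggy != level1_correct
```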
Apart from debugging this particular issue, we may need to rethink the way we represent storage usage in this API, set clear conventions on what gets uploaded and how modifications are handled, and document it properly.
Status update: A) a bug on the client side has been found and fixed in a private clone: https://github.com/nataliaratnikova/PHEDEX/commit/05f422832e33055127e9be29a64bfcd5b7b79c4a. This needs to be pushed to the dmwm repo to make it available to the sites, or it may have to wait until the sites migrate to the new client.
B) This will need some work to implement a "record update" operation via deletion and addition, since we do not have a static list of directories associated with a record. The operation must be done in a single transaction so that the data stay consistent at all times.
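A minimal sketch of such a transactional delete-and-insert, using SQLite and an invented table layout (the real schema belongs to the PHEDEX/DMWMMON database and will differ):

```python
# Sketch of B): replace an existing (node, timestamp) record atomically by
# deleting the old rows and inserting the new ones in one transaction.
# Table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE space_usage
                (node TEXT, timestamp INTEGER, dir TEXT, size INTEGER)''')

def replace_record(conn, node, timestamp, entries):
    """Delete any rows for (node, timestamp) and insert the new entries
    within a single transaction, so readers never see a partial record."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute('DELETE FROM space_usage WHERE node=? AND timestamp=?',
                     (node, timestamp))
        conn.executemany(
            'INSERT INTO space_usage (node, timestamp, dir, size) VALUES (?,?,?,?)',
            [(node, timestamp, d, s) for d, s in entries])

replace_record(conn, 'T2_Test_Buffer', 1437585025,
               [('/store', 12), ('/store/dir1', 2)])
# A second upload with the same timestamp fully replaces the first record,
# leaving no stale directory entries behind.
replace_record(conn, 'T2_Test_Buffer', 1437585025, [('/store', 14)])
rows = conn.execute('SELECT dir, size FROM space_usage').fetchall()
```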
C) It looks like the bug was introduced in SQLSpace.pm with this commit, made while fixing another problem: https://github.com/dmwm/PHEDEX/commit/4833980c563df5d4e03ed59a374f62295123842f
Quick solution: added a new GetLastRecord API to retrieve the last uploaded record for a node. Released in DMWMMON-datasvc_1_0_4.
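Functionally, a GetLastRecord-style call boils down to selecting the directory entries for a node's newest timestamp. A sketch with an invented SQLite schema (not the actual data-service code):

```python
# Invented schema and query illustrating "get the last uploaded record".
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE space_usage
                (node TEXT, timestamp INTEGER, dir TEXT, size INTEGER)''')
conn.executemany('INSERT INTO space_usage VALUES (?,?,?,?)', [
    ('T2_Test_Buffer', 1329994112, '/store', 3505857763410),
    ('T2_Test_Buffer', 1437585025, '/store', 12),
    ('T2_Test_Buffer', 1437585025, '/store/dir1', 2),
])

def get_last_record(conn, node):
    """Return the newest timestamp for a node and its directory entries."""
    (last_ts,) = conn.execute(
        'SELECT MAX(timestamp) FROM space_usage WHERE node=?',
        (node,)).fetchone()
    entries = conn.execute(
        'SELECT dir, size FROM space_usage WHERE node=? AND timestamp=?',
        (node, last_ts)).fetchall()
    return last_ts, entries

last_ts, entries = get_last_record(conn, 'T2_Test_Buffer')
```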
The commit mentioned under C) in my comment from Nov 17 did not introduce a new problem; it revealed an old one that had gone unnoticed for a long time. The StorageUsage algorithm was mixing in data from different timestamps and adding wrong numbers at every new level, producing unpredictable results. This explains the inconsistent numbers shown at different levels.
The algorithm has now been completely rewritten and works as expected, keeping the same functionality and output data structure as before. Solved with this commit:
https://github.com/dmwm/PHEDEX/commit/c0bd2329f71c30569153e1b054ce4650f8e513f9
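The essence of the fix, as I read it, is to aggregate strictly within one timestamp instead of across all stored rows. A toy illustration (invented data layout, not the PHEDEX code): mixing rows from two timestamps produces totals that no single upload ever reported, while grouping by timestamp first keeps each TIMEBIN self-consistent.

```python
# Toy illustration of per-timestamp aggregation vs. timestamp mixing.
from collections import defaultdict

rows = [  # (timestamp, dir, size) as stored for one node
    (100, '/store', 10), (100, '/store/dir1', 4),
    (200, '/store', 12), (200, '/store/dir1', 6),
]

def totals_mixed(rows):
    """Buggy: sums across all timestamps at once."""
    t = defaultdict(int)
    for _, d, s in rows:
        t[d] += s
    return dict(t)

def totals_per_timestamp(rows):
    """Fixed: one aggregation per timestamp."""
    out = defaultdict(lambda: defaultdict(int))
    for ts, d, s in rows:
        out[ts][d] += s
    return {ts: dict(t) for ts, t in out.items()}

mixed = totals_mixed(rows)            # {'/store': 22, '/store/dir1': 10}
per_ts = totals_per_timestamp(rows)   # {100: {'/store': 10, ...}, 200: {'/store': 12, ...}}
```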
The problem that an update of an existing record may leave behind old entries still needs to be solved within the StorageInsert API. We need to file a separate issue for that.
The directory size returned by the storageusage API for level 1 differs from the values shown at the other levels.