Open novicecpp opened 11 months ago
We continued our work on this topic, mainly some cleanup.
What we did:
What we still have to do:
cron_daily.sh and run_spark.sh: the scripts need to be executable
ekongat/crab-datapipeline to dmwm/CRABServer/src/script/Monitor/crab-spark. Ek-Ong will need to open a PR to dmwm/CRABServer
/var/log/crab/crabspark.log
Comments are welcome! Feel free to edit this comment as we progress with our work :)
We identified an issue with the data source that we use for raw condor metrics:
!hdfs dfs -ls /project/monitoring/archive/condor/raw/metric/2023/09/
drwxr-xr-x+ - monitops hdfs 0 2023-09-10 07:45 /project/monitoring/archive/condor/raw/metric/2023/09/01
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:44 /project/monitoring/archive/condor/raw/metric/2023/09/02
drwxr-xr-x+ - monitops hdfs 0 2023-09-11 03:52 /project/monitoring/archive/condor/raw/metric/2023/09/03
drwxr-xr-x+ - monitops hdfs 0 2023-09-12 05:00 /project/monitoring/archive/condor/raw/metric/2023/09/04
drwxr-xr-x+ - monitops hdfs 0 2023-09-13 04:57 /project/monitoring/archive/condor/raw/metric/2023/09/05
drwxr-xr-x+ - monitops hdfs 0 2023-09-15 08:48 /project/monitoring/archive/condor/raw/metric/2023/09/06
drwxr-xr-x+ - monitops hdfs 0 2023-09-15 10:03 /project/monitoring/archive/condor/raw/metric/2023/09/07
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 07:52 /project/monitoring/archive/condor/raw/metric/2023/09/08
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:46 /project/monitoring/archive/condor/raw/metric/2023/09/09
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:42 /project/monitoring/archive/condor/raw/metric/2023/09/10
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:51 /project/monitoring/archive/condor/raw/metric/2023/09/11
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:52 /project/monitoring/archive/condor/raw/metric/2023/09/12
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:44 /project/monitoring/archive/condor/raw/metric/2023/09/13
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:53 /project/monitoring/archive/condor/raw/metric/2023/09/14
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:47 /project/monitoring/archive/condor/raw/metric/2023/09/15
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:56 /project/monitoring/archive/condor/raw/metric/2023/09/16
drwxrwxr-x+ - monitops hdfs 0 2023-09-18 00:48 /project/monitoring/archive/condor/raw/metric/2023/09/17.tmp
drwxrwxr-x+ - monitops hdfs 0 2023-09-19 00:01 /project/monitoring/archive/condor/raw/metric/2023/09/18.tmp
drwxrwxr-x+ - monitops hdfs 0 2023-09-20 00:53 /project/monitoring/archive/condor/raw/metric/2023/09/19.tmp
drwxrwxr-x+ - monitops hdfs 0 2023-09-21 00:36 /project/monitoring/archive/condor/raw/metric/2023/09/20.tmp
drwxrwxr-x+ - monitops hdfs 0 2023-09-22 00:00 /project/monitoring/archive/condor/raw/metric/2023/09/21.tmp
drwxrwxr-x+ - monitops hdfs 0 2023-09-22 15:00 /project/monitoring/archive/condor/raw/metric/2023/09/22.tmp
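In the listing above, every day directory from 09/17 onward still carries a `.tmp` suffix. Assuming that is the symptom we are referring to (we have not confirmed the root cause), a minimal sketch of a check that separates finalized days from pending ones could look like this. The names `LS_LINE` and `split_day_dirs` are hypothetical helpers, not part of the existing scripts, and the function only parses text already captured from `hdfs dfs -ls`.

```python
import re

# One directory entry from `hdfs dfs -ls`:
# permissions, "-" (no replication for dirs), owner, group, size, date, time, path
LS_LINE = re.compile(
    r"^(?P<perms>d\S+)\s+-\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+\d+\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2})\s+(?P<path>\S+)$"
)


def split_day_dirs(listing: str):
    """Return (finalized, pending) lists of directory paths.

    A directory counts as pending when its path ends in ".tmp";
    lines that are not directory entries are skipped.
    """
    finalized, pending = [], []
    for line in listing.splitlines():
        match = LS_LINE.match(line.strip())
        if match is None:
            continue
        path = match.group("path")
        if path.endswith(".tmp"):
            pending.append(path)
        else:
            finalized.append(path)
    return finalized, pending


# Two lines copied from the listing above.
sample = """\
drwxr-xr-x+ - monitops hdfs 0 2023-09-17 08:56 /project/monitoring/archive/condor/raw/metric/2023/09/16
drwxrwxr-x+ - monitops hdfs 0 2023-09-18 00:48 /project/monitoring/archive/condor/raw/metric/2023/09/17.tmp
"""

finalized, pending = split_day_dirs(sample)
print("finalized:", finalized)
print("pending:", pending)
```

A check like this could run before the daily Spark job so that days still marked `.tmp` are skipped or reported instead of being read as if they were complete.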
We informed Nikodemas from the CMS Monit team and we will follow up.
Moreover, we noticed that the data about Rucio tape recalls from cmsspark [1] does not match the Rucio monitoring [2]. We will investigate.
[1]
[2]
Request from Stefano, on mattermost private channel:
@nikodemas kindly pinged me today to say that our pipeline has been broken for a long time.
I proposed to him that we have a chat in the first week of April, after Easter (including usage of the cms1 instance).
Is that OK for you, @mapellidario?
The main goal is to integrate Ek-Ong's and Tan's work into CRAB as part of the monitoring system and to keep it maintained indefinitely.
Resources:
The plan: