dmwm / CRABServer


Integrating summer students' work into our monitoring pipeline. #7798

Open novicecpp opened 11 months ago

novicecpp commented 11 months ago

The main goal is to integrate Ek-ong's and Tan's work into CRAB as part of the monitoring system and to keep it maintained indefinitely.

Resources:

The plan:

mapellidario commented 11 months ago

We continued our work on this topic, mainly some cleanup.

What we did:

What we still have to do:

Comments are welcome! Feel free to edit this comment as we progress with our work :)

mapellidario commented 9 months ago

We identified an issue with the data source that we use for raw condor metrics:

!hdfs dfs -ls /project/monitoring/archive/condor/raw/metric/2023/09/

drwxr-xr-x+  - monitops hdfs          0 2023-09-10 07:45 /project/monitoring/archive/condor/raw/metric/2023/09/01
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:44 /project/monitoring/archive/condor/raw/metric/2023/09/02
drwxr-xr-x+  - monitops hdfs          0 2023-09-11 03:52 /project/monitoring/archive/condor/raw/metric/2023/09/03
drwxr-xr-x+  - monitops hdfs          0 2023-09-12 05:00 /project/monitoring/archive/condor/raw/metric/2023/09/04
drwxr-xr-x+  - monitops hdfs          0 2023-09-13 04:57 /project/monitoring/archive/condor/raw/metric/2023/09/05
drwxr-xr-x+  - monitops hdfs          0 2023-09-15 08:48 /project/monitoring/archive/condor/raw/metric/2023/09/06
drwxr-xr-x+  - monitops hdfs          0 2023-09-15 10:03 /project/monitoring/archive/condor/raw/metric/2023/09/07
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 07:52 /project/monitoring/archive/condor/raw/metric/2023/09/08
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:46 /project/monitoring/archive/condor/raw/metric/2023/09/09
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:42 /project/monitoring/archive/condor/raw/metric/2023/09/10
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:51 /project/monitoring/archive/condor/raw/metric/2023/09/11
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:52 /project/monitoring/archive/condor/raw/metric/2023/09/12
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:44 /project/monitoring/archive/condor/raw/metric/2023/09/13
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:53 /project/monitoring/archive/condor/raw/metric/2023/09/14
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:47 /project/monitoring/archive/condor/raw/metric/2023/09/15
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:56 /project/monitoring/archive/condor/raw/metric/2023/09/16
drwxrwxr-x+  - monitops hdfs          0 2023-09-18 00:48 /project/monitoring/archive/condor/raw/metric/2023/09/17.tmp
drwxrwxr-x+  - monitops hdfs          0 2023-09-19 00:01 /project/monitoring/archive/condor/raw/metric/2023/09/18.tmp
drwxrwxr-x+  - monitops hdfs          0 2023-09-20 00:53 /project/monitoring/archive/condor/raw/metric/2023/09/19.tmp
drwxrwxr-x+  - monitops hdfs          0 2023-09-21 00:36 /project/monitoring/archive/condor/raw/metric/2023/09/20.tmp
drwxrwxr-x+  - monitops hdfs          0 2023-09-22 00:00 /project/monitoring/archive/condor/raw/metric/2023/09/21.tmp
drwxrwxr-x+  - monitops hdfs          0 2023-09-22 15:00 /project/monitoring/archive/condor/raw/metric/2023/09/22.tmp

We informed Nikodemas from the CMS Monit team and we will follow up.
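The comment above does not spell out what the issue is. One reading of the listing is that the day folders from the 17th onward still carry a `.tmp` suffix, while the earlier ones were finalized days after the date they cover. Under that assumption, a check like the following could flag non-finalized folders from the output of `hdfs dfs -ls`. This is only a sketch: the names `DayFolder`, `parse_hdfs_ls` and `pending_folders` are hypothetical and not part of the CRAB code, and the sample lines are copied from the listing above.

```python
from typing import List, NamedTuple


class DayFolder(NamedTuple):
    path: str
    modified: str  # "YYYY-MM-DD HH:MM", as printed by `hdfs dfs -ls`
    is_tmp: bool   # True if the path still ends in ".tmp" (assumed to mean "not finalized")


def parse_hdfs_ls(listing: str) -> List[DayFolder]:
    """Parse the directory lines of an `hdfs dfs -ls` listing."""
    folders = []
    for line in listing.splitlines():
        fields = line.split()
        # Directory lines have 8 fields: permissions, replication, owner,
        # group, size, date, time, path. Anything else is skipped,
        # for example the "Found N items" header.
        if len(fields) != 8 or not fields[0].startswith("d"):
            continue
        path = fields[7]
        modified = f"{fields[5]} {fields[6]}"
        folders.append(DayFolder(path, modified, path.endswith(".tmp")))
    return folders


def pending_folders(listing: str) -> List[str]:
    """Return the paths that still carry the .tmp suffix."""
    return [folder.path for folder in parse_hdfs_ls(listing) if folder.is_tmp]


# Three lines copied from the listing in this comment.
SAMPLE = """\
drwxr-xr-x+  - monitops hdfs          0 2023-09-17 08:56 /project/monitoring/archive/condor/raw/metric/2023/09/16
drwxrwxr-x+  - monitops hdfs          0 2023-09-18 00:48 /project/monitoring/archive/condor/raw/metric/2023/09/17.tmp
drwxrwxr-x+  - monitops hdfs          0 2023-09-19 00:01 /project/monitoring/archive/condor/raw/metric/2023/09/18.tmp
"""

if __name__ == "__main__":
    for path in pending_folders(SAMPLE):
        print("not finalized:", path)
```

In practice the listing would come from running the `hdfs dfs -ls` command shown above and passing its standard output to `pending_folders`; whether a `.tmp` suffix is the right signal is for the CMS Monit team to confirm.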

mapellidario commented 9 months ago

Moreover, we noticed that the data about rucio taperecall from cmsspark [1] does not match rucio monitoring [2].

We will investigate.

[1]

image

[2]

image

mapellidario commented 5 months ago

Request from Stefano, on a private Mattermost channel:

novicecpp commented 4 months ago

image

@nikodemas kindly pinged me today to say that our pipeline has been broken for a long time. I proposed that we have a chat in the first week of April, after Easter (also covering usage of the cms1 instance). Is that OK for you, @mapellidario?