dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

bulk: `request ls` broken when command issued on stdin of ssh #7595

Open onnozweers opened 3 months ago

onnozweers commented 3 months ago

Dear dCache devs,

The output of a request ls appears to be broken. In 9.2.18, I see:

[root@dcmain /var/log]# dcache-admin-command bulk 'request ls -l'
org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@2c59d782

When I tried a long time ago (December 2022), it looked like this:

[root@hedgehog14 /var/log]# dcache-admin-command bulk 'request ls'
SEQNO        | ARRIVED             |            MODIFIED |        OWNER |     STATUS | ID
2            | 2022/12/07-12:12:41 | 2022/12/07-12:12:41 |  99999:99999 |  COMPLETED | 0556595c-8418-4a23-a3ca-6b5aec975aa7
4            | 2022/12/07-12:16:02 | 2022/12/07-12:16:02 |  99999:99999 |  COMPLETED | 1382ccad-cb02-46bf-a632-326a85190a79
5            | 2022/12/07-12:19:33 | 2022/12/07-12:19:33 |  99999:99999 |  COMPLETED | 1ed31251-0eee-4527-b414-a07369d59880
6            | 2022/12/07-12:19:51 | 2022/12/07-12:19:51 |  99999:99999 |  COMPLETED | 05e77d0f-a5fc-401d-a34f-ccfcd8b9061e

So this command seems to be broken.

Cheers, Onno

DmitryLitvintsev commented 3 months ago

Hi Onno,

It certainly works in 9.2.9. Do you see anything useful when running `show pinboard` in bulk?

Also what happens if you run say:

request ls -limit=10

onnozweers commented 3 months ago

Hi Dmitry,

The -limit does not help. Also, we have two bulk cells in HA, and both have this problem. It does not make a difference which is the leader.

[root@install-fes ~]# dcache-admin-command bulk2 'request ls -limit=10'
org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@4875c386

The pinboard does not show anything useful:

[root@install-fes ~]# dcache-admin-command bulk2 'show pinboard'
12 Jun 2024 22:17:31 [pool-6-thread-8] [] Stopping the job manager.
12 Jun 2024 22:17:31 [pool-20-thread-1] [] interrupted.
12 Jun 2024 22:17:31 [pool-20-thread-1] [] ConcurrentRequestProcessor exiting...
12 Jun 2024 22:17:49 [System-0-EventThread] [] HA: Assuming leader role.
12 Jun 2024 22:17:49 [System-0-EventThread] [] isLeader called
12 Jun 2024 22:17:49 [pool-6-thread-8] [] Loading requests into the request store/queue; incomplete requests will be reset to QUEUED.
12 Jun 2024 22:17:49 [pool-6-thread-8] [] Initializing the job manager.
12 Jun 2024 22:17:49 [pool-6-thread-8] [] Signalling the job manager.
12 Jun 2024 22:17:49 [pool-6-thread-8] [] Service startup completed.
12 Jun 2024 22:19:26 [Curator-ConnectionStateManager-0] [] HA: Dropping leader role.
12 Jun 2024 22:19:26 [Curator-ConnectionStateManager-0] [] notLeader called
12 Jun 2024 22:19:26 [pool-6-thread-9] [] Stopping the job manager.
12 Jun 2024 22:19:26 [pool-21-thread-1] [] interrupted.
12 Jun 2024 22:19:26 [pool-21-thread-1] [] ConcurrentRequestProcessor exiting...
12 Jun 2024 22:31:50 [System-0-EventThread] [] HA: Assuming leader role.
12 Jun 2024 22:31:50 [System-0-EventThread] [] isLeader called
12 Jun 2024 22:31:50 [pool-6-thread-10] [] Loading requests into the request store/queue; incomplete requests will be reset to QUEUED.
12 Jun 2024 22:31:50 [pool-6-thread-10] [] Initializing the job manager.
12 Jun 2024 22:31:50 [pool-6-thread-10] [] Signalling the job manager.
12 Jun 2024 22:31:50 [pool-6-thread-10] [] Service startup completed.

I also tried on our test system, and we have the same problem there. The version:

[root@hedgehog14 ~/dcache]# ls -l packages/fhs/target/rpmbuild/RPMS/noarch/dcache*.rpm
-rw-r--r-- 1 root root 131907060 Dec  5  2023 packages/fhs/target/rpmbuild/RPMS/noarch/dcache-10.0.0.96893bc-1.noarch.rpm

I'll upgrade it to the latest master snapshot and try again.

onnozweers commented 3 months ago

Now that we fixed our bulk problem, I tried again, and the request ls is still broken.

The same problem also occurs with the latest master snapshot (10.1.0.d772420) on our test server.

DmitryLitvintsev commented 3 months ago

You are saying that with a single bulk cell, request ls is not working for you? Just making sure.

DmitryLitvintsev commented 3 months ago

Just a data point, head of the trunk:

$ ssh -p 24223 admin@fndcatemp2
Warning: Permanently added '[fndcatemp2]:24223,[131.225.240.93]:24223' (RSA) to the list of known hosts.
dCache (10.1.0-SNAPSHOT)
Type "\?" for help.

[fndcatemp2] (local) admin > \c bulk 
[fndcatemp2] (bulk@bulk1Domain) admin > request ls
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
1700         | 2024/06/20-16:42:36 | 2024/06/20-16:42:37 |    8637:3200 |  COMPLETED | 70e09eb1-0327-4f5c-9d1f-166cf460b616
1701         | 2024/06/20-16:43:28 | 2024/06/20-16:43:29 |    8637:3200 |  COMPLETED | 571754c4-496a-434c-806c-a0f4d3a8c495
1702         | 2024/06/20-16:51:09 | 2024/06/20-16:51:10 |    8637:3200 |  COMPLETED | 14558ba5-4462-42e3-8246-56044eb29dd9
1703         | 2024/06/20-16:52:16 | 2024/06/20-16:52:17 |    8637:3200 |  COMPLETED | 955d468a-5d8a-416e-a321-95036bae9728
1704         | 2024/06/20-16:53:12 | 2024/06/20-16:53:13 |    8637:3200 |  COMPLETED | 8e87663d-d4b5-4905-9aea-2d845118d799
1705         | 2024/06/20-16:54:37 | 2024/06/20-16:54:38 |    8637:3200 |  COMPLETED | 0785c60f-6991-470c-a3d9-ac21526e20cd
1706         | 2024/06/20-17:07:23 | 2024/06/20-17:07:23 |    8637:3200 |  COMPLETED | d068ff33-9a45-4eea-b546-de881749952a
1707         | 2024/06/20-18:20:07 | 2024/06/20-18:20:07 |    8637:3200 |  COMPLETED | 9a82c2bb-dda2-4f18-8c8d-8892f5781304
1708         | 2024/06/20-18:57:05 | 2024/06/20-18:57:05 |    8637:3200 |  COMPLETED | 6a4c94fc-a5ce-4b2a-b72f-cebcd39afee5
1709         | 2024/06/20-18:58:27 | 2024/06/20-18:58:27 |    8637:3200 |  COMPLETED | 5dbddc94-a090-43e6-9a26-3384199a0f83
1710         | 2024/06/20-18:59:14 | 2024/06/20-18:59:14 |    8637:3200 |  COMPLETED | da9a05fa-7932-4b79-a2df-36d6f973f36c
1711         | 2024/06/20-19:00:39 | 2024/06/20-19:00:39 |    8637:3200 |  COMPLETED | c56a45e6-3f39-4670-b86e-c5d67ef6741a
1712         | 2024/06/20-19:16:14 | 2024/06/20-19:16:46 |    8637:3200 |  COMPLETED | f8c9b8c2-eda5-4a45-a63b-d76003ad265a
1713         | 2024/06/20-19:19:05 | 2024/06/20-19:19:05 |    8637:3200 |  COMPLETED | 3ca532f3-f9cf-4566-a3da-fde910364379
1714         | 2024/06/20-19:22:19 | 2024/06/20-19:22:19 |    8637:3200 |  COMPLETED | eb3a58b0-e16a-4f16-994e-c916eb154a83
[fndcatemp2] (bulk@bulk1Domain) admin > 

I am not saying the proverbial "works for me", but as you can see, at least it works.

Again, do you see anything interesting in pinboard or log file? Could you

log set stdout DEBUG

and redo the command and see if there is anything interesting in the log?

onnozweers commented 3 months ago

Ahhhh, the problem occurs when feeding the admin commands on stdin of ssh, instead of as an ssh argument.

[root@hedgehog14 ~]# echo "\s bulk request ls" | ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl"
org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@7350f58e
[root@hedgehog14 ~]# echo "\s bulk request ls" | ssh -l admin -p "22224" "dcachetest.grid.surfsara.nl"
Pseudo-terminal will not be allocated because stdin is not a terminal.
org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@495dc920
[root@hedgehog14 ~]# echo "\s bulk request ls" | ssh -t -l admin -p "22224" "dcachetest.grid.surfsara.nl"
Pseudo-terminal will not be allocated because stdin is not a terminal.
org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@16c8db35
[root@hedgehog14 ~]# ssh -t -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request ls"
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
22           | 2024/06/18-11:05:14 | 2024/06/18-11:05:14 |  36490:31631 |  COMPLETED | 19dd9f0b-b409-4ba5-8215-0d49d7de2133
23           | 2024/06/18-14:23:56 | 2024/06/18-14:23:56 |  36490:31631 |  COMPLETED | e2b47b27-b066-4f84-b0bd-d397722c722d
24           | 2024/06/20-11:44:03 | 2024/06/20-11:44:03 |  36490:31631 |  COMPLETED | 27c02ed6-65d4-423b-80f8-74a63719c9ae
25           | 2024/06/20-11:51:42 | 2024/06/20-11:51:42 |  36490:31631 |  COMPLETED | 4b52fbe0-d5c5-4152-b934-c3fa50490f02
26           | 2024/06/20-11:52:47 | 2024/06/20-11:52:47 |  36490:31631 |  COMPLETED | 188ed83f-90f8-488b-81f8-cc6b3245dd16
27           | 2024/06/20-11:52:57 | 2024/06/20-11:52:57 |  36490:31631 |  COMPLETED | 8f60d2c5-6f16-41df-839a-da03a84225d2
28           | 2024/06/20-11:53:02 | 2024/06/20-11:53:02 |  36490:31631 |  COMPLETED | a14bb1d1-d727-4264-84e2-c7717018c6ff
29           | 2024/06/20-11:54:37 | 2024/06/20-11:54:37 |  36490:31631 |  COMPLETED | 2a40bee5-ec79-4dbb-98aa-85e69c17a3b1
30           | 2024/06/20-11:54:44 | 2024/06/20-11:54:45 |  36490:31631 |  COMPLETED | f73f214f-0570-4435-b4ac-0949d1289e01
31           | 2024/06/20-13:30:45 | 2024/06/20-13:30:45 |  36490:31631 |  COMPLETED | 0c83f513-55f6-4f49-9198-377e14c09261
Connection to dcachetest.grid.surfsara.nl closed.
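
For scripts that parse this output, one way to catch the broken case is to check whether the reply is just a Java default object reference (`ClassName@hex`), as in the three failing runs above. A minimal sketch; the function name `looks_like_object_ref` and the regular expression are illustrative assumptions, not part of dCache:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if any line of the given text looks
# like a Java default toString() result, e.g.
#   org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@7350f58e
# The pattern is an assumption based on the output shown in this issue.
looks_like_object_ref() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z_][A-Za-z0-9_.$]*@[0-9a-f]+$'
}

# Example with the two kinds of reply seen above.
broken='org.dcache.services.bulk.BulkServiceCommands$PagedRequestResult@7350f58e'
good='22           | 2024/06/18-11:05:14 | 2024/06/18-11:05:14 |  36490:31631 |  COMPLETED | 19dd9f0b-b409-4ba5-8215-0d49d7de2133'

if looks_like_object_ref "$broken"; then
    echo "reply is an object reference, not a table"
fi
if ! looks_like_object_ref "$good"; then
    echo "reply looks like normal output"
fi
```

A wrapper script could use such a check to fall back to passing the command as an ssh argument, which is the variant that worked above.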

Feeding commands on stdin can be very useful though, especially when you have tons of commands: it's way faster than starting a separate ssh for each of them.
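
To illustrate the stdin use case: a rough sketch of sending many commands over one ssh connection. The helper name `build_batch` is hypothetical; the `\s <cell> <command>` form, the `-T` flag, the port and the host are taken from the commands above. Note that, per this issue, `request ls` currently returns only the object reference when sent this way.

```shell
#!/bin/sh
# Hypothetical helper: read one admin command per line on stdin and prefix
# each with "\s <cell> " so the whole batch can go over a single ssh session.
build_batch() {
    cell=$1
    while IFS= read -r cmd; do
        # Skip empty lines so no bare "\s <cell>" is sent.
        [ -n "$cmd" ] || continue
        printf '\\s %s %s\n' "$cell" "$cmd"
    done
}

# Show what would be sent for two commands addressed to the bulk cell.
printf '%s\n' 'show pinboard' 'request ls -limit=10' | build_batch bulk

# Actual use (not run here): pipe the batch into one ssh session.
# printf '%s\n' 'show pinboard' 'request ls -limit=10' \
#     | build_batch bulk \
#     | ssh -T -l admin -p 22224 dcachetest.grid.surfsara.nl
```

This opens one connection for the whole batch instead of one ssh per command, which is where the speed-up comes from.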