dmwm / PHEDEX

CMS data-placement suite

Little bug in PhEDEx 3.0.0 concerning -link-active-file #369

Closed by ericvaandering 11 years ago

ericvaandering commented 11 years ago

Original Savannah ticket 36057 reported by egeland on Mon Apr 28 03:54:30 2008.

There is a small bug concerning the -link-active-files option.

I've configured an agent as follows:

AGENT LABEL=download-t1-test PROGRAM=Toolkit/Transfer/FileDownload
 -db ${PHEDEX_DBPARAM}
 -nodes ${PHEDEX_NODE}_Buffer
 -ignore T1_CERN_MSS
 -accept 'T1%'
 -delete ${PHEDEX_CONF}/FileDownloadDelete.srm
 -validate ${PHEDEX_CONF}/FileDownloadVerify.srm.rmLT07files
 -verbose
 -backend FTS
 -service ${PHEDEX_FTS_SERVER}
 -protocols 'srmv2','srm','direct'
 -batch-files 10
 -max-active-files 100
 -link-active-files T1_FZK_Buffer=1
 -link-pending-files 10

I want to reduce the load on the FZK -> PIC link to only 1 active file. However, when I look at the channel, I see that there is indeed always 1 job running, but it contains 10 files:

[root@fts02 root]# glite-transfer-list -c FZKLCG2-PIC | awk '{ print $1 }' | xargs glite-transfer-status -l | egrep "Destination|State" | awk -F":844" '{ print $ 1}' | sort | uniq -c
     10 Destination: srm://srm-disk.pic.es
      4 State: Active
      6 State: Done

So I think this option needs to override -batch-files when -batch-files is higher, since we think in terms of files (as FTS does). This is a detail, but an important one to fix; otherwise people will try to reduce load this way and hit this problem.
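The requested behaviour can be sketched as a simple clamp: the number of files put into one job on a link should never exceed that link's active-file limit. The sketch below is only an illustration in Python, not the PHEDEX::Transfer::FTS code. The function names are hypothetical, and it assumes that a `NODE=N` value sets a per-source-node limit while a bare number (as in `-link-pending-files 10`) acts as a default for all links.

```python
def parse_link_limits(specs):
    """Parse option values such as ["T1_FZK_Buffer=1", "10"].

    Assumption: "NODE=N" is a per-node limit and a bare number is a
    default for every other node. Returns (default, per_node).
    """
    default = None
    per_node = {}
    for spec in specs:
        if "=" in spec:
            node, value = spec.split("=", 1)
            per_node[node] = int(value)
        else:
            default = int(spec)
    return default, per_node


def effective_batch_size(batch_files, source_node, default, per_node):
    """Clamp the job size to the link's active-file limit, if one applies."""
    limit = per_node.get(source_node, default)
    if limit is None:
        return batch_files  # no limit configured for this link
    return min(batch_files, limit)


# Values taken from the agent configuration above.
default, per_node = parse_link_limits(["T1_FZK_Buffer=1"])
print(effective_batch_size(10, "T1_FZK_Buffer", default, per_node))
# "T1_OTHER_Buffer" is a made-up node name standing in for any unlimited link.
print(effective_batch_size(10, "T1_OTHER_Buffer", default, per_node))
```

With `-batch-files 10` and `-link-active-files T1_FZK_Buffer=1`, the clamp yields jobs of 1 file from FZK and leaves other links at 10 files per job.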

Thanks! Pepe.

ericvaandering commented 11 years ago

Comment by egeland on Fri May 23 08:55:22 2008

Allegedly fixed in PHEDEX::Transfer::FTS revision 1.52. Pepe, could you test this?

ericvaandering commented 11 years ago

Comment by jflix on Fri May 23 10:33:17 2008

Hi Ricky,

Sure, I will test it at the beginning of next week. I need to relax... this week has been horribly busy! ;(

ericvaandering commented 11 years ago

Comment by egeland on Fri May 30 11:45:45 2008

Hi Pepe,

Any luck with the testing?

ericvaandering commented 11 years ago

Comment by jflix on Mon Jun 2 04:49:44 2008

Hi Ricky,

I am testing atm.

Pepe.

ericvaandering commented 11 years ago

Comment by jflix on Mon Jun 2 05:16:00 2008

Hi Ricky,

I think it is not working properly, as far as I can see. My config is:

AGENT LABEL=download-t2-sites PROGRAM=Toolkit/Transfer/FileDownload
 -db ${PHEDEX_DBPARAM}
 -nodes ${PHEDEX_NODE}_Buffer
 -accept 'T2%'
 -delete ${PHEDEX_CONF}/FileDownloadDelete.srm
 -validate ${PHEDEX_CONF}/FileDownloadVerify.srm.rmLT07files,-d
 -verbose
 -backend FTS
 -service ${PHEDEX_FTS_SERVER}
 -protocols 'srmv2','srm','direct'
 -batch-files 5
 -max-active-files 75
 -link-active-files T2_ES_CIEMAT=1
 -link-pending-files 10

This should keep the channels busy with jobs of 5 files each, except for CIEMAT, where only 1 file should be active at a time. At the FTS level, however, I see jobs with 3 files each, which I cannot explain. Should I clean the tasks in inbox/work/tasks for this agent to make sure the new config is picked up?

[root@fts02 root]# glite-transfer-status -l 95b3bc2d-3089-11dd-be72-e5cf3eef7922
Finished
  Source: srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/prod/store/PhEDEx_LoadTest07/MonarcTest_CIEMAT_25
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_Spain_CIEMAT/PIC/201/LoadTest07_Spain_CIEMAT_25_1Z5mTrjoSyeDKWMp_201
  State: Finished
  Retries: 0
  Reason: (null)
  Duration: 657

  Source: srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/prod/store/PhEDEx_LoadTest07/MonarcTest_CIEMAT_A0
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_Spain_CIEMAT/PIC/201/LoadTest07_Spain_CIEMAT_A0_txARkF9cod9F6IsT_201
  State: Finished
  Retries: 0
  Reason: (null)
  Duration: 1021

  Source: srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/prod/store/PhEDEx_LoadTest07/MonarcTest_CIEMAT_72
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_Spain_CIEMAT/PIC/201/LoadTest07_Spain_CIEMAT_72_EPPK56u6SjfEwzyt_201
  State: Finished
  Retries: 0
  Reason: (null)
  Duration: 705

[root@fts02 root]# glite-transfer-list -c LIPCOIMBRA-PIC
5006a3d2-308c-11dd-be72-e5cf3eef7922 Active

[root@fts02 root]# glite-transfer-status -l 5006a3d2-308c-11dd-be72-e5cf3eef7922
Active
  Source: srm://grid004.lca.uc.pt:8446/srm/managerv2?SFN=/dpm/lca.uc.pt/home/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/LoadTest07_LIP_Coimbra_84
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/PIC/299/LoadTest07_LIP_Coimbra_84_YdDCd8cYIOudO37e_299
  State: Active
  Retries: 0
  Reason: (null)
  Duration: 0

  Source: srm://grid004.lca.uc.pt:8446/srm/managerv2?SFN=/dpm/lca.uc.pt/home/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/LoadTest07_LIP_Coimbra_33
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/PIC/299/LoadTest07_LIP_Coimbra_33_4AQjKszziszITgvQ_299
  State: Active
  Retries: 0
  Reason: (null)
  Duration: 0

  Source: srm://grid004.lca.uc.pt:8446/srm/managerv2?SFN=/dpm/lca.uc.pt/home/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/LoadTest07_LIP_Coimbra_BD
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/PIC/299/LoadTest07_LIP_Coimbra_BD_HO3xru5lZML6IWTA_299
  State: Active
  Retries: 0
  Reason: (null)
  Duration: 0

ericvaandering commented 11 years ago

Comment by jflix on Mon Jun 2 06:00:09 2008

I cleaned the /tasks and /work directories of this Debug agent. I still see 3 files per job for CIEMAT, when it should be only 1 file according to the new configuration:

[root@fts02 root]# glite-transfer-status -l 36b3e2ec-3092-11dd-be72-e5cf3eef7922
Active
  Source: srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/prod/store/PhEDEx_LoadTest07/MonarcTest_CIEMAT_3D
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_Spain_CIEMAT/PIC/201/LoadTest07_Spain_CIEMAT_3D_tYClrqNPiVr8I105_201
  State: Done
  Retries: 0
  Reason: (null)
  Duration: 119

  Source: srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/prod/store/PhEDEx_LoadTest07/MonarcTest_CIEMAT_78
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_Spain_CIEMAT/PIC/201/LoadTest07_Spain_CIEMAT_78_hMYuslcYgFqhmW2s_201
  State: Done
  Retries: 0
  Reason: (null)
  Duration: 114

  Source: srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/prod/store/PhEDEx_LoadTest07/MonarcTest_CIEMAT_08
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_Spain_CIEMAT/PIC/201/LoadTest07_Spain_CIEMAT_08_0Uwu6F01Yn8AfXJB_201
  State: Active
  Retries: 0
  Reason: (null)
  Duration: 0

For Coimbra I see the same, 3 files per job:

[root@fts02 root]# glite-transfer-status -l 360f176b-3092-11dd-be72-e5cf3eef7922
Active
  Source: srm://grid004.lca.uc.pt:8446/srm/managerv2?SFN=/dpm/lca.uc.pt/home/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/LoadTest07_LIP_Coimbra_F6
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/PIC/300/LoadTest07_LIP_Coimbra_F6_SYy0zyktFx8ZzhrP_300
  State: Done
  Retries: 0
  Reason: (null)
  Duration: 307

  Source: srm://grid004.lca.uc.pt:8446/srm/managerv2?SFN=/dpm/lca.uc.pt/home/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/LoadTest07_LIP_Coimbra_6B
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/PIC/300/LoadTest07_LIP_Coimbra_6B_vmvzfFlxmZawYki4_300
  State: Active
  Retries: 0
  Reason: (null)
  Duration: 0

  Source: srm://grid004.lca.uc.pt:8446/srm/managerv2?SFN=/dpm/lca.uc.pt/home/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/LoadTest07_LIP_Coimbra_17
  Destination: srm://srm-disk.pic.es:8443/srm/managerv2?SFN=/pnfs/pic.es/data/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_LIP_Coimbra/PIC/300/LoadTest07_LIP_Coimbra_17_hMExjiBLbvo50QGR_300
  State: Active
  Retries: 0
  Reason: (null)
  Duration: 0

It seems it is not working as expected.
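For anyone repeating this check, counting the files in a job by eye gets tedious. The helper below is a rough sketch that tallies the per-file `State:` entries in the text printed by `glite-transfer-status -l`, assuming the layout shown in the listings above (one `State:` field per file, with the overall job status on a line of its own without that prefix). The function name is made up, and the sample uses placeholder URLs rather than real ones.

```python
import re


def files_per_state(status_output):
    """Count per-file 'State:' entries in `glite-transfer-status -l` text.

    Assumes every file block contains exactly one 'State: <word>' field.
    """
    counts = {}
    for state in re.findall(r"\bState:\s*(\w+)", status_output):
        counts[state] = counts.get(state, 0) + 1
    return counts


# Shortened sample in the same layout; the srm:// URLs are placeholders.
sample = """Active
  Source: srm://source.example/file_a
  Destination: srm://dest.example/file_a
  State: Done
  Retries: 0

  Source: srm://source.example/file_b
  Destination: srm://dest.example/file_b
  State: Done
  Retries: 0

  Source: srm://source.example/file_c
  Destination: srm://dest.example/file_c
  State: Active
  Retries: 0
"""

counts = files_per_state(sample)
print(counts, "total files:", sum(counts.values()))
```

A total above the configured -link-active-files value for that link indicates the limit is not being applied to the job size.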

ericvaandering commented 11 years ago

Comment by egeland on Mon Dec 15 05:16:51 2008

I think this was fixed in PHEDEX_3_0_4. Pepe, can you confirm that -link-active-files takes precedence over -link-pending-files ?

Marking "Remind".

ericvaandering commented 11 years ago

Closed by egeland on Tue May 12 04:29:56 2009

ericvaandering commented 11 years ago

Comment by egeland on Tue May 12 04:29:56 2009

Confirmed by Pepe to work due to lack of complaints in 1 year...