dmwm / CRABServer

TaskPublish fails in v3.210428 #6576

Closed belforte closed 3 years ago

belforte commented 3 years ago
2021-04-29 15:49:32,966:ERROR:PublisherMaster,611:Exception when calling TaskPublish!
Failed to execute command: python /data/srv/TaskManager/v3.210422/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/lib/python2.7/site-packages/Publisher/TaskPublish.pyc  --configFile=/data/srv/Publisher/PublisherConfig.py --taskname=210318_221646:belforte_crab_20210318_231640.
 StdErr: Traceback (most recent call last):
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 951, in <module>
    main()
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 947, in main
    result = publishInDBS3(config, taskname, verbose)
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 765, in publishInDBS3
    dbsFiles.append(format_file_3(file_))
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 41, in format_file_3
    file_lumi_list.append({'lumi_section_num': int(lumi), 'run_num': int(run)})
ValueError: invalid literal for int() with base 10: "b'1924"
.
Traceback (most recent call last):
  File "/data/srv/TaskManager/v3.210422/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/lib/python2.7/site-packages/Publisher//PublisherMaster.py", line 583, in startSlave
    raise Exception(errorMsg)
Exception: Failed to execute command: python /data/srv/TaskManager/v3.210422/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/lib/python2.7/site-packages/Publisher/TaskPublish.pyc  --configFile=/data/srv/Publisher/PublisherConfig.py --taskname=210318_221646:belforte_crab_20210318_231640.
 StdErr: Traceback (most recent call last):
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 951, in <module>
    main()
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 947, in main
    result = publishInDBS3(config, taskname, verbose)
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 765, in publishInDBS3
    dbsFiles.append(format_file_3(file_))
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/v3.210422/CRABServer-v3.210422/build/lib/Publisher/TaskPublish.py", line 41, in format_file_3
    file_lumi_list.append({'lumi_section_num': int(lumi), 'run_num': int(run)})
ValueError: invalid literal for int() with base 10: "b'1924"
.
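
The value `"b'1924"` in the error looks like the text form of a Python 3 bytes object that ended up stored as a string. That origin is an assumption; the thread only establishes that the stored metadata is incorrect. A minimal sketch of how such a value could be recognized and normalized before calling `int()`, using a hypothetical helper `parse_lumi` that is not part of TaskPublish:

```python
def parse_lumi(value):
    """Convert a run or lumi field to int.

    Hypothetical helper: tolerates values that were stored as the text
    form of a bytes object, such as "b'1924" or "b'1924'".
    """
    if isinstance(value, bytes):
        value = value.decode("ascii")
    text = str(value).strip()
    # Drop a leading b' or b" left over from a stringified bytes object.
    if text.startswith("b'") or text.startswith('b"'):
        text = text[2:]
    # Drop any remaining surrounding quotes.
    text = text.strip("'\"")
    return int(text)


# The raw value from the traceback cannot be converted directly.
try:
    int("b'1924")
except ValueError as exc:
    print("plain int() fails:", exc)

print(parse_lumi("b'1924"))
```

Whether to tolerate such values or reject them outright is a design choice; silently repairing them can hide bad uploads.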
belforte commented 3 years ago

But code for TaskPublish has not changed :-(

belforte commented 3 years ago

Looking more closely, that is an old task, not the one I just submitted after changing Publisher!

belforte commented 3 years ago

Hmm... Publisher is not running! A more mundane problem prevented it from starting:

[crab3@crab-preprod-tw01 Publisher]$ cat nohup.out 
Traceback (most recent call last):
  File "/data/repos/CRABServer/src/python/Publisher//PublisherMaster.py", line 624, in <module>
    master = Master(confFile=configurationFile)
  File "/data/repos/CRABServer/src/python/Publisher//PublisherMaster.py", line 184, in __init__
    self.crabServer = CRABRest(hostname=restHost, localcert=config.serviceCert,
AttributeError: 'Configuration' object has no attribute 'serviceCert'
[crab3@crab-preprod-tw01 Publisher]$ 
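
A startup failure like this one can be reported more clearly by checking the configuration for required attributes before the service objects are built. The following is only a sketch: `missing_attributes` is a hypothetical helper, `SimpleNamespace` stands in for the real `Configuration` object, and `serviceKey` is an assumed attribute name used purely for illustration (only `serviceCert` appears in the traceback).

```python
from types import SimpleNamespace


def missing_attributes(config, required):
    """Return the names in `required` that `config` does not define."""
    return [name for name in required if not hasattr(config, name)]


# Stand-in for the Publisher configuration; the real object is a
# WMCore Configuration, which is not reproduced here.
config = SimpleNamespace(serviceKey="/path/to/key.pem")

# "serviceKey" is a hypothetical name; only "serviceCert" is from the log.
missing = missing_attributes(config, ("serviceCert", "serviceKey"))
if missing:
    print("Configuration is missing: " + ", ".join(missing))
```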
belforte commented 3 years ago

My initial mistake came from the fact that the preprod DB contains some old tasks with incorrect metadata, submitted while I was porting REST to the new WMCore. Every time Publisher starts, it tries those tasks again and fails. This is quite ugly. The best solution seems to be cleaning the preprod DB instance and starting fresh with proper metadata. More generally, bad metadata can always be uploaded by accident, and Publisher should not keep retrying such tasks for months until the relevant partition is dropped.
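
One way to stop retrying the same broken tasks is to count failures per task and skip a task once it passes a threshold. The sketch below is not how Publisher works today; `FailureTracker`, the JSON file, and the threshold of 3 are all assumptions made for illustration. The counts are written to disk so that they survive a Publisher restart.

```python
import json
import os


class FailureTracker:
    """Hypothetical per-task failure counter persisted to a JSON file."""

    def __init__(self, path, max_failures=3):
        self.path = path
        self.max_failures = max_failures
        self.failures = {}
        # Reload counts from a previous run, if any.
        if os.path.exists(path):
            with open(path) as handle:
                self.failures = json.load(handle)

    def record_failure(self, taskname):
        """Increment the count for `taskname` and save all counts."""
        self.failures[taskname] = self.failures.get(taskname, 0) + 1
        with open(self.path, "w") as handle:
            json.dump(self.failures, handle)

    def should_skip(self, taskname):
        """True once `taskname` has failed `max_failures` times or more."""
        return self.failures.get(taskname, 0) >= self.max_failures
```

A caller would check `should_skip(taskname)` before launching the publication step and call `record_failure(taskname)` when that step raises. Skipped tasks would still need some form of reporting so that an operator can inspect them.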

belforte commented 3 years ago

I guess I could write a small script to exercise the old, unused FMD APIs:

https://github.com/dmwm/CRABServer/blob/e2cd5387831ea50c4d267272d4f1972372ced504/src/python/CRABInterface/DataFileMetadata.py#L116-L133