Closed heistera closed 2 months ago
This works fine for me - it looks like you have doing something odd to your job repository. Perhaps this hints as to what:
raise RepositoryError("Subjob key %s is already in the subjob XML cache!" % k)
I've double checked it: with ganga version 8.6.10 it works as expected ...
Ganga In [6]: jobs(31).subjobs(683).resplit(SplitByFiles( filesPerJob = 1 )) Ganga Out [6]: 1
[13:06:31] Ganga In [7]:
So we fixed the error in the exception raising - maybe you could try again with the DEV version?
Thanks for the fix and sorry for the late reply. I cannot test exactly the same subjob, because this subjoin is in a state which does not allow resplitting: Ganga In [1]: jobs(31).subjobs(670).resplit(SplitByFiles( filesPerJob = 1 )) ERROR JobError: You can only resplit subjobs in the failed, killed or completed status!
For another subjob of the same job the respiting is in progress for about 30min right now. This is possibly another issue of the ganga 8.7.X versions ... for large amount of jobs, subjoins all 8.7.X versions I've used the processing times for commands are incredibly slow ... since I will stop working now I keep the ganga session open and will report back what will have happened ...
Still by now the command to resplit is stuck: [20:19:44] Ganga In [4]: jobs(31).subjobs(56).resplit(SplitByFiles( filesPerJob = 1 ))
My provider will change IP address of my home router soon and will stop my lxplus session ...
What do you mean? How exactly have you managed to produce this? Have you had a look at what the debug say. We need more information otherwise we cannot help. It works fine for me, so there must be something particular about your setup.
I have a lot of jobs in my session and I am trying to be very careful when e.g. ending a ganga session. Using the newer ganga version for the amount of jobs I am processing takes me a lot of patience, because ganga is very, very slow. I cannot remove the ganga session output, because I need it to be able to find out the URLs of the data at the storage elements. So I've used ganga on a different cluster than lxplus to have a more reactive session ... perhaps it would be good to develop some sort of stress test using a lot of jobs for testing new ganga versions ...?
I use Ganga myself to submit many thousands of jobs for my own analyses and have had no issues. We work very hard to test it and make it faster - in fact the upgrade from 8.6.9 -> 8.7.0 brought significant speed increases.
Could you start Ganga in debug mode, try to resplit and send the output?
I have started ganga in debug more. Here are the results and sorry for the delay.
1.) ganga start takes long. A while ganga was stuck during the startup here is the printout I saw: 2024-07-18 20:24:39,388 MainThread RegistrySlice ::do_select :292 DEBUG : NOT Actually Selected 2024-07-18 20:24:39,388 MainThread Registry ::_locked_load :554 DEBUG : _locked_load 2024-07-18 20:24:39,388 MainThread GangaRepositoryXML ::load :1032 DEBUG : Loading Repo object(s): [30] 2024-07-18 20:24:39,561 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created2024-07-18 20:24:39,770 MainThread GangaRepositoryXML ::_load_xml_from_obj :941 DEBUG : Loading XML file for ID: 30 took 0.3006861209869385 sec 2024-07-18 20:24:39,770 MainThread GangaRepositoryXML ::_load_xml_from_obj :949 DEBUG : Checking children: 30
2) During the longish start I saw tons of these messages:
h/user/h/heistera/gangadir/repository/heistera/LocalXML/6.0/jobs/0xxx/34/782/data for job 34
2024-07-18 20:27:34,688 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created
2024-07-18 20:27:34,707 MainThread SubJobXMLList ::_getItem :382 DEBUG : Requesting subjob: #783
2024-07-18 20:27:34,708 MainThread SubJobXMLList ::_getItem :386 DEBUG : Attempting to load subjob: #783 from disk
2024-07-18 20:27:34,708 MainThread SubJobXMLList ::_loadSubJobFromDisk :352 DEBUG : Loading subjob at: /afs/cern.ch/user/h/heistera/gangadir/repository/heistera/LocalXML/6.0/jobs/0xxx/34/783/data for job 34
2024-07-18 20:27:34,718 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created
2024-07-18 20:27:34,724 MainThread SubJobXMLList ::_getItem :382 DEBUG : Requesting subjob: #784
2024-07-18 20:27:34,724 MainThread SubJobXMLList ::_getItem :386 DEBUG : Attempting to load subjob: #784 from disk
2024-07-18 20:27:34,724 MainThread SubJobXMLList ::_loadSubJobFromDisk :352 DEBUG : Loading subjob at: /afs/cern.ch/user/h/heistera/gangadir/repository/heistera/LocalXML/6.0/jobs/0xxx/34/784/data for job 34
2024-07-18 20:27:34,748 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created
2024-07-18 20:27:34,771 MainThread SubJobXMLList ::_getItem :382 DEBUG : Requesting subjob: #785
2024-07-18 20:27:34,771 MainThread SubJobXMLList ::_getItem :386 DEBUG : Attempting to load subjob: #785 from disk
2024-07-18 20:27:34,771 MainThread SubJobXMLList ::_loadSubJobFromDisk :352 DEBUG : Loading subjob at: /afs/cern.ch/user/h/heistera/gangadir/repository/heistera/LocalXML/6.0/jobs/0xxx/34/785/data for job 34
2024-07-18 20:27:34,805 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created
2024-07-18 20:27:34,829 MainThread SubJobXMLList ::_getItem :382 DEBUG : Requesting subjob: #786
2024-07-18 20:27:34,830 MainThread SubJobXMLList ::_getItem :386 DEBUG : Attempting to load subjob: #786 from disk
2024-07-18 20:27:34,830 MainThread SubJobXMLList ::_loadSubJobFromDisk :352 DEBUG : Loading subjob at: /afs/cern.ch/user/h/heistera/gangadir/repository/heistera/LocalXML/6.0/jobs/0xxx/34/786/data for job 34
2024-07-18 20:27:34,919 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created
2024-07-18 20:27:34,957 MainThread SubJobXMLList ::_getItem :382 DEBUG : Requesting subjob: #787
2024-07-18 20:27:34,957 MainThread SubJobXMLList ::_getItem :386 DEBUG : Attempting to load subjob: #787 from disk
2024-07-18 20:27:34,957 MainThread SubJobXMLList ::_loadSubJobFromDisk :352 DEBUG : Loading subjob at: /afs/cern.ch/user/h/heistera/gangadir/repository/heistera/LocalXML/6.0/jobs/0xxx/34/787/data for job 34
2024-07-18 20:27:34,992 MainThread LHCbCompressedDataset::init :150 DEBUG : Dataset Created
2024-07-18 20:27:35,040 MainThread SubJobXMLList ::_getItem :382 DEBUG : Requesting subjob: #788
2024-07-18 20:27:35,040 MainThread SubJobXMLList ::_getItem :386 DEBUG : Attempting to load subjob: #788 from disk
3) Resplitting of the job worked after some time but then the ganga session seems to have died ... all commands I issued ended with no printout ... I pasted the output of the session as I saw on my screen:
Ganga In [3]: jobs(30).subjobs(641).resplit(SplitByFiles( filesPerJob = 1 ))
Generating proxy...
Enter Certificate password: *****
Added VOMS attribute /lhcb/Role=user
Uploading proxy..
Proxy generated:
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=heistera/CN=552506/CN=Arno Heister/CN=3575311884/CN=2728184313
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=heistera/CN=552506/CN=Arno Heister/CN=3575311884
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=heistera/CN=552506/CN=Arno Heister
timeleft : 23:53:59
DIRAC group : lhcb_user
path : /tmp/x509up_u7443
username : heistera
properties : NormalUser, PrivateLimitedDelegation
VOMS : True
VOMS fqan : ['/lhcb/Role=user']
Proxies uploaded:
DN | Until (GMT)
/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=heistera/CN=552506/CN=Arno Heister | 2025/06/09 17:57
[20:35:42]
Ganga In [4]: jobs(3)
[20:36:10]
Ganga In [5]: jobs(30)
[20:36:17]
Ganga In [6]: jobs
[20:37:10]
Ganga In [7]: jobs
Ganga Out [7]: [20:39:46]
Ganga In [8]:
Dear developers! Using the latest released ganga version 8.7.8 a respiting of a job always fails. Earlier versions did not have this issue. Could you please have a look? Thx!