llayer opened this issue 5 years ago
Lukas, sorry for the delay in replying, I was traveling and sick. I can reproduce this problem, but since the task name and date range you used are quite old, I bet it is an issue with the schema of data that has been migrated. Can you try another, newer task name and date range?
Hi @vkuznet, do you have a bit of bandwidth to help Lukas move forward?
Jean-Roch, I can help Lukas once he explains exactly what needs to be done. So far I suggested trying a newer workflow, since it seems to me that the old one has an issue with the schema (we introduced PrepID at some point, and it seems that the task/date range Lukas is using didn't have the PrepID in WMArchive). Valentin.
Makes sense, if the prepid is explicitly requested or searched for. Which is it, @llayer? Do you retrieve the prepid explicitly, or does the query contain prepid?
Jean-Roch, it is the other way around. Originally stored docs didn't have PrepID, and neither did the WMArchive schema. Later we introduced PrepID in the schema, and the current schema requires it. That's why we should either search for a newer task which has PrepID in it, or we need to look up the original schema and use it in queries. For simplicity I asked whether we can look up a newer task/workflow.
Hi Valentin, Jean-Roch,
I finally managed to test and run a new task for a more recent production. For example, I made a spec to find an LFN file that I recently used in my analysis:
"spec":{"lfn":"/store/mc/RunIISummer16NanoAODv4/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/NANOAODSIM/PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/90000/FCA7190F-5031-DA4F-8961-D0351E0D3FEB.root","timerange":[20190129,20190131]}, "fields":[]}
This runs and I get an output file as described in the wiki. Next I tried to follow the instructions to locate the log files, as described here: https://github.com/dmwm/WMArchive/wiki/How-to-find-logArchive-logCollect-location-for-given-LFN and https://github.com/dmwm/WMArchive/wiki/Steps-for-Querying-LogCollect-file-Location
I tried to look up the log file for one unmerged task that was returned, i.e. I made a spec:
{"spec":{"lfn":"/store/unmerged/RunIISummer16NanoAODv4/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/NANOAODSIM/PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/90000/121C44D2-DD1F-3F4D-BCE8-A518079C2E49.root","timerange":[20190129,20190131]}, "fields":[]}
and then run
myspark --script=LogFinder --spec=cond_tt_unmerged.spec --yarn
This returned:
### number of results 1
{"fields": [], "spec": {"query": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"], "timerange": [20190129, 20190131]}, "queries": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"]}
So finally this should be the location of the log file, and I should be able to retrieve it with xrdcp. However, this last step now fails for me:
xrdcp root://cms-xrd-global.cern.ch//store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3011] No servers are available to read the file.
Do you have any idea what might be the issue here? Many, many thanks in advance!
Lukas, I'm glad that the WMArchive procedure to find records is working now, but I doubt I can help with the xrdcp problem. The storage of log files is beyond WMArchive's responsibility. You may ask Alan (@amaltaro) what the policy is for keeping log archive files, but my understanding is that they may be available on EOS.
Thank you Valentin, I will contact him!
Dear Alan @amaltaro,
for my service task I need to access and download a large number of log files. I finally managed to locate the files with WMArchive, but now I have an issue accessing them with xrdcp:
xrdcp root://cms-xrd-global.cern.ch//store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3011] No servers are available to read the file.
Is there an easy way to download the logs once I have the location?
Many thanks!!
Hi Lukas, yes, there is an easy way to download them within their ~2 months of lifetime: through EOS over HTTP. The problem is that we didn't have enough quota until the beginning of February, and all attempts to upload logs there failed from the beginning of January.
The second option would be to fetch these from the site the workflow is running at. However, those are transient files: they are collected by a LogCollect job, which makes a bigger tarball of logArchives and transfers it to CERN (both CASTOR and EOS).
Does this script find which logCollect tarball contains your unmerged logArchive?
Hi Alan,
thanks for your prompt reply. If I understand correctly, there is a procedure described in https://github.com/dmwm/WMArchive/wiki/Steps-for-Querying-LogCollect-file-Location to obtain the logArchive.tar.gz for the unmerged jobs of a merged job. So I assumed that a query like
{"fields": [], "spec": {"query": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"], "timerange": [20190129, 20190131]}, "queries": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"]}
gives me the tarball of the unmerged logArchive, but I don't see how I can access this on EOS.
Hi Alan @amaltaro,
Do you have a quick solution to download the logArchive.tar.gz as described in the post above? I tried a few things, but I was not able to find a working solution. Having at least some log files downloaded would help me a lot to continue my work.
Thank you so much!
Lukas, you can find these LogCollect tarballs available in both EOS and CASTOR CERN storage:
amaltaro@lxplus029:~/tmp $ eoscms ls /store/logs/prod/2019/01/WMAgent/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-cmsgli-4624449-0-cmswn2424-2-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-cmsgli-4658037-0-cmswn2330-2-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-cmsgli-5015108-0-cmswn2406-2-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node097-1-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node101-1-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node101-3-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node101-4-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node106-4-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node109-1-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node122-3-logs.tar
HTH. Alan.
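Comparing the unmerged logArchive path from the earlier myspark result with the EOS directory Alan lists above suggests a mapping: `/store/unmerged/logs/prod/<year>/<month>/<day>/<task>/...` becomes `/store/logs/prod/<year>/<zero-padded month>/WMAgent/<task>/`. This is inferred from the single example in this thread, not an official rule, but it can be sketched as:

```python
def eos_logcollect_dir(unmerged_lfn):
    """Guess the EOS LogCollect directory for an unmerged logArchive LFN.

    Inferred from one example in this thread; the convention may differ
    for other workflows.
    """
    parts = unmerged_lfn.split("/")
    # ['', 'store', 'unmerged', 'logs', 'prod', year, month, day, task, ...]
    year, month, task = parts[5], parts[6], parts[8]
    # The day is dropped and the month is zero-padded on the EOS side.
    return "/store/logs/prod/%s/%02d/WMAgent/%s/" % (year, int(month), task)

lfn = ("/store/unmerged/logs/prod/2019/1/30/"
       "pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/"
       "TOP-RunIISummer16NanoAODv4-00007_0/0000/0/"
       "b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz")
print(eos_logcollect_dir(lfn))
```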
Thank you Alan,
I finally managed to download the first files!
Dear @amaltaro,
for my project with @vlimant, over the last weeks I collected the LogCollect paths for all the failing logs that I need. However, while downloading them from cmseos I got the impression that logs older than ~02/2018 are deleted. As a concrete example, I cannot find
eos ls /store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar
Unable to stat /eos/cms/store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar; No such file or directory (errc=2) (No such file or directory)
and it seems that at least most folders in /store/logs/prod/2017/11/WMAgent/ are empty. Can you confirm that the logs are deleted at some point, and do you know whether they are archived somewhere else?
Many, many thanks in advance!
Hi @llayer, EOS is supposed to keep logs for only a couple of months, so consider yourself lucky for finding more than a year of logs in there.
You need to access the archival storage in CASTOR; you can use the same path as the one provided for EOS, but under /castor/cern.ch/cms/store/logs/prod/ instead (with xrdcp, for instance).
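Following Alan's comment, the CASTOR path is just the EOS-style `/store/logs/prod/...` path with the `/castor/cern.ch/cms` prefix prepended; a minimal sketch (the tar name reuses the example from this thread, and the `castorcms` redirector is the one used in the xrdcp command below):

```python
def castor_path(store_logs_path):
    # CASTOR keeps the same layout under the /castor/cern.ch/cms prefix,
    # per Alan's comment above.
    return "/castor/cern.ch/cms" + store_logs_path

p = castor_path("/store/logs/prod/2017/11/WMAgent/"
                "pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/"
                "pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-"
                "LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar")
print("root://castorcms.cern.ch/" + p)
```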
Hi @amaltaro,
many thanks, I am now able to locate the files on CASTOR, but copying with xrdcp returns an error:
xrdcp root://castorcms.cern.ch//castor/cern.ch/cms/store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar . -OSsvcClass=OSsvcClass
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3005] Unable to do async GET request. Insufficient privileges for user 76093,1399 performing a StageGetRequest request on svcClass 'OSsvcClass'; Communication error on send
Do I need some special permissions? I also cannot run a stager_qry:
stager_qry -M /castor/cern.ch/cms/store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar .
Error: Permission denied
stage_filequery: Insufficient privileges for user 76093,1399 performing a StageFileQueryRequest request on svcClass ''
I am not familiar with CASTOR, so if you could give me another hint it would be fantastic! Best wishes and many thanks again,
Cheers, Lukas
Have you tried without the service class option? I haven't used CASTOR in a long time, so you might need to ask CERN IT via a SNOW ticket.
Hi,
I have a problem running myspark from lxplus. I work with Jean-Roch and I need to access some error logs. I tried to reproduce the example in https://github.com/dmwm/WMArchive/wiki/How-to-find-records-on-HDFS-using-pyspark
I logged in from an lxplus node following https://hadoop-user-guide.web.cern.ch/hadoop-user-guide/getstart/client_cvmfs.html, since it is no longer possible to log in via ssh analytix.
However, when I run the example in the twiki: {"spec":{"task":"/amaltaro_StepChain_ReDigi3_HG1612_WMArchive_161130_192654_9283/DIGI","timerange":[20161130,20161202]}, "fields":[]}
myspark --spec=cond.spec --script=RecordFinder --records-output=records.json
I get an error message telling me that the PrepID is missing; I attach the output of myspark below.
It would be great if you could help me to solve this problem.
Many, many thanks in advance! Best, Lukas