dmwm / WMArchive

Workload Archive project

Missing required field PrepID #351

Open llayer opened 5 years ago

llayer commented 5 years ago

Hi,

I have a problem running myspark from lxplus. I work with Jean-Roch and need to access some error logs. I tried to reproduce the example in https://github.com/dmwm/WMArchive/wiki/How-to-find-records-on-HDFS-using-pyspark

I logged in from an lxplus node following https://hadoop-user-guide.web.cern.ch/hadoop-user-guide/getstart/client_cvmfs.html, since it is no longer possible to log in via ssh analytix.

However, when I run the example from the twiki with the spec: {"spec":{"task":"/amaltaro_StepChain_ReDigi3_HG1612_WMArchive_161130_192654_9283/DIGI","timerange":[20161130,20161202]}, "fields":[]}

myspark --spec=cond.spec --script=RecordFinder --records-output=records.json

I get an error message telling me that the PrepID field is missing; I attach the output of myspark below.
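
For reference, cond.spec is just the JSON spec above written to a file; a minimal, purely illustrative Python sketch of generating it:

import json

# Purely illustrative: write the spec from the twiki example into cond.spec,
# which is then passed to myspark via --spec=cond.spec.
spec = {
    "spec": {
        "task": "/amaltaro_StepChain_ReDigi3_HG1612_WMArchive_161130_192654_9283/DIGI",
        "timerange": [20161130, 20161202],
    },
    "fields": [],
}
with open("cond.spec", "w") as fobj:
    json.dump(spec, fobj)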

It would be great if you could help me solve this problem.

Many, many thanks in advance. Best, Lukas

[llayer@lxplus130 WMArchive]$ myspark --spec=cond.spec --script=RecordFinder --records-output=records.json
Using spark 2.X
Enable CVMFS ...
Ivy Default Cache set to: /afs/cern.ch/user/l/llayer/.ivy2/cache
The jars for the packages stored in: /afs/cern.ch/user/l/llayer/.ivy2/jars
:: loading settings :: url = jar:file:/cvmfs/sft.cern.ch/lcg/releases/spark/2.2.1-3206b/x86_64-centos7-gcc62-opt/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found com.databricks#spark-avro_2.11;4.0.0 in central
        found org.slf4j#slf4j-api;1.7.5 in central
        found org.apache.avro#avro;1.7.6 in central
        found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
        found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
        found com.thoughtworks.paranamer#paranamer;2.3 in central
        found org.xerial.snappy#snappy-java;1.0.5 in central
        found org.apache.commons#commons-compress;1.4.1 in central
        found org.tukaani#xz;1.0 in central
:: resolution report :: resolve 963ms :: artifacts dl 38ms
        :: modules in use:
        com.databricks#spark-avro_2.11;4.0.0 from central in [default]
        com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
        org.apache.avro#avro;1.7.6 from central in [default]
        org.apache.commons#commons-compress;1.4.1 from central in [default]
        org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
        org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
        org.slf4j#slf4j-api;1.7.5 from central in [default]
        org.tukaani#xz;1.0 from central in [default]
        org.xerial.snappy#snappy-java;1.0.5 from central in [default]
        :: evicted modules:
        org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.5] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   10  |   0   |   0   |   1   ||   9   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 9 already retrieved (0kB/25ms)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/cvmfs/sft.cern.ch/lcg/releases/spark/2.2.1-3206b/x86_64-centos7-gcc62-opt/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/cvmfs/sft.cern.ch/lcg/releases/hadoop/2.7.3-b1221/x86_64-centos7-gcc62-opt/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/02/08 09:11:08 INFO SparkContext: Running Spark version 2.2.1
19/02/08 09:11:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/08 09:11:09 INFO SparkContext: Submitted application: AvroKeyInputFormat
19/02/08 09:11:09 INFO SecurityManager: Changing view acls to: llayer
19/02/08 09:11:09 INFO SecurityManager: Changing modify acls to: llayer
19/02/08 09:11:09 INFO SecurityManager: Changing view acls groups to: 
19/02/08 09:11:09 INFO SecurityManager: Changing modify acls groups to: 
19/02/08 09:11:09 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users  with view permissions: Set(llayer); groups with view permissions: Set(); users  with modify permissions: Set(llayer); groups with modify permissions: Set()
19/02/08 09:11:09 INFO Utils: Successfully started service 'sparkDriver' on port 5001.
19/02/08 09:11:09 INFO SparkEnv: Registering MapOutputTracker
19/02/08 09:11:09 INFO SparkEnv: Registering BlockManagerMaster
19/02/08 09:11:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/02/08 09:11:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/02/08 09:11:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-71770f69-9df2-48f0-96b8-541540e19d03
19/02/08 09:11:09 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
19/02/08 09:11:09 INFO SparkEnv: Registering OutputCommitCoordinator
19/02/08 09:11:09 INFO log: Logging initialized @11312ms
19/02/08 09:11:09 INFO Server: jetty-9.3.z-SNAPSHOT
19/02/08 09:11:09 INFO Server: Started @11410ms
19/02/08 09:11:09 INFO AbstractConnector: Started ServerConnector@220b5902{HTTP/1.1,[http/1.1]}{0.0.0.0:5201}
19/02/08 09:11:09 INFO Utils: Successfully started service 'SparkUI' on port 5201.
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1ea5f333{/jobs,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4c756e2b{/jobs/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@9cf0917{/jobs/job,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5c0adfa3{/jobs/job/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@154de6bd{/stages,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@59dff77e{/stages/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@69ffa2df{/stages/stage,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@63fc12aa{/stages/stage/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@56bcafc0{/stages/pool,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2fd3a3e0{/stages/pool/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6a0d5620{/storage,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@38c6051f{/storage/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3db721{/storage/rdd,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@41726ede{/storage/rdd/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@26fada3e{/environment,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1ef26daf{/environment/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3aa8db11{/executors,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5c06ad81{/executors/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@74b7f684{/executors/threadDump,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6db8becd{/executors/threadDump/json,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@703bb05b{/static,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@346b2582{/,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@48511a93{/api,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4f021b90{/jobs/job/kill,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@27a69ce6{/stages/stage/kill,null,AVAILABLE,@Spark}
19/02/08 09:11:09 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://188.184.30.74:5201
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/v/valya/public/spark/spark-examples-1.6.0-cdh5.15.1-hadoop2.6.0-cdh5.15.1.jar at spark://188.184.30.74:5001/jars/spark-examples-1.6.0-cdh5.15.1-hadoop2.6.0-cdh5.15.1.jar with timestamp 1549613469967
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar at spark://188.184.30.74:5001/jars/com.databricks_spark-avro_2.11-4.0.0.jar with timestamp 1549613469968
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar at spark://188.184.30.74:5001/jars/org.slf4j_slf4j-api-1.7.5.jar with timestamp 1549613469968
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.avro_avro-1.7.6.jar at spark://188.184.30.74:5001/jars/org.apache.avro_avro-1.7.6.jar with timestamp 1549613469969
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar at spark://188.184.30.74:5001/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar with timestamp 1549613469969
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar at spark://188.184.30.74:5001/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar with timestamp 1549613469969
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar at spark://188.184.30.74:5001/jars/com.thoughtworks.paranamer_paranamer-2.3.jar with timestamp 1549613469969
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar at spark://188.184.30.74:5001/jars/org.xerial.snappy_snappy-java-1.0.5.jar with timestamp 1549613469969
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar at spark://188.184.30.74:5001/jars/org.apache.commons_commons-compress-1.4.1.jar with timestamp 1549613469970
19/02/08 09:11:09 INFO SparkContext: Added JAR file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.tukaani_xz-1.0.jar at spark://188.184.30.74:5001/jars/org.tukaani_xz-1.0.jar with timestamp 1549613469970
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/Tools/myspark.py at file:/afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/Tools/myspark.py with timestamp 1549613470276
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/Tools/myspark.py to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/myspark.py
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar with timestamp 1549613470304
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/com.databricks_spark-avro_2.11-4.0.0.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar with timestamp 1549613470313
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.slf4j_slf4j-api-1.7.5.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.avro_avro-1.7.6.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.avro_avro-1.7.6.jar with timestamp 1549613470323
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.avro_avro-1.7.6.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.apache.avro_avro-1.7.6.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar with timestamp 1549613470336
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.codehaus.jackson_jackson-core-asl-1.9.13.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar with timestamp 1549613470347
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar with timestamp 1549613470366
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/com.thoughtworks.paranamer_paranamer-2.3.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar with timestamp 1549613470376
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.xerial.snappy_snappy-java-1.0.5.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar with timestamp 1549613470426
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.apache.commons_commons-compress-1.4.1.jar
19/02/08 09:11:10 INFO SparkContext: Added file file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.tukaani_xz-1.0.jar at file:/afs/cern.ch/user/l/llayer/.ivy2/jars/org.tukaani_xz-1.0.jar with timestamp 1549613470436
19/02/08 09:11:10 INFO Utils: Copying /afs/cern.ch/user/l/llayer/.ivy2/jars/org.tukaani_xz-1.0.jar to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/org.tukaani_xz-1.0.jar
19/02/08 09:11:10 INFO Executor: Starting executor ID driver on host localhost
19/02/08 09:11:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 5101.
19/02/08 09:11:10 INFO NettyBlockTransferService: Server created on 188.184.30.74:5101
19/02/08 09:11:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/02/08 09:11:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 188.184.30.74, 5101, None)
19/02/08 09:11:10 INFO BlockManagerMasterEndpoint: Registering block manager 188.184.30.74:5101 with 2004.6 MB RAM, BlockManagerId(driver, 188.184.30.74, 5101, None)
19/02/08 09:11:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 188.184.30.74, 5101, None)
19/02/08 09:11:10 INFO BlockManager: external shuffle service port = 7337
19/02/08 09:11:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 188.184.30.74, 5101, None)
19/02/08 09:11:10 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3a8ae0e6{/metrics/json,null,AVAILABLE,@Spark}
19/02/08 09:11:10 WARN DFSUtil: Namenode for hadalytic remains unresolved for ID p01001532067275.cern.ch.  Check your hdfs-site.xml file to ensure namenodes are configured properly.
19/02/08 09:11:11 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/02/08 09:11:11 INFO EventLoggingListener: Logging events to hdfs:///user/spark/applicationHistory/local-1549613470488
19/02/08 09:11:11 INFO SparkContext: Added file /afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/PySpark/RecordFinder.py at file:/afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/PySpark/RecordFinder.py with timestamp 1549613471955
19/02/08 09:11:11 INFO Utils: Copying /afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/PySpark/RecordFinder.py to /tmp/spark-4dc0c089-9e65-4cf0-a8d4-f54426f5dac2/userFiles-3ab4c76a-4051-491f-8b02-b84f7a31e719/RecordFinder.py
19/02/08 09:11:14 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.avro.AvroTypeException: Found ns12.name12, expecting ns12.name12, missing required field PrepID
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:176)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:118)
        at org.apache.avro.mapreduce.AvroKeyRecordReader.nextKeyValue(AvroKeyRecordReader.java:53)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:207)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
19/02/08 09:11:14 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "/afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/Tools/myspark.py", line 455, in <module>
    main()
  File "/afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/Tools/myspark.py", line 415, in main
    results = run(opts.schema, hdir, opts.script, opts.spec, verbose, opts.rout, opts.yarn)
  File "/afs/cern.ch/user/l/llayer/test_wm/WMArchive/src/python/WMArchive/Tools/myspark.py", line 275, in run
    avro_rdd = ctx.union([ctx.newAPIHadoopFile(f, aformat, akey, awrite, aconv, conf=conf) for f in data_path])
  File "/cvmfs/sft.cern.ch/lcg/releases/spark/2.2.1-3206b/x86_64-centos7-gcc62-opt/python/lib/pyspark.zip/pyspark/context.py", line 675, in newAPIHadoopFile
  File "/cvmfs/sft.cern.ch/lcg/releases/spark/2.2.1-3206b/x86_64-centos7-gcc62-opt/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/cvmfs/sft.cern.ch/lcg/releases/spark/2.2.1-3206b/x86_64-centos7-gcc62-opt/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.avro.AvroTypeException: Found ns12.name12, expecting ns12.name12, missing required field PrepID
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:176)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:118)
        at org.apache.avro.mapreduce.AvroKeyRecordReader.nextKeyValue(AvroKeyRecordReader.java:53)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:207)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
        at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:203)
        at org.apache.spark.api.python.PythonRDD$.newAPIHadoopFile(PythonRDD.scala:570)
        at org.apache.spark.api.python.PythonRDD.newAPIHadoopFile(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.AvroTypeException: Found ns12.name12, expecting ns12.name12, missing required field PrepID
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:176)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:118)
        at org.apache.avro.mapreduce.AvroKeyRecordReader.nextKeyValue(AvroKeyRecordReader.java:53)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:207)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more
vkuznet commented 5 years ago

Lukas, sorry for the delay in replying; I was traveling and sick. I can reproduce this problem, but since the task name and date range you used are quite old, I suspect it is an issue with the schema of the data that has been migrated. Can you try another, newer task name and date range?

vlimant commented 5 years ago

Hi @vkuznet, do you have a bit of bandwidth to help Lukas move forward?

vkuznet commented 5 years ago

Jean-Roch, I can help Lukas once he explains exactly what needs to be done. So far I have suggested trying a newer workflow, since it seems to me that the old one has a schema issue (we introduced PrepID at some point, and it seems the task/date range Lukas is using did not yet have the PrepID in WMArchive). Valentin.


vlimant commented 5 years ago

Makes sense, if the PrepID is explicitly requested or searched for. Which is it, @llayer? Do you retrieve the PrepID explicitly, or does the query contain the PrepID?

vkuznet commented 5 years ago

Jean-Roch, it is the other way around. The originally stored docs did not have PrepID, and neither did the WMArchive schema. Later we introduced PrepID in the schema, and the current schema requires it. That's why we should either search for a newer task which has PrepID in it, or look up the original schema and use it in the queries. For simplicity, I asked whether we can look up a newer task/workflow.
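
To make the failure mode concrete, here is a minimal, self-contained Python sketch (the record name and fields are made up, not the real WMArchive schema): reading old data through a reader schema that adds a required PrepID without a default fails, while adding a default resolves cleanly.

import io
import avro.schema
from avro.io import BinaryDecoder, BinaryEncoder, DatumReader, DatumWriter

# Illustrative schemas only -- NOT the real WMArchive schema. "FWJR" and "task"
# are made-up names; the point is the presence/absence of PrepID and its default.
OLD = '{"type":"record","name":"FWJR","fields":[{"name":"task","type":"string"}]}'
NEW_REQUIRED = ('{"type":"record","name":"FWJR","fields":['
                '{"name":"task","type":"string"},'
                '{"name":"PrepID","type":"string"}]}')
NEW_DEFAULT = ('{"type":"record","name":"FWJR","fields":['
               '{"name":"task","type":"string"},'
               '{"name":"PrepID","type":"string","default":""}]}')

parse = avro.schema.parse  # some avro releases expose this as avro.schema.Parse
old, new_req, new_def = parse(OLD), parse(NEW_REQUIRED), parse(NEW_DEFAULT)

# Write one record with the old schema, as the pre-PrepID documents were written.
buf = io.BytesIO()
DatumWriter(old).write({"task": "/some_old_task/DIGI"}, BinaryEncoder(buf))

# Reading it back with a reader schema that requires PrepID (no default) fails,
# the same failure mode as the "missing required field PrepID" error above.
buf.seek(0)
try:
    DatumReader(old, new_req).read(BinaryDecoder(buf))
except Exception as exc:
    print("resolution failed:", exc)

# A reader schema that declares a default for PrepID resolves the old data fine.
buf.seek(0)
print(DatumReader(old, new_def).read(BinaryDecoder(buf)))

So on the data side the options are either a reader schema with a default for PrepID, or records written after PrepID was introduced.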


llayer commented 5 years ago

Hi Valentin, Jean-Roch,

I finally managed to test and run a new task for a more recent production; e.g., I made a spec to find an LFN file that I recently used in my analysis:

"spec":{"lfn":"/store/mc/RunIISummer16NanoAODv4/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/NANOAODSIM/PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/90000/FCA7190F-5031-DA4F-8961-D0351E0D3FEB.root","timerange":[20190129,20190131]}, "fields":[]} 

This runs, and I get an output file as described in the wiki. Next I tried to follow the instructions to locate the log files, as described here: https://github.com/dmwm/WMArchive/wiki/How-to-find-logArchive-logCollect-location-for-given-LFN and https://github.com/dmwm/WMArchive/wiki/Steps-for-Querying-LogCollect-file-Location

I tried to look up the log file for one unmerged file that was returned, i.e. I made the spec:

{"spec":{"lfn":"/store/unmerged/RunIISummer16NanoAODv4/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/NANOAODSIM/PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/90000/121C44D2-DD1F-3F4D-BCE8-A518079C2E49.root","timerange":[20190129,20190131]}, "fields":[]}

and then run

myspark --script=LogFinder --spec=cond_tt_unmerged.spec --yarn

This returned:

### number of results 1                                                         
{"fields": [], "spec": {"query": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"], "timerange": [20190129, 20190131]}, "queries": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"]}

So finally this should be the location of the log file, and I should be able to retrieve it with xrdcp; however, this last step now fails for me:

xrdcp root://cms-xrd-global.cern.ch//store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3011] No servers are available to read the file.

Do you have any idea what might be the issue here? Many, many thanks in advance!
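
As a side note, here is a minimal Python sketch of pulling the matched logArchive LFNs out of a LogFinder result like the one above; records.json is a hypothetical name for the saved output:

import json

# Hypothetical file name: the LogFinder result shown above, saved as JSON
# (e.g. via myspark's --records-output option).
with open("records.json") as fobj:
    result = json.load(fobj)

# In the output above, the matched logArchive LFNs appear under "queries".
for lfn in result.get("queries", []):
    print(lfn)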

vkuznet commented 5 years ago

Lukas, I'm glad that the WMArchive procedure to find records is working now, but I doubt I can help with the xrdcp problem. The storage of log files is beyond WMArchive's responsibility. You may ask Alan (@amaltaro) what the policy is for keeping log archive files, but my understanding is that they may be available on EOS.

llayer commented 5 years ago

Thank you Valentin, I will contact him!

llayer commented 5 years ago

Dear Alan @amaltaro,

for my service task I need to access and download a large number of log files. I finally managed to locate the files with WMArchive, but now I have an issue accessing them with xrdcp:

xrdcp root://cms-xrd-global.cern.ch//store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3011] No servers are available to read the file.

Is there an easy way to download the logs once I have the location?

Many thanks!!

amaltaro commented 5 years ago

Hi Lukas, yes, there is an easy way to download them within their ~2 months of lifetime: through EOS over HTTP. The problem is that we didn't have enough quota until the beginning of February, and all attempts to upload logs there had failed since the beginning of January.

A second option would be to fetch these from the site the workflow is running at. However, those are transient files; they are collected by a LogCollect job, which makes a bigger tarball of logArchives and transfers it to CERN (to both CASTOR and EOS).

Does this script find which logCollect tarball contains your unmerged logArchive?

llayer commented 5 years ago

Hi Alan,

thanks for your prompt reply. If I understand correctly, there is a procedure described in https://github.com/dmwm/WMArchive/wiki/Steps-for-Querying-LogCollect-file-Location to obtain the logArchive.tar.gz for the unmerged jobs of a merged job. So I assumed that a query like

{"fields": [], "spec": {"query": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"], "timerange": [20190129, 20190131]}, "queries": ["/store/unmerged/logs/prod/2019/1/30/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/TOP-RunIISummer16NanoAODv4-00007_0/0000/0/b7b05f7d-f2d9-47ef-9465-06743ff2e9b1-305-0-logArchive.tar.gz"]}

gives me the tarball of the unmerged logArchive, but I don't see how I can access this on EOS.

llayer commented 5 years ago

Hi Alan @amaltaro,

Do you have any quick solution for downloading the logArchive.tar.gz described in the post above? I tried a few things, but was not able to find a working solution. Having at least some log files downloaded would help me a lot in continuing my work.

Thank you so much!

amaltaro commented 5 years ago

Lukas, you can find these LogCollect tarballs in both the EOS and CASTOR CERN storage:

amaltaro@lxplus029:~/tmp $ eoscms ls /store/logs/prod/2019/01/WMAgent/pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-cmsgli-4624449-0-cmswn2424-2-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-cmsgli-4658037-0-cmswn2330-2-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-cmsgli-5015108-0-cmswn2406-2-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node097-1-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node101-1-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node101-3-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node101-4-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node106-4-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node109-1-logs.tar
pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node122-3-logs.tar

HTH. Alan.
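
For reference, a minimal sketch of copying one of the tarballs listed above over xrootd; the eoscms.cern.ch redirector and the /eos/cms prefix are assumptions about the CMS EOS instance, and EOS over HTTP would be an alternative route:

import subprocess

# Illustrative only: the tarball name is copied from the eoscms ls listing above;
# the redirector and path prefix are assumptions, not something confirmed in this thread.
tarball = ("/store/logs/prod/2019/01/WMAgent/"
           "pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576/"
           "pdmvserv_task_TOP-RunIISummer16NanoAODv4-00007__v1_T_190129_142236_8576-"
           "LogCollectForTOP-RunIISummer16NanoAODv4-00007_0-node097-1-logs.tar")
subprocess.check_call(["xrdcp", "root://eoscms.cern.ch//eos/cms" + tarball, "."])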

llayer commented 5 years ago

Thank you Alan,

I finally managed to download the first files!

llayer commented 5 years ago

Dear @amaltaro,

for my project with @vlimant, over the last weeks I collected the LogCollect paths for all the failing logs that I need. However, while downloading them from cmseos I got the impression that logs older than ~02/2018 are deleted; as a concrete example, I cannot find

eos ls /store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar
Unable to stat /eos/cms/store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar; No such file or directory (errc=2) (No such file or directory)

and it seems that most folders in /store/logs/prod/2017/11/WMAgent/ are empty. Can you confirm that the logs are deleted at some point, and do you know whether they are archived somewhere else?

Many, many thanks in advance!

amaltaro commented 5 years ago

Hi @llayer, EOS is supposed to keep logs for only a couple of months, so consider yourself lucky to find logs more than a year old in there. You need to access the archival storage in CASTOR: you can use the same path as the one provided for EOS, but with the /castor/cern.ch/cms/store/logs/prod/ prefix instead (with xrdcp, for instance).

llayer commented 5 years ago

Hi @amaltaro,

many thanks, I am now able to locate the files on CASTOR, but copying with xrdcp returns an error:

xrdcp root://castorcms.cern.ch//castor/cern.ch/cms/store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar . -OSsvcClass=OSsvcClass
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3005] Unable to do async GET request. Insufficient privileges for user 76093,1399 performing a StageGetRequest request on svcClass 'OSsvcClass'; Communication error on send

Do I need some special permissions? I also cannot run a stager_qry:

stager_qry -M /castor/cern.ch/cms/store/logs/prod/2017/11/WMAgent/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar .
Error: Permission denied
stage_filequery: Insufficient privileges for user 76093,1399 performing a StageFileQueryRequest request on svcClass ''

I am not familiar with CASTOR, so if you could give me another hint it would be fantastic! Best wishes and many thanks again,

Cheers, Lukas

amaltaro commented 5 years ago

Have you tried it without the service class option? I haven't used CASTOR for a long time now, so you might need to ask CERN IT via a SNOW ticket.
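
For reference, a minimal sketch of that suggestion: the same CASTOR copy as above, just without the -OSsvcClass option. The path is taken from the failing command; whether it succeeds still depends on CASTOR permissions, so treat it as untested.

import subprocess

# Retry the CASTOR copy without the -OSsvcClass option, as suggested above.
# Untested: success still depends on the CASTOR service class and permissions.
tarball = ("/castor/cern.ch/cms/store/logs/prod/2017/11/WMAgent/"
           "pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751/"
           "pdmvserv_task_EGM-RunIIFall17GS-00010__v1_T_171013_161526_3751-"
           "LogCollectForEGM-RunIIFall17GS-00010_0-vmp365-1326-logs.tar")
subprocess.check_call(["xrdcp", "root://castorcms.cern.ch/" + tarball, "."])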