saholo21 opened this issue 1 year ago
Do you have a yaml example of the workload you're running?
No, I don't have access to the batch job YAML. Is there any possibility of running kubectl dds only for certain types of workloads? For example, only for deployments, then only for statefulsets, and so on, to avoid the batch job scanning error.
That might be difficult to implement because of the way it works: it scans all pods and then looks for the parent of each pod. It doesn't have a way to start with deployments and work its way down to the pods.
If I implemented this, what types of flags would you want? --scan-resource=deployment
or --skip=job
It would get complicated to add both options, but I would need something that could be the default behavior, e.g. --scan-type=all, but either way I still have to scan all pods in the cluster and inspect what owns them.
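For readers following along, here is a minimal sketch (Go, using client-go) of the pod-first approach described above: list every pod, then walk its ownerReferences, optionally skipping a kind the way a hypothetical --skip=job flag might. The helper name, flag handling, and kubeconfig loading are illustrative assumptions, not the plugin's actual code.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// scanPods lists every pod in the cluster and inspects its owners, which is
// the order of operations described above: pods first, then their parents.
func scanPods(clientset kubernetes.Interface, skipKind string) error {
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		for _, owner := range pod.OwnerReferences {
			// A hypothetical --skip flag could only filter here, after the
			// pod has already been listed.
			if skipKind != "" && owner.Kind == skipKind {
				continue
			}
			fmt.Printf("%s/%s is owned by %s %s\n", pod.Namespace, pod.Name, owner.Kind, owner.Name)
		}
	}
	return nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// Equivalent in spirit to --skip=job: ignore pods owned by Jobs.
	if err := scanPods(clientset, "Job"); err != nil {
		panic(err)
	}
}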
Understood. The type of flag that would fit best for this case would be --skip=job, because that's the only workload type I'm having issues with. However, do you know what could be happening? There are some running jobs that finish during the scan, as they are meant to do, but the plugin reports this as an error. Is that expected behavior? Thanks for answering.
I'm not too sure what would be causing it without being able to replicate the problem or seeing the job spec with something like kubectl get job job1 --output yaml
What version of Kubernetes are you using?
I was able to get one of the job workloads that's throwing the error. I am using Kubernetes version 1.23. Let me know if that helps.
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2023-09-05T11:55:32Z"
  generation: 1
  labels:
    controller-uid: 80fef74c-a01f-4059-b345-d9238c974bec
    job-name: populate-analytic-data-aws-28231914
  name: populate-analytic-data-aws-28231914
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: populate-analytic-data-aws
    uid: 4bb57997-3256-4197-b36d-3172c50732a8
  resourceVersion: "1177585793"
  uid: 80fef74c-a01f-4059-b345-d9238c974bec
spec:
  activeDeadlineSeconds: 10000
  backoffLimit: 3
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 80fef74c-a01f-4059-b345-d9238c974bec
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 80fef74c-a01f-4059-b345-d9238c974bec
        job-name: populate-analytic-data-aws-28231914
    spec:
      containers:
      - args:
        - --botName
        - populate-analytic-data
        - --cassandra
        - cassandra-traffic-04.internal.company.com,cassandra-traffic-02.internal.company.com,cassandra-traffic-03.internal.company.com
        - --keyspace
        - traffic
        - --threads
        - "4"
        - --env
        - staging
        env:
        - name: ENV
          value: staging
        - name: log_level
          value: DEBUG
        image: 111111111111.dkr.ecr.us-east-1.amazonaws.com/populate-analytic-data:4.53-reporting
        imagePullPolicy: IfNotPresent
        name: docker
        resources:
          limits:
            cpu: 450m
            memory: 2000Mi
          requests:
            cpu: 250m
            memory: 1400Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastProbeTime: "2023-09-05T14:42:12Z"
    lastTransitionTime: "2023-09-05T14:42:12Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
  failed: 1
  startTime: "2023-09-05T11:55:32Z"
Hi @rothgar, is there any update on this?
Thank you for the example. I'm sorry I haven't been able to test this yet. I'm preparing for some work travel and conference talks and other priorities at work.
Hi @rothgar. Just a quick question to confirm something: when the error message only shows some jobs and the final warning says "The following table may be incomplete due to errors detected during the run", does that mean the result may be incomplete only because those jobs were not scanned (so it is unknown whether they have a docker.sock mount), or could this error with the jobs have stopped the scans of the other workloads (deployments, daemonsets, statefulsets, etc.)?
It should continue with other jobs and workload types. It doesn't exit the app. It appends the error and continues. https://github.com/aws-containers/kubectl-detector-for-docker-socket/blob/main/main.go#L270-L273
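In other words, the loop collects errors and moves on rather than aborting. Below is a standalone sketch of that append-and-continue pattern; it is illustrative only, with a fake resolveOwner standing in for the real owner lookup in main.go.

package main

import (
	"errors"
	"fmt"
)

// resolveOwner is a stand-in for looking up a pod's parent; it simulates a
// job that was cleaned up between the pod listing and the owner lookup.
func resolveOwner(pod string) (string, error) {
	if pod == "job1-pod" {
		return "", errors.New(`jobs.batch "job1" not found`)
	}
	return "deployment/" + pod, nil
}

func main() {
	pods := []string{"job1-pod", "web-pod", "db-pod"}
	var errs []error
	for _, p := range pods {
		owner, err := resolveOwner(p)
		if err != nil {
			errs = append(errs, err) // record the error...
			continue                 // ...and keep scanning the remaining pods
		}
		fmt.Println("scanned", owner)
	}
	if len(errs) > 0 {
		fmt.Printf("error: %v\n", errs)
		fmt.Println("Warning: The following table may be incomplete due to errors detected during the run")
	}
}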
I am trying to scan a cluster that has different kinds of workloads (deployments, pods, statefulsets, batch jobs, etc). However, when the scan finishes, I always get the same error: "jobs.batch not found. The following table may be incomplete due to errors detected during the run." The table only returns a single row, analyzing the kube-system namespace, but not all the other workloads, which amount to more than 300. I believe this issue arises because when the scan starts, there are some jobs running but then they finish during the scan (as they are meant to do). However, the plugin interprets this as an issue and throws an error. Is there any workaround for this problem?
Input =
kubectl dds
Output =
error: [jobs.batch "job1" not found, jobs.batch "job2" not found, jobs.batch "job3" not found, jobs.batch "job4" not found]
Warning: The following table may be incomplete due to errors detected during the run
NAMESPACE     TYPE        NAME       STATUS
kube-system   daemonset   aws-node   mounted
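Given the explanation above (jobs that complete and get cleaned up mid-scan come back as "not found"), one possible mitigation on the plugin side would be to treat a NotFound result for a parent job as "skip" rather than as an error. The following is a hedged sketch under that assumption, using client-go and apimachinery; lookupJobOwner is a hypothetical helper, not part of kubectl dds.

package scanner

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lookupJobOwner fetches the job that owns a pod. If the job has already been
// deleted (for example, it finished and was garbage collected during the
// scan), it returns nil instead of an error so the scan can continue cleanly.
func lookupJobOwner(clientset kubernetes.Interface, namespace, name string) (*batchv1.Job, error) {
	job, err := clientset.BatchV1().Jobs(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		fmt.Printf("job %s/%s finished during the scan, skipping\n", namespace, name)
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return job, nil
}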