This could potentially be improved by running parallel processes, but such an improvement is not currently a priority.
As a workaround, you can split your inventory file into multiple files and run CFT scorecard separately on each.
Implementing this would require working across a few files, but the main logic is here: https://github.com/GoogleCloudPlatform/cloud-foundation-toolkit/blob/master/cli/scorecard/violations.go#L103
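For example, if the inventory is a newline-delimited CAI export (one asset per line), a small helper along these lines could do the splitting — a sketch only, not part of the CLI:

package main

import (
	"bufio"
	"fmt"
	"os"
)

// splitInventory writes every `linesPerChunk` lines of a newline-delimited
// CAI export into its own file, so each chunk can be fed to a separate
// `cft scorecard` run.
func splitInventory(path string, linesPerChunk int) error {
	in, err := os.Open(path)
	if err != nil {
		return err
	}
	defer in.Close()

	scanner := bufio.NewScanner(in)
	scanner.Buffer(make([]byte, 0, 1024*1024), 16*1024*1024) // individual asset lines can be large
	var out *os.File
	for i := 0; scanner.Scan(); i++ {
		if i%linesPerChunk == 0 {
			if out != nil {
				out.Close()
			}
			out, err = os.Create(fmt.Sprintf("%s.part%d.json", path, i/linesPerChunk))
			if err != nil {
				return err
			}
		}
		if _, err := fmt.Fprintln(out, scanner.Text()); err != nil {
			return err
		}
	}
	if out != nil {
		out.Close()
	}
	return scanner.Err()
}

func main() {
	if err := splitInventory("inventory.json", 10000); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}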
Did you guys profile this application earlier?
Hello again Morgante,
I have profiled the application, and it looks like the majority of the time is spent as follows. It should be easier for you to pinpoint where the issue is, and I can then attempt a fix:
Hi @morgante
As a first step I am trying to use a newline-delimited JSON parser, and it looks like I am hitting a bug. Are you able to advise on the following code?
func getDataFromReaderNew(reader io.Reader) ([]*validator.Asset, error) {
	var pbAssets []*validator.Asset
	decoder := json.NewDecoder(reader)
	for {
		pbAsset := &validator.Asset{}
		if err := decoder.Decode(pbAsset); err == io.EOF {
			break // done decoding file
		} else if err != nil {
			return nil, errors.Wrap(err, "decoding asset JSON")
		}
		// Some exported org policies carry malformed UpdateTime values
		// (e.g. "1900-01-01T01:00:006Z") that fail RFC 3339 parsing during
		// constraint evaluation, so drop them before re-marshaling.
		if orgPolicy := pbAsset.GetOrgPolicy(); orgPolicy != nil {
			for _, v := range orgPolicy {
				v.UpdateTime = nil
			}
		}
		// Round-trip through jsonpb so proto-specific fields (and unknown
		// fields) are handled correctly.
		jsn, err := json.Marshal(pbAsset)
		if err != nil {
			return nil, errors.Wrap(err, "marshaling to json")
		}
		umar := &jsonpb.Unmarshaler{AllowUnknownFields: true}
		pbAsset1 := &validator.Asset{}
		if err := umar.Unmarshal(bytes.NewReader(jsn), pbAsset1); err != nil {
			return nil, errors.Wrap(err, "unmarshaling to proto")
		}
		if err := cvasset.SanitizeAncestryPath(pbAsset1); err != nil {
			return nil, errors.Wrapf(err, "fetching ancestry path for %s", pbAsset1)
		}
		pbAssets = append(pbAssets, pbAsset1)
	}
	return pbAssets, nil
}
The error is:
GCP target Constraint Framework review call failed: validation.gcp.forsetisecurity.org: __modset_templates["validation.gcp.forsetisecurity.org"]["GCPIAMRestrictServiceAccountKeyAgeConstraintV1"]_idx_0:39: eval_builtin_error: time.parse_rfc3339_ns: parsing time "1900-01-01T01:00:006Z" as "2006-01-02T15:04:05Z07:00": cannot parse "6Z" as "Z07:00"
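(For context, the error string comes from Go's time.Parse, so the failure can be reproduced with the standard time package alone. The malformed seconds field "006" leaves a stray "6" that the parser then tries to match against the zone designator. A minimal sketch:)

package main

import (
	"fmt"
	"time"
)

func main() {
	// After the two-digit seconds "00" is consumed, the leftover "6" is
	// matched against the zone designator "Z07:00" and parsing fails.
	_, err := time.Parse(time.RFC3339, "1900-01-01T01:00:006Z")
	fmt.Println(err)
	// parsing time "1900-01-01T01:00:006Z" as "2006-01-02T15:04:05Z07:00":
	// cannot parse "6Z" as "Z07:00"
}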
Hey @hussainak,
Thanks for working on this! You actually shouldn't need to do anything special in the getDataFromReader function — the default scanner already splits by lines today.
Instead, I would concentrate on parallelizing what happens after an asset has been scanned in. Today, the code builds an array of all assets. You probably want to look at sending assets into a channel after they are scanned, where parallel routines can then check them for violations.
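Something along these lines — a minimal sketch of the channel-based fan-out, with hypothetical Asset/Violation stand-ins rather than the actual scorecard types:

package main

import (
	"fmt"
	"sync"
)

// Asset and Violation stand in for validator.Asset and RichViolation.
type Asset struct{ Name string }
type Violation struct{ Resource string }

// reviewAll drains the assets channel with a bounded number of workers and
// collects every violation under a mutex, so the append is race-free.
func reviewAll(assets <-chan *Asset, review func(*Asset) []*Violation, workers int) []*Violation {
	var (
		mu  sync.Mutex
		out []*Violation
		wg  sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for asset := range assets {
				vs := review(asset)
				mu.Lock()
				out = append(out, vs...)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return out
}

func main() {
	assets := make(chan *Asset)
	go func() { // producer: decode assets and feed them to the workers
		defer close(assets)
		for i := 0; i < 10; i++ {
			assets <- &Asset{Name: fmt.Sprintf("asset-%d", i)}
		}
	}()
	review := func(a *Asset) []*Violation { return []*Violation{{Resource: a.Name}} }
	fmt.Println(len(reviewAll(assets, review, 4))) // 10
}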
@morgante I have looked at what's consuming the most time, and it looks like it is the second part of the getViolations() method, which builds the RichViolations. I changed it to use ParallelValidator, but it performs even worse, and ParallelValidator also doesn't allow attaching the asset to a RichViolation. The changes I made and tested are below:
//score.go

// ScoringConfig holds settings for generating a score.
type ScoringConfig struct {
	categories  map[string]*constraintCategory   // available constraint categories
	constraints map[string]*constraintViolations // a map of constraints violated and their violations
	validator   *gcv.ParallelValidator           // the validator instance used for scoring
}

// NewScoringConfigFromValidator creates a scoring engine with a given validator.
func NewScoringConfigFromValidator(v *gcv.ParallelValidator) *ScoringConfig {
	config := &ScoringConfig{}
	config.validator = v
	return config
}

// NewScoringConfig creates a scoring engine for the given policy library.
func NewScoringConfig(ctx context.Context, policyPath string) (*ScoringConfig, error) {
	flag.Parse()
	stopChannel := make(chan struct{}, 200)
	//defer close(stopChannel)
	cv, err := gcv.NewValidator([]string{filepath.Join(policyPath, "policies")},
		filepath.Join(policyPath, "lib"))
	if err != nil {
		return nil, err
	}
	v := gcv.NewParallelValidator(stopChannel, cv)
	config := NewScoringConfigFromValidator(v)
	return config, nil
}
//violations.go

// getViolations finds all Config Validator violations for a given Inventory.
func getViolations(inventory *InventoryConfig, config *ScoringConfig) ([]*RichViolation, error) {
	var err error
	var pbAssets []*validator.Asset
	start := time.Now()
	fmt.Println("Fetching data: ", start)
	if inventory.bucketName != "" {
		pbAssets, err = getDataFromBucket(inventory.bucketName)
		if err != nil {
			return nil, errors.Wrap(err, "Fetching inventory from Bucket")
		}
	} else if inventory.dirPath != "" {
		pbAssets, err = getDataFromFile(inventory.dirPath)
		if err != nil {
			return nil, errors.Wrap(err, "Fetching inventory from local directory")
		}
	} else if inventory.readFromStdin {
		pbAssets, err = getDataFromStdin()
		if err != nil {
			return nil, errors.Wrap(err, "Reading from stdin")
		}
	}
	duration := time.Since(start)
	fmt.Println("Fetched data in: ", duration)
	start = time.Now()
	fmt.Println("Reviewing Violations: ", start)
	richViolations := make([]*RichViolation, 0)
	violations, err := config.validator.Review(context.Background(), &validator.ReviewRequest{
		Assets: pbAssets,
	})
	if err != nil {
		return nil, errors.Wrap(err, "reviewing assets")
	}
	duration = time.Since(start)
	fmt.Println("Reviewed Violations in: ", duration)
	fmt.Println(violations)
	// ParallelValidator's Review response does not carry the source asset,
	// so the RichViolation construction below cannot attach it:
	//for _, violation := range violations.Violations {
	//	richViolation := RichViolation{*violation, "", violation.Resource, violation.Message, violation.Metadata, nil}
	//	richViolations = append(richViolations, &richViolation)
	//}
	return richViolations, nil
}
Running normally, the validator processes the 400MB file in 37 minutes on my macOS machine, whereas the ParallelValidator takes longer. Some numbers:
//With stopChannel := make(chan struct{})
Generating CFT scorecard
Fetching data: 2021-07-19 21:51:38.843216 +1000 AEST m=+11.227749735
Fetched data in: 2m8.447783655s
Reviewing Violations: 2021-07-19 21:53:47.289137 +1000 AEST m=+139.675554602
Reviewed Violations in: 57m20.78327486s
//With stopChannel := make(chan struct{}, 20)
Generating CFT scorecard
Fetching data: 2021-07-20 11:48:29.249778 +1000 AEST m=+0.637908807
Fetched data in: 1m5.281816316s
Reviewing Violations: 2021-07-20 11:49:34.531355 +1000 AEST m=+65.919745659
Reviewed Violations in: 1h6m12.25064129s
Hi @morgante
Here's an update: I made a separate function to process violations in parallel and ran it over a 100MB CAI file. It gets into some sort of race condition on the 400MB file (I used sync.WaitGroup()), and I think the number of goroutines that are spun up needs to be bounded, so I went with a published third-party workerpool module. Let me know how you would like to proceed. Here are the results with and without parallel routines:
Generating CFT scorecard - getViolations()
Fetching data: 2021-07-21 16:11:10.303949 +1000 AEST m=+1.313751808
Fetched data in: 22.685511317s
Reviewing Violations: 2021-07-21 16:11:32.989771 +1000 AEST m=+23.999296513
Reviewed Violations in: 6m31.30912049s
Time taken to write results: 752.258045ms
Generating CFT scorecard - getViolationsParallel()
Fetching data: 2021-07-21 17:31:16.386379 +1000 AEST m=+1.312408356
Fetched data in: 21.755561449s
Reviewing Violations: 2021-07-21 17:31:38.141739 +1000 AEST m=+23.068002501
Reviewed Violations in: 1m47.960509658s
Time taken to write results: 1.256456571s
Parallel code snippet
func getViolationsParallel(inventory *InventoryConfig, config *ScoringConfig) ([]*RichViolation, error) {
	var err error
	var pbAssets []*validator.Asset
	start := time.Now()
	fmt.Println("Fetching data: ", start)
	if inventory.bucketName != "" {
		pbAssets, err = getDataFromBucket(inventory.bucketName)
		if err != nil {
			return nil, errors.Wrap(err, "Fetching inventory from Bucket")
		}
	} else if inventory.dirPath != "" {
		pbAssets, err = getDataFromFile(inventory.dirPath)
		if err != nil {
			return nil, errors.Wrap(err, "Fetching inventory from local directory")
		}
	} else if inventory.readFromStdin {
		pbAssets, err = getDataFromStdin()
		if err != nil {
			return nil, errors.Wrap(err, "Reading from stdin")
		}
	}
	duration := time.Since(start)
	fmt.Println("Fetched data in: ", duration)
	start = time.Now()
	fmt.Println("Reviewing Violations: ", start)
	richViolations := make([]*RichViolation, 0)
	// The workers run concurrently, so the shared slice and the first
	// recorded error must be guarded by a mutex to avoid a data race.
	var mu sync.Mutex
	var reviewErr error
	wp := workerpool.New(5)
	for _, asset := range pbAssets {
		asset := asset // capture the loop variable for the closure
		wp.Submit(func() {
			violations, err := config.validator.ReviewAsset(context.Background(), asset)
			if err != nil {
				mu.Lock()
				if reviewErr == nil {
					reviewErr = errors.Wrapf(err, "reviewing asset %s", asset)
				}
				mu.Unlock()
				return
			}
			mu.Lock()
			for _, violation := range violations {
				richViolation := RichViolation{*violation, "", violation.Resource, violation.Message, violation.Metadata, asset}
				richViolations = append(richViolations, &richViolation)
			}
			mu.Unlock()
		})
	}
	wp.StopWait()
	if reviewErr != nil {
		return nil, reviewErr
	}
	duration = time.Since(start)
	fmt.Println("Reviewed Violations in: ", duration)
	return richViolations, nil
}
For the 400MB inventory file, it's a very good improvement:
Generating CFT scorecard
Fetching data: 2021-07-21 17:39:16.939089 +1000 AEST m=+1.138443251
Fetched data in: 1m22.735131396s
Reviewing Violations: 2021-07-21 17:40:39.673351 +1000 AEST m=+83.873593801
Reviewed Violations in: 13m8.961828329s
Time taken to write results: 7.194955508s
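For what it's worth, the goroutine bounding that the workerpool module provides could also be done with the standard library alone — a minimal sketch using a buffered channel as a semaphore (hypothetical names, not the scorecard code):

package main

import (
	"fmt"
	"sync"
)

// forEachBounded runs fn over items with at most `workers` goroutines in
// flight; the buffered channel acts as a counting semaphore.
func forEachBounded(items []int, workers int, fn func(int)) {
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for _, item := range items {
		item := item
		wg.Add(1)
		sem <- struct{}{} // blocks once `workers` goroutines are running
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			fn(item)
		}()
	}
	wg.Wait()
}

func main() {
	var mu sync.Mutex
	total := 0
	forEachBounded([]int{1, 2, 3, 4, 5}, 2, func(n int) {
		mu.Lock()
		total += n
		mu.Unlock()
	})
	fmt.Println(total) // 15
}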
This does look like a big improvement, thanks for working on it! Assuming you're using this workerpool library, I think that is fine.
The next step would be for you to submit a pull request where we can discuss the code details.
Could I be given permission to push my branch please?
remote: Permission to GoogleCloudPlatform/cloud-foundation-toolkit.git denied to hussainak.
fatal: unable to access 'https://github.com/GoogleCloudPlatform/cloud-foundation-toolkit.git/': The requested URL returned error: 403
You should make a fork: https://gist.github.com/Chaser324/ce0505fbed06b947d962
Hi @morgante
I have created a pull request for this here: https://github.com/GoogleCloudPlatform/cloud-foundation-toolkit/pull/964
Just a quick note that I have added a numWorkers() method based on the number of CPUs available to Go. However, I found that limiting it to 4 workers gives better performance, so I have capped it there. Performance results below:
4 workers 400MB
time ./cft scorecard --policy-path /Users/hak/Desktop/Development/GCP/gcp-policy-library/policy-library --dir-path /Users/hak/Desktop/Development/GCP/cft-inventory --output-path . --output-format csv --concurrency
Generating CFT scorecard
Fetching data: 2021-07-23 13:56:02.797504 +1000 AEST m=+1.121431290
Fetched data in: 1m21.072152958s
Reviewing Violations: 2021-07-23 13:57:23.968598 +1000 AEST m=+82.193605434
Reviewed Violations in: 11m46.900796727s
Time taken to write results: 6.429203758s
./cft scorecard --policy-path --dir-path --output-path . --output-format cs 3055.22s user 746.32s system 449% cpu 14:05.97 total
12 workers 400MB
time ./cft scorecard --policy-path /Users/hak/Desktop/Development/GCP/gcp-policy-library/policy-library --dir-path /Users/hak/Desktop/Development/GCP/cft-inventory --output-path . --output-format csv --concurrency
Generating CFT scorecard
Fetching data: 2021-07-23 13:26:54.641019 +1000 AEST m=+1.221005861
Fetched data in: 1m25.321140218s
Reviewing Violations: 2021-07-23 13:28:19.962972 +1000 AEST m=+86.542165822
Reviewed Violations in: 22m43.458904001s
Time taken to write results: 7.503422589s
./cft scorecard --policy-path --dir-path --output-path . --output-format cs 4934.00s user 3166.56s system 529% cpu 25:29.87 total
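For reference, a minimal sketch of what the numWorkers() cap described above could look like (a hypothetical reconstruction; the actual PR may differ):

import "runtime"

// numWorkers picks the size of the review worker pool. runtime.NumCPU()
// reports the CPUs available to the Go runtime, but in testing more than
// 4 workers degraded throughput, so the count is capped at 4.
func numWorkers() int {
	if n := runtime.NumCPU(); n < 4 {
		return n
	}
	return 4
}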
Hi @morgante I have pushed the changes again. Two things I'd point out:
1. You should probably have a new command under cft to query the number of CPUs and procs available on a system.
2. We could also accept -concurrency 0 to auto-detect the preferred concurrency, as in the sketch below.
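A minimal sketch of that auto-detect behavior (hypothetical wiring, reusing the numWorkers() sketch above):

// resolveConcurrency treats -concurrency 0 as "auto-detect" and falls
// back to the CPU-capped default instead of starting zero workers.
func resolveConcurrency(flagValue int) int {
	if flagValue == 0 {
		return numWorkers()
	}
	return flagValue
}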
We have seen a massive (7x) improvement at our company. This ticket can be closed now.
Hi there,
We are running cft scorecard against a large resource inventory file, and it takes almost an hour to run against the policy library and output the scorecard. Can this be improved? The resource inventory file is about 500MB.