Closed gizas closed 1 year ago
#### Build stats

* Start Time: 2023-05-03T00:34:49.104+0000
* Duration: 3 min 42 sec

#### Test stats :test_tube:

| Test | Results |
| ------------ | :-----------------------------: |
| Failed | 0 |
| Passed | 85 |
| Skipped | 0 |
| Total | 85 |

To re-run your PR in the CI, just comment with:

- `/test` : Re-trigger the build.
Pod and container templates have been created:
For container datastream:
./elastic-integration-corpus-generator-tool generate-with-template ./assets/templates/kubernetes.container/gotext.tpl ./assets/templates/kubernetes.container/fields.yml -c ./assets/templates/kubernetes.container/configs.yml -y gotext -t 1000
For pod:
./elastic-integration-corpus-generator-tool generate-with-template ./assets/templates/kubernetes.pod/gotext.tpl ./assets/templates/kubernetes.pod/fields.yml -c ./assets/templates/kubernetes.pod/configs.yml -y gotext -t 1000
/test
@gizas all good here?
Yes, my tests are OK and I have managed to produce the desired outcome. I will open this PR once I have fully tested it with a Rally track. I need to find time to run a full test and will come back to you.
@aspacca , @martijnvg FYI: https://github.com/elastic/observability-dev/blob/generatortool/docs/infraobs/cloudnative-monitoring/dev-docs/elastic-generator-tool-with-rally.md
I have created the needed corpus, and the 55G of files takes less than a couple of minutes! Let's see it in action.
@gizas we merged into master a change to the format of the fields generation config file, from
- name: cloud.availabilit_zone
  value: "europe-west1-d"
- name: agent.id
  value: "12f376ef-5186-4e8b-a175-70f1140a8f30"
to
fields:
  - name: cloud.availabilit_zone
    value: "europe-west1-d"
  - name: agent.id
    value: "12f376ef-5186-4e8b-a175-70f1140a8f30"
We'll later introduce something like
formatter:
  - strip_newlines
fields:
so that you will no longer need to duplicate the templates, as you do in this PR, and can keep a single "pretty-printed" one that is emitted with newlines stripped.
Please feel free to suggest what the formatter "concept" should look like, thanks.
👍
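For anyone holding config files in the old layout, the format change described above (a bare list of field entries moving under a top-level `fields:` key) could be sketched roughly as follows. This is only an illustration and not part of the generator tool: the function name `migrate_fields_config` is hypothetical, and it assumes the config has already been parsed into Python lists and dicts by a YAML library of your choice.

```python
from typing import Any, Dict


def migrate_fields_config(config: Any) -> Dict[str, Any]:
    """Wrap a legacy top-level list of field entries under a `fields` key.

    Hypothetical helper: `config` is assumed to be the already-parsed
    content of a fields generation config file.
    """
    # Already in the new layout: return it unchanged.
    if isinstance(config, dict) and "fields" in config:
        return config
    # Old layout: a bare list of {name, value, ...} entries.
    if isinstance(config, list):
        return {"fields": config}
    raise ValueError("unrecognised config layout")


old = [
    {"name": "agent.id", "value": "12f376ef-5186-4e8b-a175-70f1140a8f30"},
]
new = migrate_fields_config(old)
```

Keeping the entries untouched and only adding the wrapping key leaves room for sibling keys, such as the `formatter` one proposed above, to be added later.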
Please, let's align on what's currently available in elastic-package: https://github.com/elastic/elastic-package/blob/main/docs/howto/generate_corpus.md#generate-a-rally-track-for-a-package-dataset-and-run-a-rally-benchmark
The next step is to try to create the timestamps using range (or cardinality) functions and spread them over multiple hours
I have addressed this issue by manually producing timestamps.
Also, a relevant PR introduces the period flag, which can be used in the future: https://github.com/elastic/elastic-integration-corpus-generator-tool/commit/62a8465ba70aab20191b25dca6e6d797a7ab60fe
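As a rough illustration of what "manually producing timestamps" spread over several hours can look like, here is a minimal sketch. It is not the code used in this PR and does not use the generator tool's period flag; the function name `spread_timestamps` and its parameters are hypothetical, and it simply spaces timestamps evenly over a window ending at a given time.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def spread_timestamps(count: int, end: datetime, hours: int) -> List[str]:
    """Return `count` ISO 8601 timestamps evenly spaced over the
    `hours` hours that precede `end` (hypothetical helper)."""
    if count <= 0 or hours <= 0:
        raise ValueError("count and hours must be positive")
    window = timedelta(hours=hours)
    start = end - window
    step = window / count  # gap between consecutive documents
    return [
        (start + step * i).isoformat(timespec="milliseconds")
        for i in range(count)
    ]


# Example: 4 documents spread over the 4 hours before noon UTC.
end = datetime(2023, 5, 3, 12, 0, tzinfo=timezone.utc)
stamps = spread_timestamps(4, end, 4)
```

With a window of several hours, queries over ranges longer than 1h would find documents across the whole range instead of only in the last hour.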
The generator tool produces output in multiple lines. The Rally tool needs each entry on one line. We have created one-liner versions of the templates in this PR.
Also, the elastic-package-benchmark-generate-corpus command will output results as one line per doc entry.
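An alternative to maintaining one-liner templates is to post-process the multi-line output. The sketch below is a hypothetical helper, not something provided by the generator tool or elastic-package, and it assumes the generated file is a sequence of concatenated JSON objects. It re-emits each object on a single line, which is the shape Rally expects.

```python
import json
from typing import Iterator


def to_one_line_docs(text: str) -> Iterator[str]:
    """Yield each JSON document found in `text` as a single compact line.

    Assumes `text` holds back-to-back JSON objects separated only by
    whitespace (for example, pretty-printed generator output).
    """
    decoder = json.JSONDecoder()
    pos, length = 0, len(text)
    while pos < length:
        # raw_decode does not accept leading whitespace, so skip it.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        doc, pos = decoder.raw_decode(text, pos)
        yield json.dumps(doc, separators=(",", ":"), ensure_ascii=False)


pretty = '{\n  "a": 1,\n  "b": {"c": 2}\n}\n{\n  "a": 3\n}\n'
lines = list(to_one_line_docs(pretty))
```

Parsing and re-serialising, rather than deleting newline characters, keeps any newlines that appear inside string values correctly escaped.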
Generation of Rally templates (as part of the generator tool) is no longer needed, as I have tested the `elastic-package dump installed-objects --package kubernetes` command, which can extract index templates. The relevant instructions have been added to the README of the TSDB2 Rally track.
- Timestamps of generated data are spread within 1 hour of the time the tool is triggered. Although the generated data can be used for indexing, visualisations return empty responses when queries cover a time window larger than 1h. The next step is to try to create the timestamps using range (or cardinality) functions and spread them over multiple hours
this should be addressed by #95 :)
- The generator tool produces output in multiple lines. The rally tool needs each entry in one line. For now this issue is minor as we provide our templates as one liners: https://github.com/elastic/elastic-integration-corpus-generator-tool/pull/88/files#diff-44eae17c43b58d9d956a9c89b53eed2aa72ef46ad7e10400f26a0a61cd22ccfcR17.
- Also, the elastic-package tool's dump command, which will use the generator tool, will dump the data in one line! This needs to be tested
Yes, elastic-package does it. Still, elastic-package uses v0.5.0 of the corpus generator, so the assets in this PR won't be compatible for the moment. See the comment on the next point for further details.
- For every Rally run we need to generate the corpus data (e.g. https://github.com/elastic/rally-tracks/pull/373/files#diff-dbafff74aad306950d4c38f30c7612f06cae89395c58311d54ca26a2c374fc03R52) and the mappings of the indices we test (e.g. https://github.com/elastic/rally-tracks/pull/373/files#diff-0b2bc88dee0704c8bae38dbe5719417945216348c62273f09940d1afcb7a7eea). The need for corpus generation is met here, but we still don't have a fully automated way to generate the mapping templates based on the relevant package version we test each time. We assume that mappings don't change often and can be extracted from any given cluster, but a manual process is still needed. This issue is not a blocker
Similarly to what's done in this elastic-package PR, we'll add a `benchmark rally` command that will install the relevant assets of the local package. This will come with the release of v0.6.0 of the corpus generator and the dependency upgrade in elastic-package.
I have created the initial template for the kubernetes pod datastream. This template produces data ready to be ingested by Rally tracks. See below the generated file compared to real data extracted from a cluster:
GENERATED-gotext.json.txt
Initial_from_realPOD.json.txt
Command to run:
Features to implement:
Findings after testing:
Timestamps of generated data are spread within 1 hour of the time the tool is triggered. Although the generated data can be used for indexing, visualisations return empty responses when queries cover a time window larger than 1h. The next step is to try to create the timestamps using range (or cardinality) functions and spread them over multiple hours
The generator tool produces output in multiple lines. The rally tool needs each entry in one line. For now this issue is minor as we provide our templates as one liners: https://github.com/elastic/elastic-integration-corpus-generator-tool/pull/88/files#diff-44eae17c43b58d9d956a9c89b53eed2aa72ef46ad7e10400f26a0a61cd22ccfcR17.
For every Rally run we need to generate the corpus data (e.g. https://github.com/elastic/rally-tracks/pull/373/files#diff-dbafff74aad306950d4c38f30c7612f06cae89395c58311d54ca26a2c374fc03R52) and the mappings of the indices we test (e.g. https://github.com/elastic/rally-tracks/pull/373/files#diff-0b2bc88dee0704c8bae38dbe5719417945216348c62273f09940d1afcb7a7eea). The need for corpus generation is met here, but we still don't have a fully automated way to generate the mapping templates based on the relevant package version we test each time. We assume that mappings don't change often and can be extracted from any given cluster, but a manual process is still needed. This issue is not a blocker