cloud-bulldozer / orion

The constellation of a hunter
Apache License 2.0

Support for Daemon mode for orion #26

Closed shashank-boyapally closed 1 month ago

shashank-boyapally commented 4 months ago

Type of change

Description

Do not MERGE until fmatch 0.0.6 is released

First, please forgive me for changing so many things in a single PR. Below are the updates in this PR.

Related Tickets & Documents

Checklist before requesting a review

Testing

shashank-boyapally commented 4 months ago

Hi @paigerube14 @jtaleric just rebased with latest changes.

shashank-boyapally commented 3 months ago

I added pyshorteners to shorten the buildUrl in the display. The requirements.txt file is not updated yet, since I want to update it together with fmatch 0.0.6 once that is released. There will be some function changes in fmatch 0.0.6.

jtaleric commented 3 months ago

@shashank-boyapally can you let us know when this is ready for folks to test?

shashank-boyapally commented 3 months ago

Hi @jtaleric, the feature is ready to be tested. I updated the requirements with the latest versions of fmatch and pyshorteners. Please test it and let me know if you have any feedback.

jtaleric commented 3 months ago

ack - I just saw conflicts...

jtaleric commented 3 months ago

It appears that we expect runs = match.get_uuid_by_metadata(metadata) to return a dict, but it returns a list?

2024-03-29 08:46:45,421 - Orion - DEBUG - file: utils.py - line: 263 - ['ff4e1c2c-6960-4081-bc01-5df2c1e72541', '0e190b62-38ac-4d4d-98f7-baf1032c21f8', '91a7a520-ca19-43b9-9d5f-dca8f3df5518', '29dcba68-af3f-4e71-b6a7-2f0ecfae3977', '8f2f5271-5dff-44d4-b126-1f856e6a8387', 'f4fcc9eb-c6ce-4ac3-9979-63c0c5466302', 'b6af9829-ed2d-4773-bef6-26d106e400a5', 'e88b185a-34eb-4447-be11-54829bec9d39', '7873914e-24c1-4657-8317-0f778df1ec14', 'c525cda2-8712-4ad4-93f1-ec4a5169ce0d', 'f50c2163-7f72-4521-8a79-dbdfc5b1182d', '5ab533f6-0712-460a-9a0c-f132c1909e4c', 'f8a237da-9a19-4421-8a68-b6d10dc85f3a', 'f5ae4cde-e1bc-4e1d-b2da-0f358375988d', '9c6b4e2c-950c-46f4-8631-5123f127a082', 'a6e4a657-73c0-4228-ab17-1e1c177f97b8', '17f93785-c04a-4238-8029-ae5d1e3766d5', 'ea77b881-2d0e-4145-ae9d-ae853c08a07c', '668be9c2-7e6c-4134-8519-086085f54e04', '20d0c41a-be34-49b4-b5e0-e127bc49aa0d', 'ade0d220-3cb1-4902-8c5a-456ccb3f09f8', '0fd65106-dd94-4dd7-9ee3-ac062db2909e', '4d63bb03-b137-44be-bbc5-6690b54d7228', 'aaa28324-67af-45c8-b507-73fbb3063091']
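
A defensive guard for this kind of mismatch could look like the sketch below. It assumes the caller wants a list of dicts, while an older library version hands back a plain list of UUID strings as in the debug line above. The helper name `normalize_runs` and the `uuid`/`buildUrl` key names are hypothetical, not taken from fmatch.

```python
from typing import Any, Dict, List


def normalize_runs(runs: List[Any]) -> List[Dict[str, str]]:
    """Coerce a list of UUID strings or run dicts into a list of run dicts.

    Hypothetical helper: the "uuid" and "buildUrl" keys are assumptions
    made for illustration only.
    """
    normalized = []
    for run in runs:
        if isinstance(run, str):
            # Older shape: a bare UUID string, so no build URL is known.
            normalized.append({"uuid": run, "buildUrl": ""})
        elif isinstance(run, dict) and "uuid" in run:
            # Newer shape: keep the UUID and the build URL if present.
            normalized.append(
                {"uuid": run["uuid"], "buildUrl": run.get("buildUrl", "")}
            )
        else:
            # Fail loudly instead of crashing later with an unclear error.
            raise TypeError(f"unexpected run entry: {run!r}")
    return normalized
```

In this thread the mismatch went away after the venv was cleaned up, so a guard like this would mainly serve to produce a clearer error when an outdated dependency is installed.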

jtaleric commented 3 months ago

I cleaned up my venv and now I get output... however.. New issue.

orion cmd --config config.yaml --hunter-analyze
2024-03-29 08:55:37,186 - Orion - INFO - file: orion.py - line: 54 - 🏹 Starting Orion in command-line mode
2024-03-29 08:55:37,188 - Orion - INFO - file: utils.py - line: 261 - The test aws-416-med-scale-cluster-density-v2 has started
2024-03-29 08:55:37,189 - Matcher - INFO - Executing query against index=perf_scale_ci
2024-03-29 08:55:37,305 - Matcher - INFO - Executing query against index=perf_scale_ci
2024-03-29 08:55:37,352 - Matcher - INFO - Executing query against index=ripsaw-kube-burner*
2024-03-29 08:55:37,398 - Orion - INFO - file: utils.py - line: 123 - Collecting podReadyLatency
2024-03-29 08:55:37,398 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:37,433 - Orion - INFO - file: utils.py - line: 123 - Collecting apiserverCPU
2024-03-29 08:55:37,434 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:38,170 - Orion - INFO - file: utils.py - line: 123 - Collecting ovnCPU
2024-03-29 08:55:38,170 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:39,563 - Orion - INFO - file: utils.py - line: 123 - Collecting etcdCPU
2024-03-29 08:55:39,563 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:40,270 - Orion - INFO - file: utils.py - line: 123 - Collecting etcdDisck
2024-03-29 08:55:40,270 - Matcher - INFO - Executing query against index=ripsaw-kube-burner

and with the main branch

orion --config config.yaml --hunter-analyze
2024-03-29 08:57:04,940 - Orion - INFO - The test aws-416-med-scale-cluster-density-v2 has started
2024-03-29 08:57:04,940 - Matcher - INFO - Executing query against index=perf_scale_ci
2024-03-29 08:57:05,080 - Matcher - INFO - Executing query against index=ripsaw-kube-burner*
2024-03-29 08:57:05,118 - Orion - INFO - Collecting podReadyLatency
2024-03-29 08:57:05,119 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:05,158 - Orion - INFO - Collecting apiserverCPU
2024-03-29 08:57:05,159 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:05,840 - Orion - INFO - Collecting ovnCPU
2024-03-29 08:57:05,841 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:07,214 - Orion - INFO - Collecting etcdCPU
2024-03-29 08:57:07,214 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:07,953 - Orion - INFO - Collecting etcdDisck
2024-03-29 08:57:07,953 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
time                       uuid                                    P99    apiserverCPU_cpu_avg    ovnCPU_cpu_avg    etcdCPU_cpu_avg    etcdDisck_duration_avg
-------------------------  ------------------------------------  -----  ----------------------  ----------------  -----------------  ------------------------
2024-01-10 14:18:49 +0000  91a7a520-ca19-43b9-9d5f-dca8f3df5518  13000                 28.8317           8.03138            15.8919                 0.0131896
                                                                        ······················                    ·················
                                                                                         +8.2%                               +10.5%
                                                                        ······················                    ·················
2024-02-06 11:05:29 +0000  ff4e1c2c-6960-4081-bc01-5df2c1e72541  13000                 31.0332           8.22205            17.8231                 0.013406
2024-02-06 12:45:08 +0000  20d0c41a-be34-49b4-b5e0-e127bc49aa0d  13000                 31.7655           7.09412            17.3852                 0.0147301
2024-02-06 23:39:44 +0000  e88b185a-34eb-4447-be11-54829bec9d39  13000                 30.832            7.94952            17.5129                 0.0142578
2024-02-07 20:23:43 +0000  7873914e-24c1-4657-8317-0f778df1ec14  13000                 31.6765           6.91436            17.641                  0.0139926
2024-02-08 12:19:27 +0000  c525cda2-8712-4ad4-93f1-ec4a5169ce0d  13000                 31.3236           7.01006            17.8414                 0.0136317
2024-02-09 14:42:53 +0000  f4fcc9eb-c6ce-4ac3-9979-63c0c5466302  13000                 30.7888           6.96967            17.1353                 0.0135467
2024-02-09 14:48:52 +0000  8f2f5271-5dff-44d4-b126-1f856e6a8387  13000                 30.7914           8.18303            17.3592                 0.0132814
2024-02-12 01:22:23 +0000  0fd65106-dd94-4dd7-9ee3-ac062db2909e  13000                 31.42             7.01348            17.8418                 0.0135024

jtaleric commented 3 months ago

I chatted with @shashank-boyapally on Slack. Here are the current thoughts:

vishnuchalla commented 3 months ago

I chatted with @shashank-boyapally on Slack. Here are the current thoughts:

  • The initial version of Orion with daemon mode will be opinionated. The user will only provide the version for which they want to determine whether there is a regression. We will run Hunter against the openshift-payload job. The payload jobs have the most data and seem like a good starting point.
  • A follow-on version of Orion with daemon mode could accept multiple configs and let the user choose which config to check for detected changes. One idea is to have a listTests API endpoint that reports which tests are loaded; the user could then provide a test name to run Hunter/Algo against, like name=aws-cdv2-fips-120node or whatever the tests are named in the configuration file.

+1 on these ideas. To add on top of that, just so we don't lose track: Hunter also has a feature for dividing tests into groups and comparing between them for regressions. That would also be a good use case to add later when we get to that point.
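
The listTests idea quoted above could be sketched roughly as follows. This is only an illustration of the lookup logic, without any HTTP layer; the class `LoadedTests`, its methods, and the shape of the config dicts are all hypothetical and not part of Orion.

```python
from typing import Callable, Dict, List


class LoadedTests:
    """Hypothetical in-memory registry of test configs, keyed by test name."""

    def __init__(self, configs: Dict[str, dict]) -> None:
        self._configs = dict(configs)

    def list_tests(self) -> List[str]:
        # What a listTests endpoint could return: the loaded test names.
        return sorted(self._configs)

    def run(self, name: str, analyze: Callable[[dict], dict]) -> dict:
        # Run the supplied analysis callable against one named test.
        if name not in self._configs:
            raise KeyError(f"unknown test: {name}")
        return analyze(self._configs[name])


# Usage: the config content and the analysis function are placeholders.
registry = LoadedTests({"aws-cdv2-fips-120node": {"metrics": ["podReadyLatency"]}})
print(registry.list_tests())
print(registry.run("aws-cdv2-fips-120node", lambda cfg: {"metrics": cfg["metrics"]}))
```

Passing the analysis step in as a callable keeps the registry independent of whichever algorithm (Hunter or another) ends up being run against the selected test.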

paigerube14 commented 3 months ago

@jtaleric @shashank-boyapally these changes sound good to me. One question: is there a way to set a timeframe? If we have a job from, say, January that shows a regression, should we continue to report that issue with every run, or should we set a time period (last 2 weeks) or a number of runs back (last 10 runs) to limit re-reporting regressions?

vishnuchalla commented 3 months ago

Hunter also has a parameter that lets us specify a timestamp, so that it only looks at regressions since that starting point. Ahh, I forgot to mention it earlier; it would be a good addition too.

shashank-boyapally commented 3 months ago

Hi Paige, my opinion is that previous regressions should still be shown in both cmd mode and daemon mode; the service consuming the API should be able to filter them out based on the timestamp if needed. My reasoning for showing all regressions is the timeline: each version of OpenShift is tested for about 6 months. One thing we can do, if we want that functionality, is use Hunter's timestamp filter, which can ignore runs before a given time.
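
Consumer-side filtering of this kind could look like the sketch below, covering both options raised in the thread (a time period, or a number of runs back). The record shape (dicts with a `timestamp` key) and both function names are assumptions for illustration; this is not how Orion or Hunter represent change points.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def filter_by_age(
    regressions: List[Dict], now: datetime, max_age: timedelta
) -> List[Dict]:
    """Keep only regressions detected within max_age of now."""
    cutoff = now - max_age
    return [r for r in regressions if r["timestamp"] >= cutoff]


def filter_by_last_runs(
    regressions: List[Dict], run_timestamps: List[datetime], n: int
) -> List[Dict]:
    """Keep only regressions that fall within the n most recent runs."""
    if n <= 0 or not run_timestamps:
        return []
    oldest_kept = sorted(run_timestamps)[-n:][0]
    return [r for r in regressions if r["timestamp"] >= oldest_kept]


# Usage with made-up records: one old regression, one recent one.
now = datetime(2024, 2, 12, tzinfo=timezone.utc)
found = [
    {"uuid": "run-a", "timestamp": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"uuid": "run-b", "timestamp": datetime(2024, 2, 6, tzinfo=timezone.utc)},
]
print([r["uuid"] for r in filter_by_age(found, now, timedelta(weeks=2))])
```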

paigerube14 commented 3 months ago

I'm thinking in terms of Orion running in a CI, where we might only want to know whether the latest/current run shows a regression from previous runs, and fail the job if a regression is detected. I think the timestamping from Hunter would be helpful with that, as both @vishnuchalla and @shashank-boyapally mentioned. We're still a ways out from getting this into a CI, though. Just a thought. I'll open an issue to track this option.
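
The CI use case described here could be gated with something like the following sketch. It assumes the analysis step yields the run UUIDs in chronological order plus the set of UUIDs flagged as change points; both inputs and both function names are hypothetical.

```python
from typing import List, Set


def latest_run_regressed(ordered_uuids: List[str], changepoints: Set[str]) -> bool:
    """Return True only if the most recent run is itself a change point."""
    return bool(ordered_uuids) and ordered_uuids[-1] in changepoints


def ci_exit_code(ordered_uuids: List[str], changepoints: Set[str]) -> int:
    """Map the check to a process exit code: 1 fails the CI job, 0 passes."""
    return 1 if latest_run_regressed(ordered_uuids, changepoints) else 0


# Usage with placeholder UUIDs: an older regression does not fail the job.
print(ci_exit_code(["run-1", "run-2", "run-3"], {"run-2"}))
print(ci_exit_code(["run-1", "run-2", "run-3"], {"run-3"}))
```

Checking only the last run means a regression from an earlier run is still visible in the report but does not keep failing every later job.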

shashank-boyapally commented 3 months ago

I added verify certs as an argument so that it can support the new ES instances. The PR can be merged once we have fmatch 0.0.7.

vishnuchalla commented 3 months ago

As per our discussion offline, please break down this PR into atomic commits. Thanks