apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0
3.65k stars 598 forks source link

Fix HealthManager #3819

Closed thinker0 closed 2 years ago

thinker0 commented 2 years ago
nicknezis commented 2 years ago

What was the error encountered that required the change from float to string in the timeline object? I ask because there were various issues with the intermingling of the types. I was hoping we could keep it consistent, but recognize there might be use cases I didn't account for with the prior testing. So I'm interested to better understand what this PR fixes, and also does the timeline in the UI still work? If so, then I think I'm ok with this PR.

surahman commented 2 years ago

I ask because there were various issues with the intermingling of the types. I was hoping we could keep it consistent, but recognize there might be use cases I didn't account for with the prior testing.

I approved because Python typing will, rather annoyingly, allow you to achieve what follows. I agree with the need for consistency.

float_Val: float = 9.19
print(float_Val, type(float_Val))

str_Val: str = float_Val
print(str_Val, type(str_Val))

str_Val = 919
print(str_Val, type(str_Val))

"""
9.19 <class 'float'>
9.19 <class 'float'>
919 <class 'int'>
"""
thinker0 commented 2 years ago

https://github.com/apache/incubator-heron/blob/700125f271b854a5ad2e3fe3204c1e2169ce069a/heron/tools/tracker/tests/python/query_operator_unittest.py#L41-L57

In the internal UnitTest, it seems to be a String.

nicknezis commented 2 years ago

Should the unit test be updated to be a float instead of string? I did not catch that.

When I started working on the last PR, it was originally an int which I think was incorrect. str might be valid option. I think I tried to make it work, but we had enough other issues in the PR. I trust your opinion on str vs float.

We should make sure we are setting the type consistently. Should this line of code also be updated to be a str?

Edit: Also, does this other line, which was unchanged in the prior PR, help the choice to use str? It seems the code was converting to float which wouldn't make sense with my use of float.

thinker0 commented 2 years ago

humm. Conflit is occurring in several places ㅠ.ㅠ

thinker0 commented 2 years ago

I fixed it on my CentOS 7.

thinker0 commented 2 years ago
%  ./docker/scripts/test-unittest.sh centos7 0.20.5
================================================================================
INFO: Elapsed time: 1889.214s, Critical Path: 207.25s
INFO: 3826 processes: 1175 internal, 2651 local.
INFO: Build completed successfully, 3826 total actions
Test cases: finished with 1189 passing and 0 failing out of 1189 test cases

Executed 247 out of 247 tests: 247 tests pass.
INFO: Build completed successfully, 3826 total actions
Cleaning up scratch dir
thinker0 commented 2 years ago
% ./docker/scripts/test-unittest.sh rocky8 0.20.5
================================================================================
INFO: Elapsed time: 1835.725s, Critical Path: 246.64s
INFO: 3826 processes: 1175 internal, 2651 local.
INFO: Build completed successfully, 3826 total actions
Test cases: finished with 1189 passing and 0 failing out of 1189 test cases

Executed 247 out of 247 tests: 247 tests pass.
INFO: Build completed successfully, 3826 total actions
Cleaning up scratch dir
thinker0 commented 2 years ago

Should the unit test be updated to be a float instead of string? I did not catch that.

Nothing

We should make sure we are setting the type consistently. Should this line of code also be updated to be a str?

I don't think you need to change this part. When 0.20.5 centos binary was applied to both tracker and ui in production, it worked in the UI regardless of this part.

surahman commented 2 years ago

There was a failing test for killing topologies but I have seen that fail intermittently before. I did restart the CI pipeline.

Is the downgrade of the dev-toolset related to an unsupported library causing issues?

I do not see any issues with using the C++1y flag instead of C++14 because the other OSs have their version of GCC and G++ kept up to date. This means the flag will indicate compiling against the final C++14 standard on those OSs. It will, however, compile using the draft standard on Centos7 (GCC 4.8.x) but given the build completes successfully I believe this should be okay.

I will request @nicknezis to have a look at the tracker change because he has spent a lot of time in it recently.

thinker0 commented 2 years ago

Fixed some functions.

thinker0 commented 2 years ago

There was a failing test for killing topologies but I have seen that fail intermittently before. I did restart the CI pipeline.

Is the downgrade of the dev-toolset related to an unsupported library causing issues?

I do not see any issues with using the C++1y flag instead of C++14 because the other OSs have their version of GCC and G++ kept up to date. This means the flag will indicate compiling against the final C++14 standard on those OSs. It will, however, compile using the draft standard on Centos7 (GCC 4.8.x) but given the build completes successfully I believe this should be okay.

I will request @nicknezis to have a look at the tracker change because he has spent a lot of time in it recently.

This was judged to be a cause other than the cause of gcc, so it was restored to its original state.