Open lukeyan2023 opened 1 year ago
Thank you for the detailed feedback. We will work quickly to identify and fix this bug. 💪🔧
The two screenshots above are from the official Flink 1.17.1 documentation.
In actual testing, the savepoint (sp) path in Figure 2 must use format 2 from Figure 1 to work. Using format 1 from Figure 1 fails, which is the cause of the current error.
From reviewing the source code, I understand the process to be as follows:
First, a savepoint is triggered; StreamPark then obtains the generated savepoint path from Flink and writes it to the database. Second, when restoring from a savepoint, StreamPark reads the savepoint parameters back from the database.
So if we fix this issue, should we convert the path to format 2 from Figure 1 at the point where StreamPark obtains the savepoint from Flink, so that the code elsewhere does not need to change?
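To make the proposal concrete, here is a minimal sketch of normalizing the path once, right after it is obtained from Flink and before it is stored. It is written in Python only for brevity and is not StreamPark's actual code. It assumes that format 2 means the savepoint directory with `/_metadata` appended, as in the working path from the issue description; the function name `to_metadata_path` is hypothetical.

```python
METADATA_FILE = "_metadata"


def to_metadata_path(savepoint_path: str) -> str:
    """Return the path to the savepoint's _metadata file (hypothetical helper).

    Assumption: "format 2" is the savepoint directory followed by /_metadata.
    """
    # Drop any trailing slash so the suffix is never doubled.
    trimmed = savepoint_path.rstrip("/")
    # Already in format 2: leave it unchanged so the helper is idempotent.
    if trimmed.endswith("/" + METADATA_FILE):
        return trimmed
    return trimmed + "/" + METADATA_FILE


if __name__ == "__main__":
    directory = "s3p://lakehouse/flink/sp/Platform-Flink-Test-Security-Log/savepoint-2b3ed0-f0c7ba51791f"
    print(to_metadata_path(directory))
```

Because the helper is idempotent, applying it again when the value is read back from the database would not change a path that was already converted.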
Thank you for providing the information. We encourage you to fix this bug, how about it? 💪 We need to test and determine the savepoint path rules under different versions of Flink (1.12 ~ 1.17). We warmly welcome you to fix this bug. And we believe you can do it! 👍😊
I am willing to fix this bug and am currently reading the relevant code, but due to my limited abilities, it may take some time
After reviewing the relevant source code and the official Flink documentation, I believe the correct savepoint format should be format 1 in the screenshot.
So I think this is a problem with Flink, not StreamPark. To check this, I ran the following tests.
Flink Version 1.17.1
- Store the savepoint on HDFS and restore the job using format 1 from the screenshot. Result: success.
- Store the savepoint on S3 with the s3a protocol and restore the job using format 1 from the screenshot. Result: success.
- Store the savepoint on S3 with the s3p protocol and restore the job using format 1 from the screenshot. Result: failure.
- Store the savepoint on S3 with the s3p protocol and restore the job using format 2 from the screenshot. Result: success.
In summary, Flink savepoints appear to behave unexpectedly when S3 storage is used with the s3p protocol.
If this is intended behavior in Flink, then StreamPark may need to handle this scenario specifically. If it is not intended, the bug should be reported to the Flink community.
@wolfboys
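If StreamPark did handle this scenario specifically, one option would be to adjust the path only for the scheme that failed in the tests above. The sketch below is illustrative Python, not StreamPark code. The name `restore_path` and the set `SCHEMES_NEEDING_METADATA` are hypothetical, and the set reflects only the Flink 1.17.1 results reported here; other Flink versions are untested.

```python
from urllib.parse import urlsplit

# Schemes for which restoring from the savepoint directory failed in the
# tests above (Flink 1.17.1 only). Assumption: other versions may differ.
SCHEMES_NEEDING_METADATA = {"s3p"}


def restore_path(savepoint_path: str) -> str:
    """Return the path to pass to Flink on restore (hypothetical helper)."""
    scheme = urlsplit(savepoint_path).scheme.lower()
    # HDFS and s3a restored fine from the directory form, so leave them alone.
    if scheme not in SCHEMES_NEEDING_METADATA:
        return savepoint_path
    trimmed = savepoint_path.rstrip("/")
    # Avoid appending the suffix twice if the stored path already has it.
    if trimmed.endswith("/_metadata"):
        return trimmed
    return trimmed + "/_metadata"
```

Keeping the affected schemes in one place means the special case could be removed easily if the behavior turns out to be a Flink bug that gets fixed upstream.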
Sorry for taking so long to get back to you. Based on your description, our preliminary suspicion is that this is a bug in Flink, but we need further confirmation. If it is confirmed, we can report it to the Flink community.
Java Version
1.8
Scala Version
2.12.x
StreamPark Version
2.1.1
Flink Version
1.17.1
deploy mode
kubernetes-application
What happened
Flink failed to recover from savepoints that were saved automatically by StreamPark. The StreamPark logs show that the savepoint path submitted when StreamPark restored the Flink job was:
s3p://lakehouse/flink/sp/Platform-Link-Test-Security Log/savepoint-2b3ed0-f0c7ba51791f
The Flink application logs show that Flink hit an error when restoring from the savepoint:
s3p://lakehouse/flink/sp/Platform-Link-Test-Security-Log/savepoint-2b3ed0-f0c7ba51791f
Afterwards, manually submitting the job through the regular CLI with the same savepoint:
s3p://lakehouse/flink/sp/Platform-Link-Test-Security Log/savepoint-2b3ed0-f0c7ba51791f
produced the same error. However, after changing the savepoint path to:
s3p://lakehouse/flink/sp/Platform-Flink-Test-Security-Log/savepoint-2b3ed0-f0c7ba51791f/_metadata
both the CLI and StreamPark submissions succeed.
Error Exception