[X] I have searched the existing issues and none matched.
Operating System
Windows 10
CPU
Intel
GPU
GeForce RTX 3080 Ti
Storage
2TB SSD
Game Version
4.19.1
Game Mode
dance
Theme
default
Preamble
As far as I can tell, OnlineHash is calculated by Steps::GetHash(). This function takes the ASCII string representation of the SM note data and calculates its CRC32 checksum. The result is then assigned to OnlineHash.
I assume that the purpose of OnlineHash is to identify songs and charts so that score submissions are assigned to the correct versions of charts. Therefore, two identical charts must produce the same OnlineHash value.
Issue
It seems that m_sNoteDataCompressed can be populated by means other than NoteDataUtil::GetSMNoteDataString. As a result, otherwise equivalent step patterns can contain different arrangements of whitespace characters. For example, a particular line could end in either 0000\r\n or 0000\n. I've also seen trailing whitespace on some seemingly random lines.
Because OnlineHash is just the CRC32 of this note data string, equivalent charts will end up with different hashes.
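A minimal sketch of the current behavior, assuming GetHash uses the standard CRC-32 (IEEE/zlib variant, reflected polynomial 0xEDB88320). `Crc32` and `CurrentStyleOnlineHash` are hypothetical names, not StepMania's actual code. Two strings that describe the same row but differ in a single whitespace byte always hash differently, because CRC-32 detects every error burst of 32 bits or fewer.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-32 (IEEE 802.3 / zlib variant, reflected polynomial 0xEDB88320).
// Assumption: this matches the CRC32 used on the Steps::GetHash() path.
std::uint32_t Crc32(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical stand-in for the current behavior: the raw note data string
// is hashed byte-for-byte, with no normalization of whitespace or line endings.
std::uint32_t CurrentStyleOnlineHash(const std::string& noteData) {
    return Crc32(noteData);
}
```

For example, `CurrentStyleOnlineHash("0000 \n")` and `CurrentStyleOnlineHash("0000\r\n")` describe the same row but return different values.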
Even if this were fixed, there would still be other problems. For one, OnlineHash uses only the step data, not the timing data.
Proposed Solution
Convert all \r\n and \r line endings to \n, and strip all whitespace and comments from the note data string before it is hashed.
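A sketch of the proposed normalization. `NormalizeNoteDataForHash` is a hypothetical name, and `//`-to-end-of-line comments are assumed from the SM file format. Line endings are unified first so that a comment always ends at a known character. Removing every whitespace character afterward is assumed safe because each row holds one character per column, so rows stay recoverable for a given steps type.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: canonicalize note data before hashing so that
// formatting differences (line endings, spacing, comments) cannot change the hash.
std::string NormalizeNoteDataForHash(const std::string& raw) {
    // 1. Convert "\r\n" and lone "\r" to "\n".
    std::string unified;
    unified.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            unified += '\n';
            if (i + 1 < raw.size() && raw[i + 1] == '\n') ++i;  // swallow the LF of a CRLF
        } else {
            unified += raw[i];
        }
    }

    // 2. Drop "//" comments up to the end of the line, and
    // 3. drop every remaining whitespace character.
    std::string out;
    out.reserve(unified.size());
    bool inComment = false;
    for (std::size_t i = 0; i < unified.size(); ++i) {
        const char c = unified[i];
        if (inComment) {
            if (c == '\n') inComment = false;
            continue;
        }
        if (c == '/' && i + 1 < unified.size() && unified[i + 1] == '/') {
            inComment = true;
            ++i;  // skip the second '/'
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        out += c;
    }
    return out;
}
```

The hash would then be computed over `NormalizeNoteDataForHash(noteData)` instead of the raw string, so `"0000\r\n1000 \r\n,  // measure 2\n0001\n"` and `"0000\n1000\n,\n0001\n"` both reduce to `"00001000,0001"`.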
If we want to avoid as many collisions as possible, OnlineHash should also be updated to digest the timing data. In my opinion, the one exception would be Beat0Offset, since it does not change the fundamental nature of a chart.
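A sketch of folding timing data into the hash while leaving Beat0Offset out. `SimpleTiming`, `TimingSegment` and `CanonicalTimingString` are hypothetical simplifications (only BPM changes and stops are covered), not StepMania's TimingData API. The canonical string would be appended to the normalized note data before hashing.

```cpp
#include <algorithm>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a chart's timing data.
struct TimingSegment {
    double beat;   // beat at which the segment starts
    double value;  // BPM for a BPM change, seconds for a stop
};

struct SimpleTiming {
    double beat0OffsetSeconds = 0.0;  // deliberately excluded from the hash
    std::vector<TimingSegment> bpms;
    std::vector<TimingSegment> stops;
};

// Writes "TAG:beat=value,beat=value,;" with the segments sorted by beat.
static void AppendSegments(std::ostringstream& out, const char* tag,
                           std::vector<TimingSegment> segs) {
    std::sort(segs.begin(), segs.end(),
              [](const TimingSegment& a, const TimingSegment& b) { return a.beat < b.beat; });
    out << tag << ':';
    for (const TimingSegment& s : segs) {
        out << s.beat << '=' << s.value << ',';
    }
    out << ';';
}

// Canonical text form of the timing data that affects how a chart plays.
// Beat0Offset is intentionally not written, so re-syncing a chart keeps its hash.
std::string CanonicalTimingString(const SimpleTiming& t) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(3);  // round away floating-point noise
    AppendSegments(out, "BPMS", t.bpms);
    AppendSegments(out, "STOPS", t.stops);
    return out.str();
}
```

Rounding to a fixed precision and sorting by beat are design choices meant to keep float noise and segment order from affecting the hash. A single 120 BPM segment at beat 0 serializes as `BPMS:0.000=120.000,;STOPS:;` regardless of the offset.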
Another minor concern is that CRC32 is not a cryptographic hash. However, there will never truly be a way to stop fake scores from being submitted, so this may be a moot point. Still, a hash with a larger output size would further reduce the chance of collisions, and using a cryptographically secure hash algorithm doesn't hurt. Git's commit hashing wasn't originally designed with security in mind, but that property proved to be very useful. An ounce of prevention now could be worth a pound of cure in the future.
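To put rough numbers on the collision argument: by the birthday bound, for $n$ distinct charts and a $b$-bit hash whose outputs behave like uniform random values, the probability $p$ that at least two charts share a hash is approximately as follows. The figure of $n = 10^5$ charts is an illustrative assumption, not a measured count.

```math
\begin{aligned}
p &\approx 1 - \exp\!\left(-\frac{n^2}{2 \cdot 2^{b}}\right) \\
b = 32:\quad \frac{n^2}{2 \cdot 2^{b}} &= \frac{10^{10}}{2^{33}} \approx 1.16, \qquad p \approx 1 - e^{-1.16} \approx 0.69 \\
b = 64:\quad \frac{n^2}{2 \cdot 2^{b}} &= \frac{10^{10}}{2^{65}} \approx 2.7 \times 10^{-10}, \qquad p \approx 2.7 \times 10^{-10}
\end{aligned}
```

Under these assumptions, a 32-bit hash is more likely than not to produce at least one collision among 100,000 charts, while a 64-bit or larger hash makes one negligible.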