PKBeam / Edda

A beatmap editor for the VR rhythm game Ragnarock
GNU General Public License v3.0

Improvement of the difficulty predictor #120

Open · Nytilde opened this issue 6 months ago

Nytilde commented 6 months ago

Why clipping is not as bad as it looks, and how we can use the features to make recommendations for better map design.

How the current predictor works

These features are used in the new difficulty predictor: ["AverageTimeDifference", "d1min", "HighNoteDensity2s", "NoteDensity", "cdw"]

To analyze the behavior of a regular map, we first need its rate of change. Conceptually this is the derivative of the note-time function with respect to time; in practice, we take the vector of note timestamps and compute the differences between consecutive entries, i.e. the change with a lag of one:

Figure 1: Rate of change

High peaks in this data indicate longer breaks, while the lower sections indicate a higher density of notes in that area. Ultimately, we use the average rate of change (AverageTimeDifference), which heuristically provides the best general separation performance according to the exploratory data analysis (EDA). The separation in the level-1 range could be improved further with the 40% quantile.
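For illustration, here is a minimal NumPy sketch of that computation, assuming `note_times` holds the note timestamps in seconds (the variable names and example data are placeholders, not the predictor's actual code):

```python
import numpy as np

# Hypothetical note timestamps in seconds.
note_times = np.array([0.0, 0.5, 1.0, 1.5, 3.5, 3.75, 4.0, 4.25])

# Lag-1 differences: the discrete "rate of change" of the note times.
time_diffs = np.diff(note_times)

# High values correspond to breaks, low values to dense sections.
average_time_difference = time_diffs.mean()

# A lower quantile (e.g. 40%) emphasizes the dense sections and could
# improve separation in the level-1 range.
q40_time_difference = np.quantile(time_diffs, 0.4)

print(average_time_difference, q40_time_difference)
```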

HighNoteDensity2s corresponds approximately to a rolling note density; its classification performance is high in the upper level range but decreases in the lower level range.

Figure 2: HighNoteDensity2s
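A rough sketch of how such a rolling 2-second density could be computed; the window length and taking the maximum are assumptions about how HighNoteDensity2s is defined, not its confirmed implementation:

```python
import numpy as np

def high_note_density_2s(note_times, window=2.0):
    """Maximum number of notes inside any 2-second window starting at a note
    (assumed definition of HighNoteDensity2s)."""
    note_times = np.asarray(note_times)
    counts = [
        np.count_nonzero((note_times >= t) & (note_times < t + window))
        for t in note_times
    ]
    return max(counts) if counts else 0

print(high_note_density_2s([0.0, 0.5, 1.0, 1.5, 3.5, 3.75, 4.0, 4.25]))  # -> 4
```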

NoteDensity offers a generally uniform, though less pronounced, differentiation across all level ranges.

Figure 3: NoteDensity
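Presumably NoteDensity is simply the overall notes-per-second ratio; a tiny sketch under that assumption:

```python
def note_density(note_times):
    """Overall notes per second across the whole map (assumed definition)."""
    duration = note_times[-1] - note_times[0]
    return len(note_times) / duration if duration > 0 else 0.0
```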

cdw follows the approach of counting all notes within a constant-length window, in the higher-velocity range. This feature contributes only minimally to the improvement, but could slightly raise classification quality in the middle range.

Figure 4: cdw
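Since the exact definition of cdw isn't spelled out here, the following is only a rough interpretation of a constant-window count; the window length and density threshold are hypothetical:

```python
import numpy as np

def cdw(note_times, window=1.0, threshold=4):
    """Number of fixed-length windows whose note count reaches a threshold
    (hypothetical interpretation of the constant-window feature)."""
    note_times = np.asarray(note_times)  # assumed sorted
    if note_times.size == 0:
        return 0
    # Assign each note to a fixed-length window and count notes per window.
    bins = np.floor((note_times - note_times[0]) / window).astype(int)
    counts = np.bincount(bins)
    return int(np.count_nonzero(counts >= threshold))
```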

The next step is to introduce the influence of the pattern (work in progress).

My Considerations

What do the maps that are filtered out via clipping tell us? In principle, we need to look in detail at which features do not match, and at what we can infer from the resulting deviations.

According to my observations, outliers in the lower and middle levels often indicate little variety in the beatmap. Recommendation: include more breaks to make the map more varied.
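Such a recommendation could eventually be turned into a simple rule, e.g. flagging maps whose note spacing shows very little variance; this is only a sketch, and the threshold is made up and would have to be tuned against the EDA:

```python
import numpy as np

def variety_recommendation(note_times, variance_threshold=0.05):
    """Suggest adding breaks when the note spacing is very uniform
    (hypothetical rule; the threshold would need tuning)."""
    diffs = np.diff(note_times)
    if diffs.size and np.var(diffs) < variance_threshold:
        return "Consider adding more breaks to increase variety."
    return "No recommendation."
```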

I am always open to further suggestions and constructive criticism of my ideas.

Nytilde commented 6 months ago

Here is the link to the Live Playground: https://colab.research.google.com/drive/1hj86xf_RDBks0uO6CDi4olxMlwilHVQC?usp=sharing

Nytilde commented 6 months ago

A further idea could be a fallback model which, while having lower accuracy, is designed specifically for extrapolation. To explore this, I ran a test in which I reduced the PCA to 1 and 2 dimensions:

Figure 1: Current model

Figure 2: 1-dimensional

Figure 3: 2-dimensional
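For context, the dimensionality-reduction test can be reproduced roughly like this with scikit-learn; the feature matrix and labels below are random placeholders, not the playground's actual data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 5))          # placeholder for the five features above
y = rng.integers(1, 11, 200)      # placeholder difficulty labels (1-10)

for n_components in (1, 2):
    model = make_pipeline(StandardScaler(), PCA(n_components=n_components), LinearRegression())
    model.fit(X, y)
    print(n_components, "components, R^2 =", model.score(X, y))
```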

My Considerations

The main disadvantage of this method is that, during extrapolation, it mainly captures the overall magnitude of the signal rather than the combination of factors. This can result in an extrapolated mean that does not accurately reflect the distribution of the original data, especially for time-series data.

For example, if the first part of the time series (beatmap) has a low note frequency and the second part a high one, the predicted value might fall somewhere in between, failing to capture the true variation.

Brollyy commented 5 months ago

Idea from Mrnaris on Discord:

Would be cool if there was a way to "check" the difficulty within an interval, for example, if i wanna know how hard the hardest part of a song is i'd select that part and it would give me a difficulty value for that part, same goes for if i wanna see how hard the easiest part of a song is, idk how hard this would be to implement tho.

the interval could possibly be set by two different bookmarks
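Implementation-wise, this could amount to filtering the notes between two bookmark positions and feeding only that slice into the existing feature pipeline; a rough sketch with hypothetical function names:

```python
def notes_in_interval(note_times, start, end):
    """Return only the note timestamps between two bookmark positions (seconds)."""
    return [t for t in note_times if start <= t <= end]

# The selected slice would then go through the same feature extraction and model
# as the full map, e.g. predict_difficulty(notes_in_interval(notes, bookmark1, bookmark2)),
# where predict_difficulty is a stand-in for the existing prediction pipeline.
```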

Brollyy commented 3 months ago


What's next?

There's been a lot of feedback regarding the new difficulty predictor tool, so we should consider which way we want to go with it. @PKBeam @Nytilde

I'm all for allowing users to select whichever model/algorithm they want to use while we work on improving the models, but I have a few points that I think we'll need to look at first:

  1. I'm afraid that the differences between models will be hard to understand, so seeing very different difficulty estimations from different models will be confusing and not very helpful. We could add descriptions to the options, but that would make the Difficulty Predictor window UI even busier than it already is...
  2. How would the model selection work with #119? From what I understood, the old model works best with a fully completed map, so it wouldn't make sense to display a real-time difficulty prediction for that option.
  3. If we go with multiple options for the user, do we include the algorithm suggested by Melchior on Discord? From what I understood, Nytilde used it as an additional feature for the ML model and saw some improvement on higher difficulties, but we could also offer it as a separate option altogether - here are the results of fitting it to all OST and RAID songs: https://docs.google.com/spreadsheets/d/12osrf0tNd8vZQqO2ahS8AMeypqug8wgg8gZsEJnuSwo/edit?usp=sharing
  4. Should we allow the models to not provide any estimation if the map parameters are outside the expected range (like with ">10" right now)? I think that adds to the confusion right now - it's probably better to just use the fallback model Nytilde described earlier for this case (see the sketch after this list).
  5. Gathering feedback on the difficulty estimations in-app (e.g. thumbs up/thumbs down) would be very helpful for future improvements, but I think we'd need to host a new server to store it, so probably too much of a hassle...
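Regarding point 4, the range check plus fallback could be as simple as the following sketch; the `bounds` structure and the function names are hypothetical, not the actual implementation:

```python
def predict_with_fallback(features, main_model, fallback_model, bounds):
    """Use the main model inside the expected feature range, otherwise the
    lower-accuracy fallback model.

    `features` is a dict of feature name -> value; `bounds` maps feature names
    to (min, max) tuples derived from the training data (hypothetical structure).
    """
    in_range = all(lo <= features[name] <= hi for name, (lo, hi) in bounds.items())
    model = main_model if in_range else fallback_model
    # Feature order must match the order the models were trained with.
    row = [features[name] for name in bounds]
    return model.predict([row])[0]
```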

In the meantime, I'll at least work on implementing the new version of Nytilde's model plus the fallback model that she sent me 2 months back, to see if there are any improvements.

Brollyy commented 2 months ago

The changes done in #129 are a good step forward, but I'm not closing this issue, as there are still unresolved problems with the new ML model that need addressing in the future releases.

Nytilde commented 2 months ago

The unexpected behavior might be due to multicollinearity, where some features are highly correlated with each other. To address this, we should consider techniques to handle multicollinearity. While Multiple Linear Regression (MLR) can model the target variable as a linear function of the features, it does not inherently resolve multicollinearity. Instead, we may need to use methods like regularization (e.g., Ridge or Lasso regression) to stabilize the model and reduce the impact of multicollinearity. Alternatively, Principal Component Analysis (PCA) can transform the features into uncorrelated components, though this might make the model's interpretability more challenging.
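As a concrete illustration (a sketch with random placeholder data, not the current training code), Ridge regression and a PCA pipeline could be compared against plain MLR like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 5))          # placeholder feature matrix
y = rng.integers(1, 11, 200)      # placeholder difficulty labels

candidates = {
    "plain MLR": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge (alpha=1.0)": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "PCA(3) + MLR": make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression()),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)   # R^2 by default for regressors
    print(name, scores.mean())
```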

I don't have much time these days to do that. Perhaps it is better if we train a simpler model for now, one that uses a maximum of 3 dimensions.