Closed: jbmelander closed this issue 1 year ago
Similarly, increasing training_duration_sec results in fewer clusters. I am wondering if I should be increasing the number of components alongside the number of snippets used for training.
The relationship between the phase1_detect_threshold and the number of found clusters is a bit surprising to me. For example, if I use a detect_sign of 0 and a threshold of 1.5, >20000 snippets are found, but this results in only 3 clusters. If I use a threshold of 8.5, ~5000 snippets are found, resulting in ~30 clusters. I would have expected that including more snippets would result in more clusters. Do you have any advice for optimizing this parameter?
This is tricky. If the threshold is too low, what I often find is that clusters get merged together, because each of them can be merged with a large noise cluster. That would explain the smaller number of clusters at a lower detect threshold.
I don't have any guidance for you on this, since I think the optimal choice will depend very much on the type of dataset.
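To illustrate the effect described above, here is a toy sketch of why a low threshold yields many more snippets, most of them noise. It is not the sorter's detection code: the trace is synthetic, amplitudes are assumed to be in units of the noise standard deviation, and make_trace, detect and noise_fraction are hypothetical helper names. Detection on the absolute value stands in for a detect_sign of 0.

```python
import random


def make_trace(n_samples=30000, n_spikes=100, spike_amp=12.0, seed=0):
    """Unit-variance Gaussian noise with negative spikes at known sample indices."""
    rng = random.Random(seed)
    trace = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    spike_times = sorted(rng.sample(range(50, n_samples - 50), n_spikes))
    for t in spike_times:
        trace[t] -= spike_amp
    return trace, spike_times


def detect(trace, threshold, min_gap=10):
    """Return indices where |x| >= threshold and |x| is the maximum within +/- min_gap samples."""
    events = []
    n = len(trace)
    for i in range(n):
        amp = abs(trace[i])
        if amp < threshold:
            continue
        lo, hi = max(0, i - min_gap), min(n, i + min_gap + 1)
        if amp >= max(abs(v) for v in trace[lo:hi]):
            events.append(i)
    return events


def noise_fraction(events, spike_times):
    """Fraction of detected events that do not coincide with an inserted spike."""
    if not events:
        return 0.0
    true_times = set(spike_times)
    return sum(1 for e in events if e not in true_times) / len(events)


if __name__ == "__main__":
    trace, spike_times = make_trace()
    for threshold in (1.5, 8.5):
        events = detect(trace, threshold)
        print(threshold, len(events), round(noise_fraction(events, spike_times), 3))
```

In this toy setting the low threshold picks up far more events, and nearly all of them are noise peaks, which is the kind of large noise population that real clusters can end up merged into. Printing the snippet count and an estimate of the noise share for a few thresholds on your own data may help when choosing a value.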
OK. Thanks. I just wanted to confirm that this was expected behavior.