AtheMathmo / rusty-machine

Machine Learning library for Rust
https://crates.io/crates/rusty-machine/
MIT License

DBSCAN clusters of size < min_points being returned #196

Open NatPRoach opened 5 years ago


Hello, I've been using your implementation of DBSCAN and noticed that it has been returning clusters smaller than the min_points value I pass when constructing the model. The code I use to inspect the clusters is below:

    use rusty_machine::learning::dbscan::DBSCAN;
    use rusty_machine::learning::UnSupModel; // brings `train` into scope

    // eps = 1.5, min_points = 5
    let mut db = DBSCAN::new(1.5, 5);
    db.train(&similarity_matrix).unwrap();
    let cluster_assignments = db.clusters().unwrap();

    // Group point indices by cluster id; `None` marks noise.
    let mut clusters: Vec<Vec<usize>> = Vec::new();
    for (i, assignment) in cluster_assignments.iter().enumerate() {
        match *assignment {
            Some(val) => {
                println!("read {} {}: cluster {}", i, read_ids[i], val);
                // Grow the outer Vec as needed instead of assuming ids arrive in order.
                if val >= clusters.len() {
                    clusters.resize(val + 1, Vec::new());
                }
                clusters[val].push(i);
            }
            None => println!("read {} {}: cluster -1", i, read_ids[i]),
        }
    }

    for (i, cluster) in clusters.iter().enumerate() {
        println!("Cluster {}, size {}:", i, cluster.len());
        for &index in cluster {
            println!(">{}", read_ids[index]);
            // Assumes `seqs[index]` holds valid UTF-8 bytes.
            println!("{}", String::from_utf8(seqs[index].clone()).unwrap());
        }
    }

With this code I get clusters with fewer than 5 members, in some cases only 1 or 2.
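For comparison, below is a minimal sketch of textbook DBSCAN in plain Rust. It is not rusty-machine's code, and the helpers dbscan, region_query and cluster_sizes are hypothetical names used only for illustration. It assumes 2-D points, Euclidean distance, and that a point's eps-neighbourhood includes the point itself. The key step is that when a core point's neighbourhood is expanded, non-core (border) neighbours are also labelled with the cluster, so a cluster normally ends up with at least min_points members. The rare exception is a border point already claimed by an earlier cluster.

```rust
use std::collections::BTreeMap;

/// Squared Euclidean distance between two 2-D points.
fn dist2(a: &[f64; 2], b: &[f64; 2]) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    dx * dx + dy * dy
}

/// Indices of all points within `eps` of `points[i]`, including `i` itself.
fn region_query(points: &[[f64; 2]], i: usize, eps: f64) -> Vec<usize> {
    (0..points.len())
        .filter(|&j| dist2(&points[i], &points[j]) <= eps * eps)
        .collect()
}

/// Textbook DBSCAN (sketch): one label per point, `None` for noise.
fn dbscan(points: &[[f64; 2]], eps: f64, min_points: usize) -> Vec<Option<usize>> {
    let n = points.len();
    let mut labels: Vec<Option<usize>> = vec![None; n];
    let mut visited = vec![false; n];
    let mut next_cluster = 0;

    for i in 0..n {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let neighbours = region_query(points, i, eps);
        if neighbours.len() < min_points {
            continue; // noise for now; may later be claimed as a border point
        }
        let cluster = next_cluster;
        next_cluster += 1;
        labels[i] = Some(cluster);

        // Grow the cluster breadth-first from the core point `i`.
        let mut seeds = neighbours;
        let mut k = 0;
        while k < seeds.len() {
            let j = seeds[k];
            k += 1;
            // Border points (and earlier "noise") join the cluster here,
            // even though they are not core points themselves.
            if labels[j].is_none() {
                labels[j] = Some(cluster);
            }
            if !visited[j] {
                visited[j] = true;
                let more = region_query(points, j, eps);
                if more.len() >= min_points {
                    seeds.extend(more); // `j` is a core point: keep expanding
                }
            }
        }
    }
    labels
}

/// Count members per cluster id; noise (`None`) is skipped.
fn cluster_sizes(labels: &[Option<usize>]) -> BTreeMap<usize, usize> {
    let mut sizes = BTreeMap::new();
    for label in labels.iter().flatten() {
        *sizes.entry(*label).or_insert(0) += 1;
    }
    sizes
}
```

As a usage example, take the points (10, 0), (11, 0) and (12, 0) with eps = 1.5 and min_points = 3: only (11, 0) is a core point, yet the sketch labels all three as one cluster of size 3. An implementation that labelled only core points would report that cluster with a single member, which is one way undersized clusters like the ones above could arise. cluster_sizes works on any slice of Option<usize> labels, so it can also be used to list which clusters fall below min_points.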