fsktom / rusty-endsong-parser

Better, more performant version of https://github.com/fsktom/endsong-parser-python written in Rust
0 stars 0 forks source link

When a song appears in many albums with different capitalization #2

Closed fsktom closed 2 weeks ago

fsktom commented 2 years ago
>>> psons
  >> Sabaton
  >> The Final Solution

I've found 2 song named The Final Solution from Sabaton!
Sabaton - The Final Solution (Coat Of Arms) | 96 plays
Sabaton - The Final Solution (Coat of Arms) | 15 plays

>>> palb
  >> Sabaton
  >> Coat of Arms

=== Sabaton - Coat Of Arms | 929 plays ===
 #1: Sabaton - Uprising (Coat Of Arms) | 125 plays
 #2: Sabaton - White Death (Coat Of Arms) | 115 plays
 #3: Sabaton - Saboteurs (Coat Of Arms) | 99 plays
 #4: Sabaton - The Final Solution (Coat Of Arms) | 96 plays
 #5: Sabaton - Aces In Exile (Coat Of Arms) | 91 plays
 #6: Sabaton - Midway (Coat Of Arms) | 88 plays
 #7: Sabaton - Coat of Arms (Coat Of Arms) | 81 plays
 #8: Sabaton - Wehrmacht (Coat Of Arms) | 74 plays
 #9: Sabaton - Metal Ripper (Coat Of Arms) | 71 plays
#10: Sabaton - Screaming Eagles (Coat Of Arms) | 70 plays
#11: Sabaton - Coat Of Arms (Coat Of Arms) | 18 plays
#12: Sabaton - Coat Of Arms - Instrumental (Coat Of Arms) | 1 plays

>>> palb
  >> Sabaton
  >> Coat Of Arms

=== Sabaton - Coat Of Arms | 929 plays ===
 #1: Sabaton - Uprising (Coat Of Arms) | 125 plays
 #2: Sabaton - White Death (Coat Of Arms) | 115 plays
 #3: Sabaton - Saboteurs (Coat Of Arms) | 99 plays
 #4: Sabaton - The Final Solution (Coat Of Arms) | 96 plays
 #5: Sabaton - Aces In Exile (Coat Of Arms) | 91 plays
 #6: Sabaton - Midway (Coat Of Arms) | 88 plays
 #7: Sabaton - Coat of Arms (Coat Of Arms) | 81 plays
 #8: Sabaton - Wehrmacht (Coat Of Arms) | 74 plays
 #9: Sabaton - Metal Ripper (Coat Of Arms) | 71 plays
#10: Sabaton - Screaming Eagles (Coat Of Arms) | 70 plays
#11: Sabaton - Coat Of Arms (Coat Of Arms) | 18 plays
#12: Sabaton - Coat Of Arms - Instrumental (Coat Of Arms) | 1 plays

so I think this happens bc I do .to_lowercase() on user input and entries found....

I gotta find a solution for that (either combining the albums with different capitalization) or taking raw user input

fsktom commented 1 year ago
>>> palb
Artist name?
  >> sabaton
Album name?
  >> coat of arms
=== Sabaton - Coat Of Arms | 140 plays ===
 #1: Sabaton - White Death (Coat Of Arms) | 23 plays
 #2: Sabaton - Uprising (Coat Of Arms) | 18 plays
 #3: Sabaton - The Final Solution (Coat Of Arms) | 15 plays
 #4: Sabaton - Aces In Exile (Coat Of Arms) | 15 plays
 #5: Sabaton - Saboteurs (Coat Of Arms) | 13 plays
 #6: Sabaton - Wehrmacht (Coat Of Arms) | 12 plays
 #7: Sabaton - Midway (Coat Of Arms) | 12 plays
 #8: Sabaton - Coat of Arms (Coat Of Arms) | 10 plays
 #9: Sabaton - Metal Ripper (Coat Of Arms) | 10 plays
#10: Sabaton - Screaming Eagles (Coat Of Arms) | 10 plays
#11: Sabaton - Coat Of Arms (Coat Of Arms) | 2 plays
>>> psons
Artist name?
  >> sabaton
Song name?
  >> saboteurs
I've found 2 songs named Saboteurs from Sabaton!
Sabaton - Saboteurs (Coat of Arms) | 20 plays
Sabaton - Saboteurs (Coat Of Arms) | 13 plays
fsktom commented 1 year ago

Possible solution: combine those albums (and maybe songs too if they have different capitalization?)

fsktom commented 1 year ago

Similar to this: to get artists with the same name but different capitalization

fn unique_capitalization(entries: &SongEntries) {
    let mut unique_artists = std::collections::HashMap::<String, String>::new();

    for artist in entries.artists() {
        match unique_artists.get(&artist.to_lowercase()) {
            Some(art) => {
                if art != &artist {
                    println!("{artist} : {art}");
                    let a = types::Artist::new(&artist);
                    let t = types::Artist::new(art);
                    entries.print_aspect(&AspectFull::Artist(&a));
                    entries.print_aspect(&AspectFull::Artist(&t));
                    println!();
                }
            }
            None => {
                unique_artists.insert(artist.to_lowercase(), artist.clone());
            }
        }
    }
}

for my dataset (2023-03/04), this returned

Domino : DOMINO
=== Domino | 1 plays ===
--- Domino - Filmowa Miłość | 1 plays ---
#1: Domino - Filmowa Miłość (Filmowa Miłość) | 1 plays
=== DOMINO | 2 plays ===
--- DOMINO - U can do it ! | 2 plays ---
#1: DOMINO - 約束 (U can do it !) | 1 plays
#2: DOMINO - U can do it ! (U can do it !) | 1 plays

Dreamcatcher : DREAMCATCHER
=== Dreamcatcher | 7 plays ===
--- Dreamcatcher - [Dystopia : Lose Myself] | 2 plays ---
#1: Dreamcatcher - BOCA ([Dystopia : Lose Myself]) | 1 plays
#2: Dreamcatcher - Break The Wall ([Dystopia : Lose Myself]) | 1 plays
--- Dreamcatcher - [Apocalypse : Follow us] | 2 plays ---
#1: Dreamcatcher - Fairytale ([Apocalypse : Follow us]) | 2 plays
--- Dreamcatcher - [Apocalypse : Save us] | 1 plays ---
#1: Dreamcatcher - MAISON ([Apocalypse : Save us]) | 1 plays
--- Dreamcatcher - 악몽·Fall asleep in the mirror | 1 plays ---
#1: Dreamcatcher - GOOD NIGHT (악몽·Fall asleep in the mirror) | 1 plays
--- Dreamcatcher - Eclipse | 1 plays ---
#1: Dreamcatcher - Don't Light My Fire (Eclipse) | 1 plays
=== DREAMCATCHER | 1 plays ===
--- DREAMCATCHER - 악몽 | 1 plays ---
#1: DREAMCATCHER - Chase Me (악몽) | 1 plays

Tia : TiA
=== Tia | 19 plays ===
--- Tia - Deal with the devil | 16 plays ---
#1: Tia - Deal with the devil (Deal with the devil) | 16 plays
--- Tia - The Glory Days | 2 plays ---
#1: Tia - The Glory Days (The Glory Days) | 2 plays
--- Tia - ハートリアライズ | 1 plays ---
#1: Tia - ハートリアライズ (ハートリアライズ) | 1 plays
=== TiA | 2 plays ===
--- TiA - 流星 | 2 plays ---
#1: TiA - 流星 (流星) | 2 plays
fsktom commented 1 year ago

^they all seem to be completely different artists... so either find_artist is only gonna find an artist with the same capitalization... or it's gonna return a vector of Artists and in UI the user will have to choose which one...

https://github.com/fsktom/rusty-endsong-parser/blob/921519de4eb144a5a24da1239879673e7372dcf8/src/display.rs#L434-L444

fsktom commented 1 year ago

if the find() functions are only gonna find the aspect with the same capitalization, maybe I could make use of #30 hmm

trade-offs, trade-offs.... bc dealing with a vector of aspects is gonna be pain

fsktom commented 1 year ago

Maybe just have a find_artist function that requires correct capitalization and returns a single Artist, and a find_artists function which ignores capitalization and returns a vector of artists

The one used by the UI will he the one requiring correct capitalization, see #30

fsktom commented 3 weeks ago

trade-offs, trade-offs....

also, worse performance if don't stop at first correct match but actually have to go through each entry, filter uniques, and return a vector of possibilities

find v1                 time:   [3.0087 ms 3.0149 ms 3.0220 ms]
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  6 (6.00%) high mild
  13 (13.00%) high severe

find v2                 time:   [6.7890 ms 6.8080 ms 6.8350 ms]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

for Mukanjyo... so I can assume ~2.5x worse performance, but... 3 or 7 ms won't be noticeable, right? so it should be fine

now the other trade off is worse ergonomics but yeah...

fsktom commented 2 weeks ago

Even worse for find_artist with Survive Said The Prophet

find artist v1          time:   [2.1447 ms 2.1849 ms 2.2614 ms]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

find artist v2          time:   [11.194 ms 11.356 ms 11.619 ms]
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

ugh

but should be fine, relative difference is big but absolute difference is 9ms, so shouldn't be noticeable, right?

(simplified) find v1:

entries
    .iter()
    .find(|entry| art.is_entry_lowercase(entry))
    .map(Artist::from)

find v2:

entries
    .iter()
    .filter(|entry| art.is_entry_lowercase(entry))
    .map(Artist::from)
    .unique()
    .collect_vec()

also same for song but i did benchmark code for v2 wrong kek

fsktom commented 2 weeks ago

https://github.com/fsktom/rusty-endsong-parser/issues/2#issuecomment-1501170202

done the same for songs because I wasn't sure if it was necessary to adjust find::song as well, because how probable that same artist, album capitalization but song capitalization would change...

quite probable in fact! (my first idea was to go over the list for each entry... n^2 but thankfully I remembered HashMap which I used earlier to artist too xd)

let mut unique_songs = std::collections::HashMap::new();

entries
    .iter()
    .map(|e| {
        Song::new(
            std::rc::Rc::from(e.track.to_lowercase()),
            std::rc::Rc::clone(&e.album),
            std::rc::Rc::clone(&e.artist),
        )
    })
    .unique()
    .for_each(|s| {
        unique_songs.insert(s, vec![]);
    });
for e in &entries[..] {
    let song = Song::new(
        std::rc::Rc::from(e.track.to_lowercase()),
        std::rc::Rc::clone(&e.album),
        std::rc::Rc::clone(&e.artist),
    );
    let name = std::rc::Rc::clone(&e.track);
    let names = unique_songs.get_mut(&song).unwrap();
    if !names.contains(&name) {
        names.push(name);
    }
}
let mut n = 0;
unique_songs
    .iter()
    .filter(|(_, v)| v.len() != 1)
    .for_each(|v| {
        dbg!(v);
        n += 1;
    });
dbg!(n);

with 88 (heh) entries for June/July? 2023 dataset with sum_different_capitalization turned off obviously

like

[src/main.rs:86:13] v = (
    Song {
        name: "mirror of souls",
        album: Album {
            name: "Mirror of Souls",
            artist: Artist {
                name: "Theocracy",
            },
        },
    },
    [
        "Mirror Of Souls",
        "Mirror of Souls",
    ],
)
[src/main.rs:86:13] v = (
    Song {
        name: "du du",
        album: Album {
            name: "Lady Pank",
            artist: Artist {
                name: "Lady Pank",
            },
        },
    },
    [
        "Du Du",
        "Du du",
    ],
)
fsktom commented 2 weeks ago

I think... after over 2 years I can finally close this issue...

🦊 :/