LMS-Community / slimserver

Server for Squeezebox and compatible players. This server is also called Lyrion Music Server.
https://lyrion.org

Duplicate title entries after library update #547

Closed schnillerman closed 2 years ago

schnillerman commented 3 years ago

Whenever I update files that are registered in the library, a few of them are registered as duplicates (in the same album) in the library after a re-scan. The nature of the file update can be just (mp3-) tag updates, but also file renaming (directory name remains the same).

The only solution I have found to this is to rename or move the album's directory, re-scan, rename/move it back to its original state and do a second rescan.

Version: 8.2.0 - 1614990095 @ Sat Mar 6 01:43:25 CET 2021

This happens in earlier 8.x and 7.x versions as well.

mherger commented 3 years ago

A full wipe cache & scan would do without moving files, wouldn't it?

schnillerman commented 3 years ago

I just tried this - now all my favorites are gone. :(

And yes, it fixes duplicate entries, but it usually takes longer (2.5 h) than two rescans (7 minutes per re-scan for 215,000 titles): the database deletion alone takes as long as one rescan.

And it would seem weird to me if multiple library entries exist (within the same library, of course) that refer to one and the same file. Maybe a library consistency check that takes care of duplicate entries for same file would be helpful.

bobbydriver commented 2 years ago

I would love to see this one fixed; it's a long-standing problem that I noticed too. I use the same workaround (renaming the directory), and a full rescan is not really a practical option for those of us with large libraries.

Feels like there ought to be an easy solution in the scanner - as @schnillerman suggests, a consistency check or some such

michaelherger commented 2 years ago

Could one of you please outline how that easy consistency check would work?

schnillerman commented 2 years ago

Hi Michael,

totally understand your question :)

As a very inexperienced programmer (if one at all), I would probably check for duplicate entries in the table where the full path and filename are stored.

If one and the same file is listed more than once, that's a good indicator that it's registered as a duplicate.
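
For illustration, a minimal sketch of such a check. It is not LMS code: it assumes a hypothetical `tracks` table with a `url` column holding each file's full path, and runs against an in-memory SQLite database filled with made-up rows.

```python
import sqlite3


def find_duplicate_paths(conn):
    """Return (url, count) for every path that is stored more than once."""
    return conn.execute(
        "SELECT url, COUNT(*) FROM tracks "
        "GROUP BY url HAVING COUNT(*) > 1 ORDER BY url"
    ).fetchall()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tracks (url, title) VALUES (?, ?)",
    [
        ("file:///music/Album/01.mp3", "Intro"),
        ("file:///music/Album/02.mp3", "Song"),
        ("file:///music/Album/02.mp3", "Song"),  # same file registered twice
    ],
)

duplicates = find_duplicate_paths(conn)
print(duplicates)
```

A check like this only catches exact path matches; paths that differ in upper/lower case would need a case-insensitive comparison.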

Cheers, Till

bobbydriver commented 2 years ago

What seems to be happening is - when you change a file within an album somehow, or add a new file to the album - then do a new/changed scan:

The scan finds the new file and creates it within a new album, so you end up with two duplicated albums

  1. the original album with the unchanged files but not the changed/new
  2. the new album with just the changed/new file and none of the unchanged files

So the logic needs to be something like

  1. new file is found
  2. read album tag
  3. read the folder path
  4. does an album with the same name exist with the same folder path?
  5. if yes then add the file to the existing album
  6. if no - carry on as before and create a new album
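
The steps above can be sketched roughly as follows. This is a toy model under stated assumptions, not the LMS scanner: albums are kept in a plain dictionary keyed by (album tag, folder path), and the function name `assign_album` is made up for the example.

```python
import posixpath


def assign_album(albums, track_path, album_tag):
    """Attach a track to an existing album or create a new one.

    `albums` maps (album name, folder path) -> list of track paths.
    """
    key = (album_tag, posixpath.dirname(track_path))  # steps 2 and 3
    if key not in albums:                             # step 4
        albums[key] = []                              # step 6: create a new album
    albums[key].append(track_path)                    # step 5: add the file to the album
    return key


albums = {}
assign_album(albums, "/music/Live 1979/01.mp3", "Live 1979")
assign_album(albums, "/music/Live 1979/02.mp3", "Live 1979")
# A file added to the same folder later ends up in the same album.
assign_album(albums, "/music/Live 1979/03 bonus.mp3", "Live 1979")
print(len(albums))
```

Keying on the folder path as well as the name keeps two different albums that share a title apart, as long as they live in different folders.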

schnillerman commented 2 years ago

You're right, I remember now that a new album is mostly created in this case.

michaelherger commented 2 years ago

Now here's the problem: a regular scan is so much faster than a full wipe & rescan because it only deals with changed items and doesn't do these kinds of optimisations and checks. Any additional check will slow it down.

In order to keep things as fast as possible, we have to be sure what we're talking about. The issue subject line says "Duplicate title entries". The description says "duplicates (in the same album)". And the latest suggestion is about duplicated albums. Maybe both are valid. And I'm pretty sure complaints about genres have been heard, too...

I fear that fixing all of this would take more time than I currently have.

mherger commented 2 years ago

Oh, and artists: #704

bobbydriver commented 2 years ago

Thanks Michael - appreciate that it's probably a lot of effort. If I get the chance I might set up a test rig and do some proper documentation of the issues/scenarios. I don't know Perl so I couldn't do anything with the scanner, but I could at least work out the SQL queries that identify the culprits.

As for the scan time - I had the exact same thought. It would really need to be a separate scan option for occasional use. A "tidy/remove duplicates scan" or something

In actual fact I'm more than happy with all the LMS functionality these days and the only thing left which bugs me is the way the new/changed scan can make a mess of db integrity. I'd actually really love a UI that allowed me to query and tidy up my music db without the inconvenience of a full drop and rescan, but I know that's dreamland :)

michaelherger commented 2 years ago

Could both of you please describe what tag you'd change (artist, album, title...), and what the outcome would be? I think I've identified one issue if you changed some tracks' artist names without getting rid of the original artist name (eg. different artists of the same name, you rename only one of them). This could likely cause empty albums in the original artist's collection (see #704).

michaelherger commented 2 years ago

Would #705 be a duplicate of this issue?

mherger commented 2 years ago

Those affected by the file renaming issue: what OS are you using?

schnillerman commented 2 years ago

Happens when I change attributes like

If the file name upper/lower case is changed, it happens as well.

I'm running LMS on a Linux Debian.

michaelherger commented 2 years ago

I think I've identified the cause of the duplication in case of a file name case change. See https://github.com/Logitech/slimserver/issues/705#issuecomment-1026229542. There's some background information, and how you might be able to work around / fix this until I have a fix in LMS.

michaelherger commented 2 years ago

Could you please give the 8.3 nightly a try (https://downloads.slimdevices.com/nightly/?ver=8.3)? I applied a few changes to the scanner. I'm no longer able to get invalid records after

schnillerman commented 2 years ago

I just installed 8.3 over 8.2 and will have a look!

Do I need to perform a complete re-scan?

bobbydriver commented 2 years ago

I loaded the nightly and did the same tests - works ok for me too (on Raspbian 10 Buster/Max2Play)

The duplicate albums still get created though if you fundamentally change a filename (other than a case change) - or add new files to the album folder - then run a new/changed rescan. Does that need to be raised as a separate issue to keep things clear?

schnillerman commented 2 years ago

> The duplicate albums still get created though if you fundamentally change a filename (other than a case change) - or add new files to the album folder - then run a new/changed rescan. Does that need to be raised as a separate issue to keep things clear?

Thank you so much for mentioning this behavior - I forgot that this happens to me a lot, too, because I've been working around it by temporarily renaming the updated folder, scanning, renaming it back, and re-scanning!

bobbydriver commented 2 years ago

Just did a test: I added some new files to an existing album folder. Essentially, the scan is picking up the new files by timestamp and creating them as a new album, not recognising that the album already exists and that they should be added to it.

I realise that adding this integrity step to a new/changed files rescan will slow things down, but maybe not too much? After all - it only needs to be run against the new files discovered
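
To make the cost argument concrete, here is a rough sketch of how a new/changed scan might pick its candidates by timestamp. It is not the actual LMS scanner; the `known` mapping (path to modification time recorded at the previous scan) and the function name `classify` are hypothetical.

```python
def classify(known, current):
    """Split files into new and changed ones.

    known:   path -> modification time recorded at the previous scan
    current: path -> modification time found on disk now
    """
    new = sorted(p for p in current if p not in known)
    changed = sorted(p for p in current if p in known and current[p] != known[p])
    return new, changed


# Made-up timestamps: 02.mp3 was re-tagged, 03.mp3 was added.
known = {"/music/A/01.mp3": 100, "/music/A/02.mp3": 100}
current = {"/music/A/01.mp3": 100, "/music/A/02.mp3": 250, "/music/A/03.mp3": 260}

new, changed = classify(known, current)
print(new, changed)
```

An extra "does this album already exist?" lookup would then run once per entry in `new`, not once per track in the library.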

If you put the cover art in each album folder, then the SQL to identify the existing duplicates is quite simple: although the scan allocates a new album id to the new files, the value for cover (which is essentially the path to the cover.jpg) is the same for both the new and the existing album.

-- Covers (i.e. album folders) whose tracks are spread over more than one album id
SELECT DISTINCT album, cover
FROM tracks
WHERE cover IN (
    SELECT cover
    FROM tracks
    GROUP BY cover
    HAVING COUNT(DISTINCT album) > 1
)

Not sure how this works for people who use embedded cover art though
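
As a quick way to try the query above outside of LMS, the following sketch runs it against an in-memory SQLite table. The table layout is reduced to the two columns the query needs and the rows are invented; an `ORDER BY` is added only to make the output stable.

```python
import sqlite3

QUERY = """
SELECT DISTINCT album, cover
FROM tracks
WHERE cover IN (
    SELECT cover
    FROM tracks
    GROUP BY cover
    HAVING COUNT(DISTINCT album) > 1
)
ORDER BY album
"""

# Simplified, hypothetical stand-in for the real tracks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, album INTEGER, cover TEXT)")
conn.executemany(
    "INSERT INTO tracks (album, cover) VALUES (?, ?)",
    [
        (1, "/music/A/cover.jpg"),
        (1, "/music/A/cover.jpg"),
        (2, "/music/A/cover.jpg"),  # new track that ended up in a second album
        (3, "/music/B/cover.jpg"),  # unaffected album
    ],
)

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only albums 1 and 2 are reported, because they share one cover path; album 3 has its cover to itself.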

bobbydriver commented 2 years ago

OK - just been digging some more and that SQL is not ideal, as it also finds occurrences where you have files in the same album folder but with different album tags. That's just bad tagging/mistakes, so handy for IDing where your library is messed up, but not a definitive ID of where the new/changed scan problem has happened

I also worked out this SQL query on the albums table

-- Albums sharing title, artist and year but with a different artwork hash
SELECT A.title, A.id, C.name, B.artwork
FROM albums A
JOIN contributors C ON A.contributor = C.id
JOIN albums B
  ON A.title = B.title
 AND A.contributor = B.contributor
 AND A.year = B.year
 AND A.artwork <> B.artwork
GROUP BY A.title, A.artwork

This identifies cases where a duplicate album name has the same artist and year but a different cover art hash. It catches records where the new/changed scan problem has happened, but it also flags other issues, such as a file that was moved to a different album folder without changing its album tag, or album tags that actually differ within one folder. So again, bad tagging.

Neither of these queries takes bad tagging into account, so they are only useful for manually interrogating libraries for bad integrity, not for pinpointing the new/changed scan problem.

schnillerman commented 2 years ago

OT - is your nick from Bob's Burgers? 😂

bobbydriver commented 2 years ago

haha - no, but I did laugh when I saw that episode

michaelherger commented 2 years ago

> The duplicate albums still get created though if you fundamentally change a filename (other than a case change) - or add new files to the album folder - then run a new/changed rescan. Does that need to be raised as a separate issue to keep things clear?

Could you please provide step-by-step instructions what I need to do to reproduce the problem?

schnillerman commented 2 years ago

What I usually did in order to produce the duplicate DB entries (but with 8.3, the behavior seems to be different):

  1. Tag files (with e.g. mp3tag), same album: mistakenly have some of the files with a different year (e.g. 1-5 of 12 with year 2011, 7-12 of 12 with year 2012)
  2. Save them to library (tag to dir/file name; year is included in dir name) -> files 1-5 and 7-12 are in different subdirs
  3. Scan
  4. 2 albums (one for year 2011, one for 2012) are created in LMS DB
  5. Correction of file tags and file location (mp3tag)
  6. Re-scan
  7. 2 albums with same year, artist, album name are shown in LMS, one with files 1-5, one with files 7-12, even though files are now in same subdir

As I mentioned above, this behavior seems to be different with LMS 8.3:

  1. Tag files (with e.g. mp3tag), same album: mistakenly have some of the files with a different year (e.g. 1-5 of 12 with year 2011, 7-12 of 12 with year 2012)
  2. Save them to library (tag to dir/file name; year is included in dir name) -> files 1-5 and 7-12 are in different subdirs
  3. Scan
  4. 1 album is now in DB with year 2012 - verified also by looking for artist: only one album with the same name is present, even though files with year 2011 tagged also show value 2011 in year tag (verified by looking at individual song via "more > further info > show tags")
  5. Correction of file tags, dirs and file names in mp3tag
  6. Re-Scan
  7. Album still shows as one entry with corrected year 2011

It seems to work as expected in LMS 8.3 now.

But what about albums with the same name but different years? They sometimes exist (e.g. re-releases), and the release info is only present in the comment tag.

bobbydriver commented 2 years ago

Just did the same test as above with v8.3 and can confirm the same result: I added a new album with one file having a different date tag, and it creates one album, not two (as it did in 8.2).

So the problem is now just with the album tag

If I add new tracks into an existing album folder - even if the album tag is identical to the existing album tags in the same folder, it still creates a new duplicate album in the db for the new tracks

To test:

  1. Take any album that is already in the library
  2. Add a new track or tracks into the folder and tag with the same album tag as the existing tracks
  3. Run a new/changed rescan
  4. New duplicate album is created with just the new tracks

The behaviour is sort of understandable, as the existing tracks aren't new or changed, but the folder contents have changed

I don't know how to fix it - maybe the scan needs to look for new/changed subfolders (date modified on the folder) and rescan the whole folder? or when it sees new/changed files it triggers a rescan of the whole subfolder that the new files sit in?
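
The second idea, rescanning the whole subfolder as soon as one of its files is new or changed, could look roughly like this. It is only a sketch of the suggestion, not how LMS works; both function names are made up, and paths are handled with `posixpath` to keep the example independent of the host OS.

```python
import posixpath


def folders_to_rescan(changed_files):
    """Return the folders that contain at least one new or changed file."""
    return sorted({posixpath.dirname(p) for p in changed_files})


def files_to_rescan(all_files, changed_files):
    """Expand the new/changed files to every file in the affected folders."""
    folders = set(folders_to_rescan(changed_files))
    return sorted(p for p in all_files if posixpath.dirname(p) in folders)


all_files = [
    "/music/A/01.mp3",
    "/music/A/02.mp3",
    "/music/A/03 bonus.mp3",  # newly added track
    "/music/B/01.mp3",
]
changed_files = ["/music/A/03 bonus.mp3"]

to_rescan = files_to_rescan(all_files, changed_files)
print(to_rescan)
```

The cost grows with the size of the affected folders rather than the size of the library, which is the trade-off being asked about.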

Not sure if either of these are viable

schnillerman commented 2 years ago

Also, with LMS 8.3, something strange happens if I correct the capitalization inside e.g. the title tag, so that the file also gets renamed (same name, different capitalization):

The album is not duplicated, but the song in question is. And even though it is currently playing, it is not displayed correctly in the player, nor is the playlist of that album: [image]

michaelherger commented 2 years ago

> 1. Take any album that is already in the library
> 2. Add a new track or tracks into the folder and tag with the same album tag as the existing tracks
> 3. Run a new/change rescan
> 4. New duplicate album is created with just the new tracks

This is working as expected here. Are you 100% certain album and artist information are absolutely identical? No upper/lower case issues? No whitespace?

Would you mind sharing the library.db with such an issue with me?

https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a

schnillerman commented 2 years ago

Can I install 8.2 over 8.3 in order to do that?

michaelherger commented 2 years ago

> Can I install 8.2 over 8.3 in order to do that?

Why would you want to install the previous version? It's fixed in 8.3, not 8.2.

But to answer your question: yes, you can go back and forth as you like.

schnillerman commented 2 years ago

> > Can I install 8.2 over 8.3 in order to do that?
>
> Why would you want to install the previous version? It's fixed in 8.3, not 8.2.
>
> But to answer your question: yes, you can go back and forth as you like.

Sorry, Michael - the problem of duplicate albums by adding files to a folder or capitalization changes does not seem to be an issue in 8.3 anymore - at least from what I tried. That's why I thought that if you want that particular error, I would need to reproduce it in 8.2 - because that's where it definitely happened. Anyway - I'll try and reproduce the error I described above (https://github.com/Logitech/slimserver/issues/547#issuecomment-1028079064) and share library.db with you via PM. I understand that probably you have responded mainly to bobbydriver's comments, so please excuse my chipping in.

bobbydriver commented 2 years ago

OK - getting closer to the problem now I think.

When you said you couldn't recreate the error by following the steps I described, I was surprised. So I ran through them again and was even more surprised to find that you were right - it added the new tracks to the correct (existing) album!

But I was sure that I had seen the issue only yesterday on the same 8.3 nightly, so I went back through the steps and managed to recreate the error in more specific circumstances.

In the example, I'm using two Joy Division live shows. Both were partially included in a boxset some years ago, so I had them in my library as two separate albums, one for each partial live show.

Someone then shared remaining tracks which weren't included on the boxset and so I go to add them to each existing folder to complete the albums

In example 1, I follow the original instructions I gave you: I added the extra files and tagged them with the same album name as the existing files. The new files are the yellow ones, and you can see that the first 3 tracks are the old, unchanged ones. [screenshot: Capture1]

As mentioned, a rescan happily adds these to the existing album.

In example 2, I follow the same steps, with the only difference being that this time, when I load mp3tag to change the album tag on the new files, I highlight all the files and save the tags. This re-saves the existing tags to the existing files, even though none of them have actually changed. So now the Date Modified is updated for ALL the files, but Date Created obviously stays the same for the original files. [screenshot: Capture2]

A rescan now creates the duplicate album issue. [screenshot: Capture3]

The original album with the original tracks: [screenshot: Capture4]

and the duplicate album with the additional tracks: [screenshot: Capture5]

They are both showing up as "New Music", so it's obviously changing the timestamp on the original album according to the Date Modified. But why is it not adding the additional tracks as it does in example 1?

bobbydriver commented 2 years ago

To add to my confusion, I tried another test case.

See example 3: here I don't add any new files to the folder, I just change an mp3 tag on existing track 1 (which changes the Date Modified on this one file only). [screenshot: Capture6]

I was expecting a rescan to create a new album for that one file, with the rest of the tracks remaining in the old album.

But it doesn't?! It just updates the existing album (see the altered title tag on track 1). [screenshot: Capture7]

So the question now is: what is the difference between example 2 and example 3? Why does the scan react differently to the Date Modified change?

michaelherger commented 2 years ago

Would you mind sharing your library.db (with the above duplicate albums in it!)?

https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a

Without the database it's hard to tell what's going on there.

bobbydriver commented 2 years ago

Will do - have tidied up the duplicates from yesterday so I will create a new test example and document for you, then upload my library.db and screenshots etc

bobbydriver commented 2 years ago

Hmm - I now seem to have corrupted my library and it's triggered a full rescan - not ideal!

On the positive side, I think I've narrowed down the exact circumstances in which the issue now occurs

Most of the error modes from older versions seem to have been fixed - which is great

While I wait to get my library back - can you try this

  1. Add a new track to an existing album folder
  2. Make sure artist/year/genre tags are all the same BUT change the album tag for ALL tracks to be something new (most common example is changing it from "Album Name" to "Album Name [Expanded Edition]")
  3. You should see the existing tracks still have their original Date Created but all the tracks will have a new Date Modified
  4. Run a new/changed scan

What happens? For me, 2 new albums get created (one with the existing tracks and one with just the new track).

michaelherger commented 2 years ago

Thanks @bobbydriver! I received your files and will investigate. Can you confirm you're using the latest LMS 8.3?

michaelherger commented 2 years ago

Oh, I think I know what's going on: new tracks are scanned before the updated tracks. The new tracks therefore create a new album, because their album doesn't exist yet. Only once that's done the modified tracks would be updated. And as they already exist, the album referenced in the track would be updated, rather than the track linked to a new album. This causes the previous album to become a duplicate of the new one. That might become tricky to fix.
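
The ordering effect described here can be reproduced with a small toy model. This is a deliberately simplified simulation of the explanation, not the LMS scanner: albums are matched by title only, new tracks find or create an album, and changed tracks rename the album they already point to. All names are invented for the example.

```python
def scan(new_tracks, changed_tracks, new_first):
    """Simulate a new/changed scan and return the resulting album titles.

    Starts with one album ("Album") holding two existing tracks.
    """
    albums = {1: "Album"}                # album id -> title
    tracks = {"01.mp3": 1, "02.mp3": 1}  # track -> album id

    def add_new(track, title):
        # A new track joins an album with a matching title, else creates one.
        for album_id, existing in albums.items():
            if existing == title:
                tracks[track] = album_id
                return
        album_id = max(albums) + 1
        albums[album_id] = title
        tracks[track] = album_id

    def update_changed(track, title):
        # A known track keeps its album; the album itself gets the new title.
        albums[tracks[track]] = title

    steps = [(add_new, new_tracks), (update_changed, changed_tracks)]
    if not new_first:
        steps.reverse()
    for handler, items in steps:
        for track, title in items:
            handler(track, title)
    return sorted(albums.values())


renamed = "Album [Expanded Edition]"
new = [("03.mp3", renamed)]
changed = [("01.mp3", renamed), ("02.mp3", renamed)]

new_then_changed = scan(new, changed, new_first=True)
changed_then_new = scan(new, changed, new_first=False)
print(new_then_changed)
print(changed_then_new)
```

In this model, processing new tracks first leaves two albums with the same title, while processing changed tracks first leaves a single album. Whether reordering has side effects in the real scanner is a separate question that the model does not answer.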

bobbydriver commented 2 years ago

Ah ok - that makes sense, though I'm not sure how you'd fix that. I guess if it scanned updated files before new files, that would bring its own problems?

And yes - I am on the latest 8.3 nightly (if it matters)

michaelherger commented 2 years ago

Yes, changing the processing order is the most obvious approach I'll investigate first.

michaelherger commented 2 years ago

Please let me know should you encounter any new side-effects. Thanks for your help identifying this long-standing issue, @bobbydriver!

frank1969b commented 2 years ago

@mherger, GREAT, you fixed this too! This has been an evergreen for me as well (it always happened when there was a new bonus edition of an album and I added the new bonus tracks to it, which happens often nowadays! :) ). THANKS!

bobbydriver commented 2 years ago

Thanks Michael! Testing it tonight. Will let you know.

bobbydriver commented 2 years ago

Looking good to me - the problem is gone. I didn't think this would ever get fixed so THANK YOU so much!

michaelherger commented 2 years ago

Good to know! Sometimes it needs a fresh mind to look into these old issues 😉.

schnillerman commented 2 years ago

Sorry to interrupt you again guys, but LMS 8.3.0 - 1644170574 @ Sun 06 Feb 2022 07:24:08 PM CET is still creating duplicates for me.

Use case: capitalization change in the albumartist tag and in the dir/file name

Re-scan results in 2 identical entries, both with all tracks: [image]

One of them without the artist name displayed: [image]

One with the artist name displayed: [image]

It seems that file name changes are registered as new files, too: [image]

Can share library.db if required.

mherger commented 2 years ago

Did you change artist name in tag, folder and file name all at the same time? I haven't tried all three at once yet.

Did you completely delete library.db (not just wipe its content) in the past week? Some of the new behaviour requires parts of the table schema to be updated/re-created from scratch.

Yes, I'd be interested in your library.db in its broken state: https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a

schnillerman commented 2 years ago

I did not delete library.db, but I did a full re-scan before I changed the files as described above.

Just dropped the library.db.

Now re-scanning with all files named library.* renamed and LMS restarted (triggered a re-scan).

I did the following:

  1. re-tagged the files
  2. re-scanned
  3. duplicate entries were added
  4. renamed the files
  5. duplicate entries remained

michaelherger commented 2 years ago

Thanks for the uploaded file. As you confirmed, it's not using the latest schema, so it would still do case-sensitive comparisons under certain circumstances.
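
As background on why a schema change can matter here: in SQLite, whether two strings count as equal depends on the collation declared on the column. The sketch below shows only that generic SQLite behaviour. The table names are invented, and the thread does not say which collation the LMS schema actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Default collation (BINARY) compares names case-sensitively.
conn.execute("CREATE TABLE binary_names (name TEXT UNIQUE)")
# NOCASE treats ASCII upper and lower case as equal.
conn.execute("CREATE TABLE nocase_names (name TEXT COLLATE NOCASE UNIQUE)")


def insert_count(table, names):
    """Insert names, skipping ones the table considers duplicates; return row count."""
    for name in names:
        conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


names = ["The Cure", "the cure"]
binary_rows = insert_count("binary_names", names)
nocase_rows = insert_count("nocase_names", names)
print(binary_rows, nocase_rows)
```

With the case-sensitive column both spellings are stored as separate rows; with the case-insensitive column the second one is recognised as the same name. A column's collation is fixed when the table is created, which is one reason a table may need to be re-created rather than just emptied.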

schnillerman commented 2 years ago

I'll keep you updated as duplicates occur. For now, as the others already said: Huge thank you for dealing with this issue.