andrewrk / groovebasin

Music player server with a web-based user interface.
MIT License

ability for collection scanning to recover from errors when reading a file #383

Open j-vizcaino opened 9 years ago

j-vizcaino commented 9 years ago

When starting Groove Basin it scans your music folder to build the music collection. Problem: after startup is complete (and lots of messages about "max_analyze_duration 5000000 reached" and "Estimating duration from bitrate, this may be inaccurate"), my collection is missing a lot of songs. Probable cause:

library scanning error: Error: ENOENT, lstat '[...]/Classical/Maurice André/Albinoni - Concerto � cinque r� majeur 1.mp3'

OK, this filename seems to be badly encoded, but the scanning process appears to abort on the first error it encounters. A better solution would be to skip invalid entries and log them all together at the end of the scanning process. All the problematic files would then be logged in the same "log paragraph" (not scattered among warnings and such), where the user could easily spot them.

andrewrk commented 9 years ago

There is a problem to solve with this approach. Groove Basin detects file additions and deletions in your music library by scanning the music directory. Once scanning is complete, if any files are missing from disk, then they are removed from your music library and metadata such as play count and loudness is lost.

If Groove Basin ignored errors, then when there is a temporary error while scanning such as the system running out of resources, Groove Basin would think that much of your library was deleted. This happened on @thejoshwolfe's server.
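
To make the risk concrete, here is a minimal sketch of deletion detection by set difference. It is not Groove Basin's actual code: `findDeletedEntries`, the `library` object, and the file names are hypothetical and exist only for illustration.

```javascript
// Hypothetical sketch: `library` maps file paths to stored metadata,
// `scannedPaths` is the list of paths the scan managed to visit.
function findDeletedEntries(library, scannedPaths) {
  const seen = new Set(scannedPaths);
  // Anything in the library that the scan did not see is treated as deleted.
  return Object.keys(library).filter((filePath) => !seen.has(filePath));
}

const library = {
  'a.mp3': { playCount: 12 },
  'b.mp3': { playCount: 3 },
  'c.mp3': { playCount: 7 },
};

// A complete scan sees every file, so nothing is removed.
const completeScan = ['a.mp3', 'b.mp3', 'c.mp3'];
// A scan that silently skipped errors after the first file.
const partialScan = ['a.mp3'];

console.log(findDeletedEntries(library, completeScan)); // []
console.log(findDeletedEntries(library, partialScan));  // [ 'b.mp3', 'c.mp3' ]
```

With the partial scan, two files that still exist on disk would be dropped along with their play counts, which is the failure mode described above.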

Really this is a problem with Node.js - the file system API only deals with strings, but file names are byte arrays. It would be better if Groove Basin could still read the file even if the file name is badly encoded.
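
For illustration, the sketch below shows why decoding a file name to a string can lose information, and how later Node.js releases allow working with raw bytes instead: `fs.readdirSync` accepts `{ encoding: 'buffer' }` and `fs.lstatSync` accepts a `Buffer` path. Whether Groove Basin uses these options is not stated in this thread; the helper `lstatAllRaw` and the scratch directory are assumptions made for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// "ré" encoded as Latin-1: the byte 0xE9 is not valid UTF-8 on its own.
const rawName = Buffer.from([0x72, 0xe9]);
// Decoding to a string replaces the bad byte with U+FFFD...
const asString = rawName.toString('utf8');
// ...so encoding it again yields different bytes, i.e. a different file name.
const roundTripped = Buffer.from(asString, 'utf8');
console.log(rawName.equals(roundTripped)); // false

// List a directory as raw bytes and lstat each entry without decoding its name.
function lstatAllRaw(dir) {
  const dirBytes = Buffer.from(dir + path.sep);
  return fs.readdirSync(dir, { encoding: 'buffer' }).map((nameBytes) => {
    const fullPath = Buffer.concat([dirBytes, nameBytes]);
    return { nameBytes, stat: fs.lstatSync(fullPath) };
  });
}

// Usage on a scratch directory containing one ordinary file.
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'scan-'));
fs.writeFileSync(path.join(scratchDir, 'song.mp3'), '');
const entries = lstatAllRaw(scratchDir);
console.log(entries.length, entries[0].stat.isFile()); // 1 true
```

Because the name never passes through a string, a badly encoded name would reach `lstat` with its original bytes intact.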

I'm open to other ideas, but merely ignoring errors is too dangerous. As a workaround, I suggest you move the problematic files into a temporary directory, restart the server, repeat, until you get no such errors. Then once Groove Basin is up and running you can import all those files via the web interface back into your library and Groove Basin will give them better names.

j-vizcaino commented 9 years ago

Yes, I did clean up my files and everything worked perfectly, thanks!

I was not suggesting ignoring errors, but rather grouping them at the end of the scanning process. Something more like: OK, this file is broken, so keep it in a list, and then print one error covering all the broken files in my library.

I don't fully understand the problem with the scanning getting rid of entries. Is it like this: scanning is in progress, you get a fatal error, scanning aborts and returns an incomplete set of files, database handler gets rid of the "extra" entries? If this is the case, how about keeping file + error inside the scan_error_list and not removing database entries contained in this list? I guess that would be tricky if the error is raised while getting a directory listing, though...

andrewrk commented 9 years ago

> Is it like this: scanning is in progress, you get a fatal error, scanning aborts and returns an incomplete set of files, database handler gets rid of the "extra" entries?

Yes, that is what I meant.

> how about keeping file + error inside the scan_error_list and not removing database entries contained in this list?

I think this is a good idea. If we detect that it's an error with a specific file entry, then we can ignore that file and put it in the list of "broken" files that we can report to the user, and not remove them from the library. On the other hand, if an error occurs getting a directory listing, we can abort the scanning process and not do anything to the library.
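
The approach agreed on here could look roughly like the following. This is a sketch under stated assumptions, not Groove Basin's scanner: `scanLibrary`, `entriesToRemove`, `formatBrokenReport`, and the injectable `fsImpl` parameter are hypothetical names. It also assumes the library stores the same path strings the scanner produces.

```javascript
const fs = require('fs');
const path = require('path');

// Walk the tree, collecting files found and per-file failures.
// A failure to list a directory throws, which aborts the whole scan.
function scanLibrary(rootDir, fsImpl = fs) {
  const found = [];
  const broken = []; // the "scan_error_list" from the discussion
  const pending = [rootDir];
  while (pending.length > 0) {
    const dir = pending.pop();
    // Deliberately not wrapped in try/catch: a listing error is fatal.
    const names = fsImpl.readdirSync(dir);
    for (const name of names) {
      const fullPath = path.join(dir, name);
      try {
        const stat = fsImpl.lstatSync(fullPath);
        if (stat.isDirectory()) pending.push(fullPath);
        else found.push(fullPath);
      } catch (err) {
        // Error tied to one entry: remember it and keep scanning.
        broken.push({ path: fullPath, error: err });
      }
    }
  }
  return { found, broken };
}

// After a scan that completed, drop only entries that are neither
// found nor broken. Broken files keep their library metadata.
function entriesToRemove(libraryPaths, scan) {
  const keep = new Set(scan.found);
  for (const item of scan.broken) keep.add(item.path);
  return libraryPaths.filter((p) => !keep.has(p));
}

// Build one grouped report so broken files are not scattered in the log.
function formatBrokenReport(scan) {
  const lines = scan.broken.map((b) => `  ${b.path}: ${b.error.message}`);
  return [`${scan.broken.length} file(s) could not be read:`, ...lines].join('\n');
}

// Usage with a stub file system: one good file and one unreadable file.
const stubFs = {
  readdirSync: () => ['ok.mp3', 'bad.mp3'],
  lstatSync: (p) => {
    if (p.endsWith('bad.mp3')) throw new Error('ENOENT');
    return { isDirectory: () => false };
  },
};
const result = scanLibrary('music', stubFs);
const known = ['ok.mp3', 'bad.mp3', 'gone.mp3'].map((n) => path.join('music', n));
console.log(formatBrokenReport(result));
console.log(entriesToRemove(known, result)); // only the path ending in gone.mp3
```

If `readdirSync` throws, `scanLibrary` never returns a result, so the caller has nothing to pass to `entriesToRemove` and the library is left untouched.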