Zeugma440 / atldotnet

Fully managed, portable and easy-to-use C# library to read and edit audio data and metadata (tags) from various audio formats, playlists and CUE sheets
MIT License
442 stars 60 forks source link

Possible problem in BufferedBinaryReader #193

Closed JoGillis closed 1 year ago

JoGillis commented 1 year ago

The problem

I use the ATL library mainly to help maintaining my vast audio collection consisting of MP3 and WMA files. When using your library more and more to get more information, I came to a point where some tags of some WMA files seemed not correct (title reverting to the filename, Album art not found). First, I suspected the WMA files themselves, however other tools (mp3 tageditor, media player) did read them correctly. I've seen this already before and sometimes depends on the effort put in the recovery code from small mistakes inside the files (was able, not so long ago, to correct such errors inside MP3 files). To see what happened I downloaded your latest codebase and tried to run it on my computer from within a small test program on one of the files with the problem and the result was equal (always a good start to be able to reproduce it...). Long story short (see details), I modified your code a lot to be able to understand what was going on, and at this moment I suspect there is a problem with the BufferedBinaryReader...

Environment

Details

Steps taken:

INFO - Trying to read a field ('WM/SharedUserRating'-4, type='3') At position 17B3 
INFO - Trying to read a field ('WM/TrackNumber'-4, type='3') At position 17E5 
WARNING - Lost sync? current position= 180F and should be 180D 
INFO - Trying to read a field (''-87, type='21') At position 180D

Correct behavior:

INFO - Trying to read a field ('WM/SharedUserRating'-4, type='3') At position 17B3 
INFO - Trying to read a field at position 17E5 
INFO - Trying to read a field ('WM/TrackNumber'-4, type='3') At position 17E5 
INFO - Trying to read a field at position 180D 
INFO - Trying to read a field ('WM/Picture'-10088, type='1') At position 180D 
INFO - Trying to read a field at position 3F91 
INFO - Trying to read a field ('IsVBR'-4, type='2') At position 3F91 

the positions are the (real) stream positions in the file (and the buffer), but in the first case after reading the tracknumber property it is 2 bytes past where it should be according to the calculation and the seek at the end of the loop. This seek resets it to the position where it should be (also the correct position in the file) to read the next one, but this then fails because the correct data in the buffer seems to start at the 180F position...

How did i get the 2nd log? Well in suspecting something with the buffer, the easiest way was to start playing with his properties. Just with the first try, I doubled the default buffer size with the above result... However, now it seems the problem occurs later in the chain...

WARNING - Lost sync? current position= 3FA5 and should be 3FA7
INFO - Trying to read a field at position 3FA7

And yes, the switch between the tracknumber and picture properties is just on one of the 512 byte boundaries since the beginning of the tags (Actually it is the first call to the ReadUInt16 for the length of the fieldname that returns the first wrong result, but it must have been the call to ReadUInt32 to get the value of the tracknumber where the pointer must have been corrupted).

The easiest solution for my problem would be recompiling with a buffer size bigger than an expected tag piece of the file, but I expect this behaviour could give some problems also on other places in the code. I do not have the time right now to deeper investigate (and get to know) the buffered reader (although i plan to do it some time, and i will keep you posted if i make progress...), but I hope you will be able to find some time and help me with this ;)

If there is something I can do to help, let me know, I will try to squeeze it in...

Zeugma440 commented 1 year ago

Hi and thanks for your feedback πŸ‘

To be perfectloy honest, this all got me a little confused as you're posting specific traces from your own fork, using offsets and values from your own file. I'm afraid I can't do much with that on my end.

To be able to diagnose the issue and suggest a fix, I'll need :

Thanks in advance~

JoGillis commented 1 year ago

Yea, maybe you are right, sorry

I will attach one of the files i tested on (remember no error will be produced in your original lib - only a few properties in the file will not be detected correctly for exmple the cover art), and my modified version of wma.cs which provided the extra info.

The test code is just creating a Track from the file...

Hope this will help TestData.zip

JoGillis commented 1 year ago

Bummer (for me) at this moment, the BufferedBinaryReader does not look to the settings at this moment to specify the buffer size...

Zeugma440 commented 1 year ago

Your diagnosis was correct, but the solution definitely wasn't to tamper with buffer size πŸ˜…

I found the issue and have published the fix in today's v4.30

JoGillis commented 1 year ago

Thanks for the quick fix, now I can go back to using the nuget package (btw, adapting the buffersize was only a workaround from my side to avoid updating the buffer during the operation which would result in correct info...)

still have 3 smaller issues, and if you find them valid enough I can put them in different issues if you want:

  1. The BufferedBinaryReader still does not use the buffersize setting, if it is intentional, the doc on the setting should tell so, but i would still prefer that it would use the setting.
  2. I have been experimenting with an adapted class for reading wma info, which can also detect more errors inside the structure of the file itself (As you also seem to do in the mp3 part of the code - you have a very early version of that in the zip file), would you like to incorporate the changes in your project, then i will clean it up, enhance it and send it to you?
  3. Wma does support the MCDI property as does ID3V2 (where you have made definitions for that field) however it is now completely ignored, and I frankly have no idea what the code now does when writing properties (also the ASFLEAKYBUCKETPAIRS field is completely ignored while reading), so my question, will writing work as expected or will those field be removed on writing? I currently have no intentions yet to write properties with the library, but who knows somewhere in the future?

But no pressure on all this...

Later today, or tomorrow i will run your bugfix against my library of 20000 files, and i'll post the result here...

Zeugma440 commented 1 year ago
  1. I think you're right - BufferedBinaryReader should use that setting. I've made the change; it will be available on next release

  2. You're welcome to submit a PR for that, sure. Please keep performance in mind at all times when writing your adjustments.

3a.

Wma does support the MCDI property as does ID3V2

Err... where do you get that information from ? πŸ˜… MCDI is a standard ID3v2 property, but it doesn't appear anywhere on the documentation I have regarding ASF/WMA. I guess you could write such a tag to a WMA file, but it wouldn't follow the standard.

3b.

also the ASFLEAKYBUCKETPAIRS field is completely ignored while reading

Right now, WMA properties of type 1 (byte array) and 6 (128-bit GUID) are ignored, except for the embedded picture... and it indeed causes loss of data for these fields when rewriting the file. Could you create a new GitHub issue for that, please ?

JoGillis commented 1 year ago

I think your fix resulted in another error on some files. I get the following error:

Exception thrown: 'System.ArgumentOutOfRangeException' in mscorlib.dll Non-negative number required. Parameter name: value at System.IO.FileStream.set_Position(Int64 value) at ATL.BufferedBinaryReader.Seek(Int64 offset, SeekOrigin origin) in D:\x\atldotnet-main\ATL\Utils\BufferedBinaryReader.cs:line 154 at ATL.AudioData.IO.WMA.readData(Stream source, ReadTagParams readTagParams) in D:\x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 776 at ATL.AudioData.IO.WMA.read(Stream source, ReadTagParams readTagParams) in D:\x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 825 at ATL.AudioData.IO.WMA.Read(Stream source, SizeInfo sizeInfo, ReadTagParams readTagParams) in D:\x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 817 at ATL.AudioData.AudioDataManager.read(Stream source, ReadTagParams readTagParams) in D:\x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 572 at ATL.AudioData.AudioDataManager.read(Stream source, Boolean readEmbeddedPictures, Boolean readAllMetaFrames, Boolean prepareForWriting) in D:\x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 549 at ATL.AudioData.AudioDataManager.ReadFromFile(Boolean readEmbeddedPictures, Boolean readAllMetaFrames) in D:\x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 330

Changing the buffersize seems to result in different outcome:

2023-04-17 13:18:18.527 - MSGS - TestAtl.MySupport - LogInitialisation - Starting TestAtl - TestAtl Program V1.0.0.0 2023-04-17 13:18:18.527 - MSGS - TestAtl.Form1 - Button1_Click - Testing with buffer size: 512 2023-04-17 13:18:24.036 - WARN - TestAtl.Form1 - DoLog - WARNING - Have a not recognized object here '91 7 DC B7 B7 A9 8E E6 0 C0 C 20 53 65 72 0 (Unknown)'; size='11403114256502095872' - x04 Unbroken Road.wma 2023-04-17 13:18:24.098 - ERRM - TestAtl.Form1 - DoLog - ERROR - Non-negative number required. Parameter name: value - x04 Unbroken Road.wma 2023-04-17 13:18:24.098 - ERRM - TestAtl.Form1 - DoLog - ERROR - at System.IO.FileStream.set_Position(Int64 value) at ATL.BufferedBinaryReader.Seek(Int64 offset, SeekOrigin origin) in x\atldotnet-main\ATL\Utils\BufferedBinaryReader.cs:line 154 at ATL.AudioData.IO.WMA.readData(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 776 at ATL.AudioData.IO.WMA.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 825 at ATL.AudioData.IO.WMA.Read(Stream source, SizeInfo sizeInfo, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 817 at ATL.AudioData.AudioDataManager.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 572 at ATL.AudioData.AudioDataManager.read(Stream source, Boolean readEmbeddedPictures, Boolean readAllMetaFrames, Boolean prepareForWriting) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 549 at ATL.AudioData.AudioDataManager.ReadFromFile(Boolean readEmbeddedPictures, Boolean readAllMetaFrames) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 330 - x04 Unbroken Road.wma 2023-04-17 13:18:24.098 - WARN - TestAtl.Form1 - DoLog - WARNING - Unrecognized file extension : x04 Unbroken Road.wma - x04 Unbroken Road.wma 2023-04-17 13:18:24.103 - WARN - TestAtl.Form1 - DoLog - WARNING - Have a not recognized object here '91 7 DC B7 B7 A9 8E E6 0 C0 C 20 53 65 72 0 (Unknown)'; size='11403114256502095872' - in-memory 2023-04-17 13:18:24.124 - ERRM - TestAtl.Form1 - DoLog - ERROR - Non-negative number required. Parameter name: value - in-memory 2023-04-17 13:18:24.124 - ERRM - TestAtl.Form1 - DoLog - ERROR - at System.IO.FileStream.set_Position(Int64 value) at ATL.BufferedBinaryReader.Seek(Int64 offset, SeekOrigin origin) in x\atldotnet-main\ATL\Utils\BufferedBinaryReader.cs:line 154 at ATL.AudioData.IO.WMA.readData(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 776 at ATL.AudioData.IO.WMA.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 825 at ATL.AudioData.IO.WMA.Read(Stream source, SizeInfo sizeInfo, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 817 at ATL.AudioData.AudioDataManager.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 572 at ATL.AudioData.AudioDataManager.read(Stream source, Boolean readEmbeddedPictures, Boolean readAllMetaFrames, Boolean prepareForWriting) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 549 at ATL.AudioData.AudioDataManager.ReadFromFile(Boolean readEmbeddedPictures, Boolean readAllMetaFrames) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 330 - in-memory 2023-04-17 13:18:24.143 - WARN - TestAtl.Form1 - DoLog - WARNING - Have a not recognized object here '91 7 DC B7 B7 A9 8E E6 0 C0 C 20 53 65 72 0 (Unknown)'; size='11403114256502095872' - x04 Unbroken Road.wma 2023-04-17 13:18:24.162 - ERRM - TestAtl.Form1 - DoLog - ERROR - Non-negative number required. Parameter name: value - x04 Unbroken Road.wma 2023-04-17 13:18:24.163 - ERRM - TestAtl.Form1 - DoLog - ERROR - at System.IO.FileStream.set_Position(Int64 value) at ATL.BufferedBinaryReader.Seek(Int64 offset, SeekOrigin origin) in x\atldotnet-main\ATL\Utils\BufferedBinaryReader.cs:line 154 at ATL.AudioData.IO.WMA.readData(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 776 at ATL.AudioData.IO.WMA.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 825 at ATL.AudioData.IO.WMA.Read(Stream source, SizeInfo sizeInfo, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 817 at ATL.AudioData.AudioDataManager.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 572 at ATL.AudioData.AudioDataManager.read(Stream source, Boolean readEmbeddedPictures, Boolean readAllMetaFrames, Boolean prepareForWriting) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 549 at ATL.AudioData.AudioDataManager.ReadFromFile(Boolean readEmbeddedPictures, Boolean readAllMetaFrames) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 330 - x04 Unbroken Road.wma 2023-04-17 13:18:24.163 - WARN - TestAtl.Form1 - DoLog - WARNING - Unrecognized file extension : x04 Unbroken Road.wma - x04 Unbroken Road.wma 2023-04-17 13:18:24.164 - WARN - TestAtl.Form1 - DoLog - WARNING - Have a not recognized object here '91 7 DC B7 B7 A9 8E E6 0 C0 C 20 53 65 72 0 (Unknown)'; size='11403114256502095872' - in-memory 2023-04-17 13:18:24.186 - ERRM - TestAtl.Form1 - DoLog - ERROR - Non-negative number required. Parameter name: value - in-memory 2023-04-17 13:18:24.187 - ERRM - TestAtl.Form1 - DoLog - ERROR - at System.IO.FileStream.set_Position(Int64 value) at ATL.BufferedBinaryReader.Seek(Int64 offset, SeekOrigin origin) in x\atldotnet-main\ATL\Utils\BufferedBinaryReader.cs:line 154 at ATL.AudioData.IO.WMA.readData(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 776 at ATL.AudioData.IO.WMA.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 825 at ATL.AudioData.IO.WMA.Read(Stream source, SizeInfo sizeInfo, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\IO\WMA.cs:line 817 at ATL.AudioData.AudioDataManager.read(Stream source, ReadTagParams readTagParams) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 572 at ATL.AudioData.AudioDataManager.read(Stream source, Boolean readEmbeddedPictures, Boolean readAllMetaFrames, Boolean prepareForWriting) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 549 at ATL.AudioData.AudioDataManager.ReadFromFile(Boolean readEmbeddedPictures, Boolean readAllMetaFrames) in x\atldotnet-main\ATL\AudioData\AudioDataManager.cs:line 330 - in-memory 2023-04-17 13:18:24.189 - MSGS - TestAtl.Form1 - Button1_Click - Title : 04 Unbroken Road 2023-04-17 13:18:24.189 - MSGS - TestAtl.Form1 - Button1_Click - Album : Skyrim: The Elder Scrolls 2023-04-17 13:18:24.189 - MSGS - TestAtl.Form1 - Button1_Click - Artist: Jeremy Soule 2023-04-17 13:18:24.189 - MSGS - TestAtl.Form1 - Button1_Click - Rating: 80 2023-04-17 13:18:24.189 - MSGS - TestAtl.Form1 - Button1_Click - Track : 4 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - HasArt: True 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Genre : Soundtrack 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/MediaClassSecondaryID - Value: '' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/MediaClassPrimaryID - Value: '' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WMFSDKVersion - Value: '12.0.7601.17514' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WMFSDKNeeded - Value: '0.0.0.0000' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: ASFLeakyBucketPairs - Value: '' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: TRACKTOTAL - Value: '18' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: DISCTOTAL - Value: '4' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/ISRC - Value: 'QMVAB1100004' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: iTunes_CDDB_1 - Value: '040ECD12+284364+18+150+17973+25091+48042+77006+98571+115676+128668+139389+147701+164475+178162+205644+223260+241245+256384+264259+273651' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/Track - Value: '3' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/Codec - Value: 'WMA Pro 10' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/EncodingTime - Value: '129767187360000000' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: AverageLevel - Value: '3451' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: PeakValue - Value: '30241' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: IsVBR - Value: '1' 2023-04-17 13:18:24.190 - MSGS - TestAtl.Form1 - Button1_Click - Field: MediaFoundationVersion - Value: '2.112' 2023-04-17 13:18:27.283 ---------------------- Closing Log File --------------------------- 2023-04-17 13:18:43.554 - MSGS - TestAtl.MySupport - LogInitialisation - Starting TestAtl - TestAtl Program V1.0.0.0 2023-04-17 13:18:43.554 - MSGS - TestAtl.Form1 - Button1_Click - Testing with buffer size: 20480 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Title : Unbroken Road 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Album : Skyrim: The Elder Scrolls 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Artist: Jeremy Soule 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Rating: 80 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Track : 4 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - HasArt: True 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Genre : Soundtrack 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/MediaClassSecondaryID - Value: '' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/MediaClassPrimaryID - Value: '' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WMFSDKVersion - Value: '12.0.7601.17514' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WMFSDKNeeded - Value: '0.0.0.0000' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: ASFLeakyBucketPairs - Value: '' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: TRACKTOTAL - Value: '18' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: DISCTOTAL - Value: '4' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/ISRC - Value: 'QMVAB1100004' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: iTunes_CDDB_1 - Value: '040ECD12+284364+18+150+17973+25091+48042+77006+98571+115676+128668+139389+147701+164475+178162+205644+223260+241245+256384+264259+273651' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/Track - Value: '3' 2023-04-17 13:18:43.654 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/Codec - Value: 'WMA Pro 10' 2023-04-17 13:18:43.655 - MSGS - TestAtl.Form1 - Button1_Click - Field: WM/EncodingTime - Value: '129767187360000000' 2023-04-17 13:18:43.655 - MSGS - TestAtl.Form1 - Button1_Click - Field: AverageLevel - Value: '3451' 2023-04-17 13:18:43.655 - MSGS - TestAtl.Form1 - Button1_Click - Field: PeakValue - Value: '30241' 2023-04-17 13:18:43.655 - MSGS - TestAtl.Form1 - Button1_Click - Field: IsVBR - Value: '1' 2023-04-17 13:18:43.655 - MSGS - TestAtl.Form1 - Button1_Click - Field: MediaFoundationVersion - Value: '2.112' 2023-04-17 13:18:46.055 ---------------------- Closing Log File ---------------------------

JoGillis commented 1 year ago

04 Unbroken Road.zip

Zeugma440 commented 1 year ago

You're right, the fix wasn't thorough. My apologies.

Today's v4.31 should do the trick.

JoGillis commented 1 year ago

Tested against my Library, No strange behaviour encountered, As promised rest will come later when i have some time to test thoroughly.

Thanks

Zeugma440 commented 1 year ago

also the ASFLEAKYBUCKETPAIRS field is completely ignored while reading

Right now, WMA properties of type 1 (byte array) and 6 (128-bit GUID) are ignored, except for the embedded picture... and it indeed causes loss of data for these fields when rewriting the file. Could you create a new GitHub issue for that, please ?

Next version will store all type 1 and type 6 fields (such as WM/MediaClassPrimaryID or ASFLeakyBucketPair) inside Track.AdditionalFields, as Base64-encoded strings. That way, no data will be lost and you'll even be able to edit them πŸ‘