Deep-Symmetry / crate-digger

Java library for fetching and parsing rekordbox exports and track analysis files.
Other
137 stars 18 forks source link

History Entries table does not contain last played track, although rekordbox see it #32

Closed IvanOnishchenko closed 2 months ago

IvanOnishchenko commented 4 months ago

I use https://ide.kaitai.io/# and rekordbox_pdb.ksy to read export.pdb files.

I noticed that HISTORY_ENTRIES tabel of export.pdb has one less track than what I played, last track missing, but Recordbox sees all tracks.

brunchboy commented 4 months ago

Interesting. Please study your database, and figure out why this is happening, and submit a PR to correct the structure definition.

IvanOnishchenko commented 4 months ago

I've attached files in case anyone can help figure this out.

Hystory list from recordbox: https://drive.google.com/file/d/1-mSBPSp0x3Z_bD1z21wKkebytMMGN4Xh/view?usp=sharing Export.pdb from stick: https://drive.google.com/file/d/1ImR27xHuCVxFHyysDsBUKG0kghH-v9qr/view?usp=sharing

Thank you!

brunchboy commented 4 months ago

Please try to figure it out yourself! You are the person who is best positioned in the world to do so, because you have both the data, and a desire to see the change happen. That is what drives open-source projects. When I created this project, I knew nothing about Kaitai, and I learned it all as I went, so you can do the same.

brunchboy commented 4 months ago

Thank you for uploading that data, and for making it such a focused example of the problem. I had some spare time tonight, and poked into your files, and you have identified a misinterpretation we made in the structure format that is probably affecting a lot of tables. We had assumed num_rows represented the total valid row entries in a table, regardless of whether they had been marked as present or not. But it turns out that it seems to indicate the total number of remaining rows in the table after skipping the ones that have been marked as deleted. So we need to scan num_row_groups * 16 row entries, looking for all rows whose row_present_flag is full. I will assign this issue to myself and work on updating Crate Digger to reflect this improved understanding.

brunchboy commented 4 months ago

If you want to test the fix yourself in the Kaitai web IDE, you can change the repeat-expr in line 289 of rekordbox_psb.ksy to the much simpler value 16.

IvanOnishchenko commented 4 months ago

Wow! Now I see all tracks! Thank you!

Holzhaus commented 3 months ago

I have a question on this: Is the calculation of num_row_groups still correct? https://github.com/Deep-Symmetry/crate-digger/blob/main/src/main/kaitai/rekordbox_pdb.ksy#L243-L248

Or could there be any number of row groups, and we have to keep reading row groups it until we found all (via presence flags)?

brunchboy commented 3 months ago

I think num_row_groups is correct, you just need to scan all 16 entries in each of those row groups until all rows mentioned in num_rows have been found as present.

Holzhaus commented 2 months ago

I'm wondering because in the Database Exports Analysis you wrote:

Note: The row counter entries represent the number of actually-present rows in the page. To find them, you need to scan all 16 entries of each of the row groups present in the page, ignoring any whose row presence bit is zero.

I guess "row counter entries" refers to num_rows (i.e. num_rows_small or num_rows_large)? If so, this statement is quite confusing, because that would mean we check how many rows are present in a page, and then we start reading row groups until num_rows equals the total number of 1s in the row groups that were read.

Something like:

num_rows = 123
rows_found = 0
while rows_found < num_rows:
    rowgroup = read_next_rowgroup()
    rows_found += rowgroup.row_presence_flags.count_ones()

But this will not work (I checked). Either something is wrong with num_rows, or the statement is not really correct. The kaitai itself does not adhere to it, as far as I can tell by using the following files in the Kaitai Web IDE:

In this example, the workflow is as follows:

  1. According to num_rows there should be 7 existing rows (at least if I understand the note from the docs correctly)
  2. We calculate the number of row groups: $(n_{rows} - 1) / 16 + 1 = (7 - 1) / 16 + 1 = 6 / 16 + 1 = 0 + 1 = 1$ which means that there is only one rowgroup
  3. According to the row_presence_flags there are only 2 rows in the group

image

If the note is correct, where are the remaining 5 rows? I already checked, the other rows groups in that page do not contain any rows (it starts to fail after 45 rows groups).

brunchboy commented 2 months ago

I’m afraid it has been too many years since I was working at this level for the details to be fresh in my head. Do you have a sample export file that you can share that Crate Digger fails to find all the rows for a table with? Thinking about this now, I think you are probably right that we can’t use num_rows to figure out how many row groups there are, because of the fact that deleted rows might use up several row groups. So how are you determining how many row groups to scan?

brunchboy commented 2 months ago

Perhaps we need to just assume there are as many row groups as fit in the index page, and keep scanning until we find num_rows non-deleted rows? Sadly this is not going to be something that can be expressed coherently in Kaitai, I fear. Although we might be saved by the lazy nature of the row group evaluation.

Holzhaus commented 2 months ago

Perhaps we need to just assume there are as many row groups as fit in the index page, and keep scanning until we find num_rows non-deleted rows?

Nope, this is what I tried and it doesn't work. The export.pdb linked my comment above just contains the two demo tracks, but num_rows is 7. My initial implementation tried to read row groups until it finds 7 non-deleted rows, but since there are only two non-deleted rows it crashes at some point because it tries to interpret actual row data at the start of the heap as row group data.

Holzhaus commented 2 months ago

Just to clarify: the current kaitai code works. I'm just saying the corresponding docs are confusing and likely wrong.

My current interpretation of num_rows is that:

brunchboy commented 2 months ago

Yes, I think that is the best interpretation from the evidence. num_rows tells us how many rows have ever been allocated, which we can use to calculate num_row_groups, and we just need to scan all row groups for any whose presence is true.

brunchboy commented 2 months ago

Thanks for this follow up, I will re-update the documentation and implementation when I have some time. I will re-open this issue in the mean time.

brunchboy commented 2 months ago

Ok, I have fixed the documentation and the Kaitai mapping explanation. It seems my implementation was already working this way, but the reasoning behind the explanation was wrong. Thanks again for pointing this out, I have added credit in the change log!

brunchboy commented 2 months ago

(Oh, and please let me know if it is still confusing or could be further improved in your eyes!)