IvanOnishchenko closed this 2 months ago.
Interesting. Please study your database, and figure out why this is happening, and submit a PR to correct the structure definition.
I've attached files in case anyone can help figure this out.
History list from rekordbox: https://drive.google.com/file/d/1-mSBPSp0x3Z_bD1z21wKkebytMMGN4Xh/view?usp=sharing
Export.pdb from stick: https://drive.google.com/file/d/1ImR27xHuCVxFHyysDsBUKG0kghH-v9qr/view?usp=sharing
Thank you!
Please try to figure it out yourself! You are the person who is best positioned in the world to do so, because you have both the data, and a desire to see the change happen. That is what drives open-source projects. When I created this project, I knew nothing about Kaitai, and I learned it all as I went, so you can do the same.
Thank you for uploading that data, and for making it such a focused example of the problem. I had some spare time tonight and poked into your files, and you have identified a misinterpretation we made in the structure format that is probably affecting a lot of tables. We had assumed num_rows represented the total number of valid row entries in a table, regardless of whether they had been marked as present or not. But it seems to indicate the number of rows remaining in the table after skipping the ones that have been marked as deleted. So we need to scan num_row_groups * 16 row entries, looking for all rows whose row_present_flag is set. I will assign this issue to myself and work on updating Crate Digger to reflect this improved understanding.
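The scan described above can be sketched in Python. This is a minimal sketch, not Crate Digger's implementation: it assumes each row group's row_present_flags value has already been read as a 16-bit integer, and that bit i (counting from the least significant bit) marks whether entry i of that group is present. The function name present_row_indices is hypothetical.

```python
from typing import List


def present_row_indices(group_flags: List[int]) -> List[int]:
    """Return the page-level indices of all present rows.

    group_flags holds one 16-bit row_present_flags value per row group.
    Assumption: bit i corresponds to entry i of that group.
    """
    present = []
    for group_index, flags in enumerate(group_flags):
        # Check all 16 entries, even when earlier entries are deleted.
        for entry in range(16):
            if (flags >> entry) & 1:
                present.append(group_index * 16 + entry)
    return present


# Illustrative values: group 0 has entries 0 and 3 present, group 1 has entry 1.
print(present_row_indices([0b1001, 0b0010]))  # [0, 3, 17]
```

Because every entry of every row group is checked, deleted entries in the middle of a group cannot cause later present rows to be skipped.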
If you want to test the fix yourself in the Kaitai Web IDE, you can change the repeat-expr in line 289 of rekordbox_pdb.ksy to the much simpler value 16.
Wow! Now I see all tracks! Thank you!
I have a question on this: is the calculation of num_row_groups still correct? https://github.com/Deep-Symmetry/crate-digger/blob/main/src/main/kaitai/rekordbox_pdb.ksy#L243-L248
Or could there be any number of row groups, so that we have to keep reading row groups until we have found all rows (via the presence flags)?
I think num_row_groups is correct; you just need to scan all 16 entries in each of those row groups until all the rows counted by num_rows have been found as present.
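Since each row group holds up to 16 row entries, deriving the number of row groups from num_rows amounts to a ceiling division by 16. The sketch below assumes this is what the linked ksy lines compute; num_row_groups here is a hypothetical Python helper, not code taken from the ksy.

```python
def num_row_groups(num_rows: int) -> int:
    """Number of 16-entry row groups needed to hold num_rows entries."""
    # Ceiling division: a partially filled group still counts as a whole group.
    return (num_rows + 15) // 16


for n in (1, 7, 16, 17, 32):
    print(n, num_row_groups(n))
```

With num_rows = 7, as in the export.pdb discussed in this thread, this gives a single row group.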
I'm wondering because in the Database Exports Analysis you wrote:
Note: The row counter entries represent the number of actually-present rows in the page. To find them, you need to scan all 16 entries of each of the row groups present in the page, ignoring any whose row presence bit is zero.
I guess "row counter entries" refers to num_rows (i.e. num_rows_small or num_rows_large)? If so, this statement is quite confusing, because it would mean that we first learn how many rows are present in a page, and then keep reading row groups until the total number of 1 bits in the row groups read so far equals num_rows.
Something like:
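    num_rows = 123
    rows_found = 0
    while rows_found < num_rows:
        # read_next_rowgroup() is a hypothetical reader for the next row group
        rowgroup = read_next_rowgroup()
        # add the number of set presence bits in this group
        rows_found += bin(rowgroup.row_presence_flags).count("1")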
But this does not work (I checked). Either something is wrong with num_rows, or the statement is not correct. As far as I can tell from loading the following files in the Kaitai Web IDE, the Kaitai definition itself does not follow it either:
In this example, the workflow is as follows:
1. num_rows: there should be 7 existing rows (at least if I understand the note from the docs correctly).
2. row_presence_flags: there are only 2 rows in the group.

If the note is correct, where are the remaining 5 rows? I already checked: the other row groups in that page do not contain any rows (it starts to fail after 45 row groups).
I’m afraid it has been too many years since I worked at this level for the details to be fresh in my head. Do you have a sample export file you can share for which Crate Digger fails to find all the rows of a table? Thinking about it now, I think you are probably right that we can’t use num_rows to figure out how many row groups there are, because deleted rows might occupy entries in several row groups. So how are you determining how many row groups to scan?
Perhaps we need to just assume there are as many row groups as fit in the index page, and keep scanning until we find num_rows non-deleted rows? I fear this cannot be expressed cleanly in Kaitai, although the lazy evaluation of row groups might save us.
Perhaps we need to just assume there are as many row groups as fit in the index page, and keep scanning until we find num_rows non-deleted rows?
Nope, this is what I tried and it doesn't work. The export.pdb linked in my comment above contains just the two demo tracks, but num_rows is 7. My initial implementation read row groups until it found 7 non-deleted rows, but since there are only two non-deleted rows, it eventually crashes because it tries to interpret actual row data at the start of the heap as row group data.
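To illustrate why scanning until num_rows present rows are found overruns, here is a small Python simulation using the numbers from this export.pdb: num_rows is 7, but only 2 presence bits are set. The flag values are illustrative stand-ins (which bits are set is an assumption), and both function names are hypothetical. Because the list of row groups is finite here, the flawed loop raises an error instead of misreading heap data as row groups.

```python
from typing import List


def scan_until_count(group_flags: List[int], num_rows: int) -> int:
    """Flawed approach: keep reading row groups until num_rows present rows are seen."""
    rows_found = 0
    groups_read = 0
    while rows_found < num_rows:
        if groups_read == len(group_flags):
            # In a real page, this is where heap data gets misread as row groups.
            raise IndexError(
                f"ran out of row groups after {groups_read}; found {rows_found} of {num_rows}"
            )
        rows_found += bin(group_flags[groups_read]).count("1")
        groups_read += 1
    return rows_found


def scan_bounded(group_flags: List[int], num_rows: int) -> int:
    """Scan only the ceil(num_rows / 16) row groups and count set presence bits."""
    groups = (num_rows + 15) // 16
    return sum(bin(flags).count("1") for flags in group_flags[:groups])


flags = [0b11]  # one row group with two present rows (illustrative values)
print(scan_bounded(flags, 7))  # 2
try:
    scan_until_count(flags, 7)
except IndexError as err:
    print(err)
```

The bounded scan stops after the single row group implied by num_rows and reports the two present rows, while the count-driven loop has no way to stop.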
Just to clarify: the current Kaitai code works. I'm just saying the corresponding docs are confusing and likely wrong.
My current interpretation of num_rows is that it is the number of rows in the table, but that does not mean these rows are actually present (I think that was also what the docs said before this change).
Yes, I think that is the best interpretation of the evidence. num_rows tells us how many rows have ever been allocated, which we can use to calculate num_row_groups, and we just need to scan all of those row groups for any rows whose presence bit is set.
Thanks for this follow-up; I will re-update the documentation and implementation when I have some time. I will re-open this issue in the meantime.
Ok, I have fixed the documentation and the Kaitai mapping explanation. It seems my implementation was already working this way, but the reasoning behind the explanation was wrong. Thanks again for pointing this out, I have added credit in the change log!
(Oh, and please let me know if it is still confusing or could be further improved in your eyes!)
I use https://ide.kaitai.io/# and rekordbox_pdb.ksy to read export.pdb files.
I noticed that the HISTORY_ENTRIES table of export.pdb has one fewer track than I played (the last track is missing), but rekordbox sees all the tracks.