cmu-db / bustub

The BusTub Relational Database Management System (Educational)
https://15445.courses.cs.cmu.edu
MIT License
4.01k stars 1.77k forks source link

bug: misues of INTERNAL_PAGE_SIZE in BPlusTreeInternalPage #562

Open Roscky opened 1 year ago

Roscky commented 1 year ago

The definition of INTERNAL_PAGE_SIZE in BPlusTreeInternalPage is correct, but passing INTERNAL_PAGE_SIZE directly in b_plus_tree.h is incorrect because ValueType in BPlusTree is not the same as the ValueType corresponding to BPlusTreeInternalPage. Therefore, the macro definition of INTERNAL_PAGE_SIZE here leads to incorrect calculation results, which can cause wasted space in internal nodes.

The macro definition of INTERNAL_PAGE_SIZE in b_plus_internal_tree.h.

#define INTERNAL_PAGE_SIZE ((BUSTUB_PAGE_SIZE - INTERNAL_PAGE_HEADER_SIZE) / (sizeof(MappingType)))

INTERNAL_PAGE_SIZE is used in b_plus_tree.h

explicit BPlusTree(std::string name, page_id_t header_page_id, BufferPoolManager *buffer_pool_manager,
                     const KeyComparator &comparator, int leaf_max_size = LEAF_PAGE_SIZE,
                     int internal_max_size = INTERNAL_PAGE_SIZE);

Since the ValueType of MappingType in b_plus_tree.h is RID, which is incorrect for InternalPage, the INTERNAL_PAGE_SIZE will be miscalculated.

infdahai commented 1 year ago

I think it doesn't cause the situation.

The macro INTERNAL_PAGE_SIZE is defined in b_plus_internal_tree.h.

// valuetype for internalNode should be page id_t
template class BPlusTreeInternalPage<GenericKey<4>, page_id_t, GenericComparator<4>>;
template class BPlusTreeInternalPage<GenericKey<8>, page_id_t, GenericComparator<8>>;

But the template on b_plust_tree.h use the page_id_t type and ValueType is setted as page_id_t. So INTERNAL_PAGE_SIZE can be calculated correctly.

And internal_max_size_ and leaf_max_size_ will be setted based on page type during btree NewPage in b_plus_tree.h. btw, macros are indeed easily misunderstood.

infdahai commented 1 year ago
[bustub-private/src/storage/index/b_plus_tree.cpp:22:BPlusTree] INFO  - Value_size:8
PageId_size:4
INTERNAL_PAGE_SIZE:254

[bustub-private/src/storage/page/b_plus_tree_leaf_page.cpp:38:Init] INFO  - Value_size:8
Mapping_size:16
LEAF_PAGE_SIZE:254

[bustub-private/src/storage/page/b_plus_tree_internal_page.cpp:37:Init] INFO  - Value_size:4
Mapping_size:12
INTERNAL_PAGE_SIZE:339

I add print statements and find that INTERNAL_PAGE_SIZE is right in b_plus_tree_internal_page.cpp but wrong in b_plus_tree.h . so the error occurs that causes INTERNAL_PAGE_SIZE is same to LEAF_PAGE_SIZE.

we can try to fix this in #517 and #547

skyzh commented 1 year ago

Actually 254 should be the correct max size, because we need to align MappingType. The computation is wrong, but accidentally gets the correct result.

Roscky commented 1 year ago

When the KeyType takes 8 bytes, the sizeof(MappingType) == 16 is correct. However, when the KeyType takes 1/2/4 bytes, the MappingType of internal node should align 4 bytes since the sizeof(KeyType) <= 4 and the sizeof(ValueType) == 4, so the sizeof(MappingType) == 8 and the internal_max_size should be more than 254.

skyzh commented 1 year ago

Yes, that’s something we should fix.