Closed Oufattole closed 2 months ago
The update adds new functionality to pytorch_dataset.py and improves its existing logic: it introduces global variables, adds functions for dataset merging and index calculation, and refines dataset initialization and batch collation. The test file test_dataloader.py gains additional imports and more detailed assertions, which broadens test coverage.
| File | Change Summary |
|---|---|
| .../pytorch_dataset.py | Added IDX_COL variable, functions (merge_task_with_static, get_task_indexes), refactored dataset initialization, and revised collate_triplet method. |
| .../test_dataloader.py | Added imports for datetime and polars, enhanced assertions in test_triplet, and expanded test_task with task dataframe setup and assertions. |
sequenceDiagram
participant DatasetLoader
participant DataFrame
participant TaskDataFrame
DatasetLoader->>+DataFrame: Load static data
DatasetLoader->>+TaskDataFrame: Load task data
DatasetLoader->>DataFrame: merge_task_with_static(TaskDataFrame)
DatasetLoader->>DatasetLoader: Initialize dataset with merged data
loop On Batch Creation
DatasetLoader->>DatasetLoader: get_task_indexes()
DatasetLoader->>DatasetLoader: collate_triplet()
DatasetLoader->>DatasetLoader: Include 'label' tensor in batch
end
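The batch-creation loop in the diagram can be illustrated with a minimal, dependency-free sketch. This is not the PR's `collate_triplet` implementation: the function name `collate_with_labels`, the `values`/`mask`/`label` keys, and the use of plain lists instead of tensors are all assumptions made for illustration. It only shows the general idea of padding variable-length samples and carrying a per-sample label into the batch.

```python
from typing import Any, Dict, List, Sequence


def collate_with_labels(
    samples: Sequence[Dict[str, Any]], pad_value: float = 0.0
) -> Dict[str, List[Any]]:
    """Hypothetical collate step: pad 'values' to a common length and gather labels.

    Each sample is assumed to look like {"values": [...], "label": <int>}.
    """
    max_len = max(len(s["values"]) for s in samples)
    batch: Dict[str, List[Any]] = {"values": [], "mask": [], "label": []}
    for s in samples:
        n = len(s["values"])
        pad = max_len - n
        # Right-pad the observations so every row has the same length.
        batch["values"].append(list(s["values"]) + [pad_value] * pad)
        # The mask marks real observations (True) versus padding (False).
        batch["mask"].append([True] * n + [False] * pad)
        # One label per sample, kept in the same order as the rows.
        batch["label"].append(s["label"])
    return batch


batch = collate_with_labels(
    [{"values": [1.0, 2.0, 3.0], "label": 1}, {"values": [4.0], "label": 0}]
)
```

In a real PyTorch pipeline the lists would typically be converted to tensors before being returned from the collate function.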
In the world of code so bright, New functions come to light, Tasks merge with grace, Collate with a label’s embrace, Tests stand tall, their might. 🚀🐇
Added functionality for pulling the spans of data for a pre-defined task. Tests for these functions are included as well. This addresses issue #2.
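As a rough sketch of what "pulling the spans of data for a pre-defined task" could look like, the snippet below finds the index range of a subject's events that fall inside a task window. It is not the PR's `get_task_indexes`: the helper name `span_indexes`, its signature, and the choice of an inclusive end time are assumptions, and it uses only the standard library rather than polars.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import List, Tuple


def span_indexes(
    event_times: List[datetime], start: datetime, end: datetime
) -> Tuple[int, int]:
    """Hypothetical helper: return (lo, hi) so that event_times[lo:hi]
    holds exactly the events with start <= t <= end.

    event_times must be sorted in ascending order.
    """
    lo = bisect_left(event_times, start)   # first event at or after start
    hi = bisect_right(event_times, end)    # one past the last event at or before end
    return lo, hi


times = [datetime(2020, 1, d) for d in (1, 2, 3, 5)]
lo, hi = span_indexes(times, datetime(2020, 1, 2), datetime(2020, 1, 3))
# times[lo:hi] now contains the events on January 2 and January 3.
```

Returning a half-open `(lo, hi)` pair keeps the result directly usable as a slice, and an empty window simply yields `lo == hi`.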
Summary by CodeRabbit
New Features
Tests
test_triplet and test_task.