[BUG] Round num_samples down to prevent prefetch from exceeding the dataset's maximum length...

PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.

Apache License 2.0


    # In this case the computation is wrong: rounding up gives 2848,
    # which exceeds the dataset's maximum length of 2844.
    len(self.dataset) = 2844
    self.nranks = 8
    int(len(self.dataset) * 1.0 / self.nranks) * self.nranks = 2840
    int(ceil(len(self.dataset) * 1.0 / self.nranks)) * self.nranks = 2848

    # In this case the computation is fine, because the length divides evenly.
    len(self.dataset) = 2844
    self.nranks = 4
    int(len(self.dataset) * 1.0 / self.nranks) * self.nranks = 2844
    int(ceil(len(self.dataset) * 1.0 / self.nranks)) * self.nranks = 2844
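The two cases above can be reproduced with a minimal Python sketch. The helper `total_samples` and its `round_up` flag are hypothetical names used only for illustration; they are not taken from `paddlenlp/utils/batch_sampler.py`, and the sketch only mirrors the arithmetic quoted in this PR, not the sampler itself.

```python
import math


def total_samples(dataset_len: int, nranks: int, round_up: bool) -> int:
    """Hypothetical helper: total sample count across all ranks.

    Mirrors the two expressions quoted above. With round_up=True the
    per-rank count is rounded up (ceil); with round_up=False it is
    truncated, which rounds down for non-negative inputs.
    """
    per_rank = dataset_len * 1.0 / nranks
    per_rank = math.ceil(per_rank) if round_up else int(per_rank)
    return int(per_rank) * nranks


if __name__ == "__main__":
    dataset_len = 2844
    for nranks in (8, 4):
        down = total_samples(dataset_len, nranks, round_up=False)
        up = total_samples(dataset_len, nranks, round_up=True)
        # Rounding up can yield an index range larger than the dataset.
        print(f"nranks={nranks}: down={down}, up={up}, "
              f"up exceeds dataset: {up > dataset_len}")
```

Rounding down never produces more samples than the dataset holds, at the cost of dropping up to `nranks - 1` trailing samples (4 of the 2844 in the 8-rank case).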

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 55.61%. Comparing base (2723138) to head (a2094dc). Report is 230 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/utils/batch_sampler.py | 0.00% | 2 Missing :warning: |

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #8690      +/-   ##
===========================================
- Coverage    55.61%   55.61%   -0.01%
===========================================
  Files          620      620
  Lines        96965    96964       -1
===========================================
- Hits         53930    53929       -1
  Misses       43035    43035
```



[BUG] Round num_samples down to prevent prefetch from exceeding the dataset's maximum length... #8690
