Dataset too large - Githubissues

Y-debug-sys / Diffusion-TS

[ICLR 2024] Official Implementation of "Diffusion-TS: Interpretable Diffusion for General Time Series Generation"

MIT License

184 stars 26 forks

Dataset too large #39

Closed HIUZS closed 4 months ago

HIUZS commented 4 months ago

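Hello, my dataset is too large for the code to run, and it produces the error below. Would it be possible to split the dataset into multiple .csv files and read them that way? If so, could you tell me where to make the change?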

x = np.zeros((self.sample_num_total, self.window, self.var_num))
numpy.core._exceptions.MemoryError: Unable to allocate 137. GiB for an array with shape (7174401, 1280, 2) and data type float64
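The size in the error message can be checked by hand: the array has 7174401 × 1280 × 2 elements at 8 bytes each (float64). A minimal sketch of that arithmetic, using only the shape and dtype quoted in the traceback:

```python
import numpy as np

# Shape and dtype copied from the MemoryError above.
shape = (7174401, 1280, 2)
itemsize = np.dtype(np.float64).itemsize  # bytes per element

n_bytes = int(np.prod(shape, dtype=np.int64)) * itemsize
gib = n_bytes / 2**30

print(f"{n_bytes} bytes = {gib:.1f} GiB")
```

This comes to roughly 137 GiB, matching the message. Nearly all of it is redundancy: with a stride of 1, consecutive windows share all but one row.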

Y-debug-sys commented 4 months ago

Hi, you can change the stride of the sliding window. Was your previous issue resolved?

Y-debug-sys commented 4 months ago

For example, change lines 62 to 67 of Utils/Data_utils/real_datasets.py

def __getsamples(self, data, proportion, seed):
    x = np.zeros((self.sample_num_total, self.window, self.var_num))
    for i in range(self.sample_num_total):
        start = i
        end = i + self.window
        x[i, :, :] = data[start:end, :]

to

def __getsamples(self, data, proportion, seed):
    # 样本总数 ("total number of samples") is a value you choose;
    # 样本总数 * self.window must be <= len(data)
    x = np.zeros((样本总数, self.window, self.var_num))
    j = 0
    for i in range(样本总数):
        start = j
        end = j + self.window
        x[i, :, :] = data[start:end, :]
        j = end  # advance by a full window so samples do not overlap
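The same idea can be written with an explicit stride, so that a stride of 1 reproduces the original overlapping windows and a stride equal to the window length gives the non-overlapping version above. This is only a standalone sketch: `make_windows` and its `stride` parameter are hypothetical names for illustration, not part of the Diffusion-TS code.

```python
import numpy as np

def make_windows(data, window, stride):
    """Cut a (time, features) array into windows that start every `stride` rows.

    Hypothetical helper for illustration; not part of Diffusion-TS.
    """
    if len(data) < window:
        n = 0
    else:
        n = (len(data) - window) // stride + 1  # number of complete windows
    x = np.zeros((n, window, data.shape[1]), dtype=data.dtype)
    for i in range(n):
        start = i * stride
        x[i] = data[start:start + window]
    return x

# Toy series: 10 time steps, 2 variables.
data = np.arange(20, dtype=np.float32).reshape(10, 2)

overlapping = make_windows(data, window=4, stride=1)  # original behaviour
disjoint = make_windows(data, window=4, stride=4)     # non-overlapping windows

print(overlapping.shape, disjoint.shape)
```

A stride between 1 and the window length trades memory for more (partly overlapping) training samples.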

HIUZS commented 4 months ago

OK, thanks. I'll try it out. Also, is it possible to control the number of significant digits in the generated data values?

Y-debug-sys commented 4 months ago

Apply the following code to the saved numpy array data:

import numpy as np
data = np.around(data, decimals)  # decimals: the number of decimal places to keep
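Note that `np.around` rounds to a fixed number of decimal places, which is not quite the same as a fixed number of significant digits. If significant digits are what is needed, one possible approach is to scale each value by its order of magnitude before rounding. In this sketch `round_significant` is a hypothetical helper, not a NumPy or Diffusion-TS function.

```python
import numpy as np

def round_significant(a, sig):
    """Round each element to `sig` significant digits (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    out = np.zeros_like(a)
    nz = a != 0  # zeros stay zero; log10(0) is undefined
    magnitude = np.floor(np.log10(np.abs(a[nz])))
    factor = 10.0 ** (sig - 1 - magnitude)
    out[nz] = np.round(a[nz] * factor) / factor
    return out

values = np.array([123.456, 0.0012345, 0.0])

by_decimals = np.around(values, 2)            # fixed decimal places
by_significant = round_significant(values, 3) # fixed significant digits

print(by_decimals)
print(by_significant)
```

With two decimal places the small value 0.0012345 collapses to 0, while three significant digits keep it as about 0.00123.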

HIUZS commented 4 months ago

OK, thank you very much.

Awenega commented 4 months ago

For example, change lines 62 to 67 of Utils/Data_utils/real_datasets.py

def __getsamples(self, data, proportion, seed):
    x = np.zeros((self.sample_num_total, self.window, self.var_num))
    for i in range(self.sample_num_total):
        start = i
        end = i + self.window
        x[i, :, :] = data[start:end, :]

Change to

def __getsamples(self, data, proportion, seed):
    x = np.zeros((样本总数, self.window, self.var_num))
    for i in range(0, 样本总数*self.window, self.window):
        start = i
        end = i + self.window
        x[i, :, :] = data[start:end, :]

Hi, I have the same problem, but I haven't quite figured out how to solve it. How should I modify "x" ? I notice that the only difference is that instead of “self.sample_num_total”, I should enter “样本总数”. What does the latter term mean? Using a translator on that term, it would seem to be translated as “total number of samples,” which is equal to "self.sample_num_total".

Y-debug-sys commented 4 months ago

Sorry, there were some mistakes in my previous answer; they are fixed now. Here “样本总数” is the total number of training samples (you can set it to a custom value to fit in memory). Its relationship to the original "self.sample_num_total" is roughly self.sample_num_total ≈ 样本总数*self.window.
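To make that relationship concrete with the numbers from the error message in this thread, here is a rough calculation of how many non-overlapping windows fit and how much memory they need. It assumes the original windows were taken with a stride of 1, so the series has about sample_num_total + window - 1 rows; that is inferred from the loop shown earlier, not confirmed against the repository.

```python
# Values taken from the MemoryError reported at the top of the thread.
sample_num_total = 7174401
window = 1280
var_num = 2

# Assumption: stride-1 windows, so the series has this many rows.
n_rows = sample_num_total + window - 1

# Largest 样本总数 that satisfies 样本总数 * window <= len(data).
n_disjoint = n_rows // window

n_bytes = n_disjoint * window * var_num * 8  # 8 bytes per float64
print(n_disjoint, "windows,", round(n_bytes / 2**20, 1), "MiB")
```

Under that assumption about 5606 windows fit, and the array shrinks from roughly 137 GiB to roughly 110 MiB.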

Awenega commented 4 months ago

Thank you very much, now it is clear.