How to split the big dataset into training/valid/test sets?

Just4Uzi commented 6 years ago

How did you split the large dataset into train/test/valid?

brightmart commented 6 years ago

Hi,

There is a valid/test set in the small dataset. I just replaced the training data with the training data from the large dataset; the valid/test sets are the same as in the small dataset.

bright

Just4Uzi commented 6 years ago

Hi,

I found that the large dataset contains the small dataset's valid/test sets. Doesn't that make this way of splitting the data unsuitable?

Slash
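
To illustrate the overlap described above, here is a minimal sketch of one way to drop training records that also appear in the valid/test sets. It assumes each record is a single line of text and that overlapping records are exact duplicates after stripping whitespace; `remove_overlap` is a hypothetical helper, not something from the ai_law repository.

```python
def remove_overlap(train_lines, heldout_lines):
    """Return the training lines that do not appear in the held-out lines.

    Assumption: a record is one line of text, and a leaked record is an
    exact duplicate of a held-out record once whitespace is stripped.
    """
    heldout = {line.strip() for line in heldout_lines}
    return [line for line in train_lines if line.strip() not in heldout]


if __name__ == "__main__":
    train = ["case A\n", "case B\n", "case C\n"]
    valid_and_test = ["case B\n"]
    clean_train = remove_overlap(train, valid_and_test)
    print(len(train) - len(clean_train), "overlapping records removed")
```

Comparing the size of the training set before and after filtering gives a rough idea of how much of the held-out data was leaking into training.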

brightmart commented 6 years ago

Thanks for the information. I think that's why the local performance is a little lower than the offline performance.

You can just split train/valid/test from the large dataset: for example, give 10k examples to valid, 10k to test, and use the rest for training.

But in fact, even with my setting, it still works.
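
The split suggested above (10k for valid, 10k for test, the rest for training) could look roughly like the sketch below. The function name `split_dataset`, the shuffling step, and the fixed seed are assumptions made for illustration; the thread does not say whether the data should be shuffled first.

```python
import random


def split_dataset(records, n_valid=10_000, n_test=10_000, seed=42):
    """Split records into disjoint (train, valid, test) lists.

    Assumption: records are shuffled with a fixed seed before splitting
    so that the split is reproducible and not tied to file order.
    """
    if n_valid + n_test >= len(records):
        raise ValueError("not enough records left for a training set")
    shuffled = list(records)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test


if __name__ == "__main__":
    data = [f"example {i}" for i in range(50_000)]  # placeholder records
    train, valid, test = split_dataset(data)
    print(len(train), len(valid), len(test))
```

Because the three parts are slices of a single shuffled list, no record can end up in more than one of them, which avoids the overlap discussed earlier in the thread.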

brightmart / ai_law

How to split the big dataset into training/valid/test sets? #1