Closed bryanbocao closed 2 years ago
The default value of run_count is 10M and it takes a long time for me to sample. Does it only affect to what extent the stats/distribution of the sampled train dataset matches the original train2017? If I set run_count to be 10, should I assume that the stats/distribution would not be affected too much? Thanks!
How much time does it take you to go over the 10M?
Did you happen to find out whether this has any effect? I would like to know too, since running the sampling script also takes a lot of time for me...
If you set run_count to 10, the distributions will most likely not match. Basically, the more you sample, the higher the chances of getting similar distributions in the sampled set and the train set. Maybe we could add support for multi-threaded sampling, which would reduce the run time.
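To make the effect of run_count concrete, here is a minimal sketch of best-of-N random sampling on toy data. It is not the repository's sampling script: the function names, the L1 distance on class frequencies, and the toy labels are all assumptions made for illustration, and the real script may compare different statistics.

```python
import random
from collections import Counter


def class_histogram(labels, num_classes):
    """Return the normalized class frequencies of a list of integer labels."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def l1_distance(p, q):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def best_of_n_sample(labels, sample_size, run_count, num_classes, seed=0):
    """Draw run_count random subsets and keep the one whose class
    frequencies are closest (in L1 distance) to the full dataset.
    Hypothetical helper, not part of coco-minitrain."""
    rng = random.Random(seed)
    target = class_histogram(labels, num_classes)
    best_dist, best_idx = float("inf"), None
    for _ in range(run_count):
        idx = rng.sample(range(len(labels)), sample_size)
        hist = class_histogram([labels[i] for i in idx], num_classes)
        dist = l1_distance(hist, target)
        if dist < best_dist:
            best_dist, best_idx = dist, idx
    return best_idx, best_dist


# Toy, deliberately imbalanced "dataset" with 4 classes (made-up numbers).
toy_labels = [0] * 500 + [1] * 300 + [2] * 150 + [3] * 50

idx_few, dist_few = best_of_n_sample(toy_labels, 100, run_count=10, num_classes=4)
idx_many, dist_many = best_of_n_sample(toy_labels, 100, run_count=1000, num_classes=4)
print(f"run_count=10:   L1 distance = {dist_few:.4f}")
print(f"run_count=1000: L1 distance = {dist_many:.4f}")
```

Because both calls use the same seed, the first 10 draws of the longer run are identical to the shorter run, so the longer run can only match or improve the distance. A small run_count still returns a valid subset, just one that is less likely to mirror the full distribution.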
@sholevs66 I tested this a month ago. As far as I remember, I waited 10~20 min and the sampling still had not finished. My machine's hardware setup is
Intel Core i9 - 9900KF - 128GB Memory - NVIDIA GeForce RTX 2080 SUPER
I think sampling runs on CPU.
@giddyyupp Thanks for the reply and I think that's a good suggestion.
@giddyyupp I added a download link for coco_minitrain_25k.zip. @sholevs66, you can download the 25k minitrain dataset from this zip file without resampling yourself.
Pull request: https://github.com/giddyyupp/coco-minitrain/pull/16 https://github.com/bryanbocao/coco-minitrain/tree/wip