2080 ti config profile - can we use v100_16gb?

valentinvieriu commented 3 years ago

I assume a lot of people out there still have a 2080 Ti. Is the v100_16gb config profile suitable for a 2080 Ti too?

RoyWheels commented 3 years ago

It should be fine for anything up to 1280x768 or 1024x1024. A V100 uses 9 GB for 1280x768 training, which should also fit in the 2080 Ti's 11 GB. Actual memory use is always higher than the reported figure, so there are a few GB of overhead.
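A quick way to sanity-check these numbers is to compare pixel counts and subtract the quoted usage from the card size. This is a minimal sketch, assuming the 9 GB figure above and a hypothetical `OVERHEAD_GB` standing in for "a few GB"; the real overhead isn't stated in this thread.

```python
# Back-of-the-envelope check using the figures quoted above.
# OVERHEAD_GB is a hypothetical placeholder for "a few GB".

CARD_GB = {"RTX 2080 Ti": 11.0, "V100 16GB": 16.0}
REPORTED_GB_1280X768 = 9.0  # V100 figure for 1280x768 training (from the comment)
OVERHEAD_GB = 2.0           # assumption: the actual overhead is not stated


def pixels(width: int, height: int) -> int:
    """Number of pixels in one training image."""
    return width * height


def headroom_gb(card_gb: float, reported_gb: float, overhead_gb: float) -> float:
    """Memory left on the card after the reported usage plus the assumed overhead."""
    return card_gb - (reported_gb + overhead_gb)


if __name__ == "__main__":
    # 1024x1024 has about 7% more pixels than 1280x768.
    print(f"pixel ratio: {pixels(1024, 1024) / pixels(1280, 768):.3f}")
    for name, size in CARD_GB.items():
        print(f"{name}: {headroom_gb(size, REPORTED_GB_1280X768, OVERHEAD_GB):+.1f} GB headroom")
```

With the placeholder 2 GB overhead, the 2080 Ti has essentially no spare memory at 1280x768, so the exact overhead matters.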

The 'auto' config should also work if the dataset resolution isn't too large; higher resolution obviously increases memory use.

I also added a config called 'atari' (I need to give configs better names!), which works very well for smaller resolutions (256x256 or 384x256). It does set gamma to 1, which might not suit all datasets; add --gamma=10 to override that setting.
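For reference, an invocation with the override might be assembled as below. This is a sketch assuming the fork keeps upstream stylegan2-ada's `train.py` flags (`--outdir`, `--data`, `--cfg`); only the `atari` config and `--gamma=10` come from this thread, and `train_command` plus the dataset/output paths are hypothetical.

```python
# Hypothetical helper: build a train.py command line for a given config,
# optionally overriding gamma. Flag names other than --gamma are assumed
# to match upstream stylegan2-ada; paths are placeholders.
import shlex
from typing import List, Optional


def train_command(cfg: str, data: str, outdir: str,
                  gamma: Optional[float] = None) -> List[str]:
    """Return the argument list for one training run."""
    cmd = ["python", "train.py", f"--outdir={outdir}", f"--data={data}", f"--cfg={cfg}"]
    if gamma is not None:
        # Overrides the gamma set by the config ('atari' uses 1).
        cmd.append(f"--gamma={gamma:g}")
    return cmd


if __name__ == "__main__":
    cmd = train_command("atari", "./datasets/mydata", "./results", gamma=10)
    print(shlex.join(cmd))  # shell-quoted string, handy for copy/paste
```

Returning a list rather than a string keeps it usable directly with `subprocess.run` without shell quoting issues.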

If you use a GPU with 16 GB of RAM, there is another config (also to be renamed) called 'large'. It uses a larger batch size and should train 512x512 (or 640x512) fairly well. It might also work with 11 GB.
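The rules of thumb in this thread can be summarised as a small lookup. This is an informal sketch: `suggest_config` is a hypothetical helper, not part of the repo, and its thresholds are read off the comments above rather than measured.

```python
# Informal encoding of the advice above; thresholds are assumptions
# taken from the thread, not logic from the repository.
def suggest_config(width: int, height: int, gpu_gb: float) -> str:
    """Suggest a config name for a dataset resolution and GPU memory size."""
    pixels = width * height
    if pixels <= 384 * 256:
        return "atari"      # small resolutions, e.g. 256x256 or 384x256
    if pixels <= 640 * 512 and gpu_gb >= 16:
        return "large"      # 512x512 / 640x512 on a 16 GB card
    if pixels <= 1024 * 1024 and gpu_gb >= 11:
        return "v100_16gb"  # up to 1280x768 or 1024x1024 with 11 GB or more
    return "auto"           # no specific advice above; try 'auto' and watch gpumem
```

Remember that 'atari' sets gamma to 1, so the returned name may still need a `--gamma` override for some datasets.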

valentinvieriu commented 3 years ago

I've tried it, and I still get some memory warnings:

stylegan2-ada_1  | Training for 25000 kimg...
stylegan2-ada_1  | 
stylegan2-ada_1  | 2020-10-23 16:27:18.904835: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:18.904904: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:19.181792: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:19.181825: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:19.254877: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:19.254908: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:19.595838: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:19.595871: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:27.664602: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | 2020-10-23 16:27:27.664636: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
stylegan2-ada_1  | tick 0     kimg 0.0      time 1m 05s       sec/tick 20.8    sec/kimg 1299.31 maintenance 44.2   gpumem 8.9   augment 0.000

But after this it works, so I guess I can ignore them?
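These lines are TensorFlow BFC allocator warnings, and the log text itself says the caller does not treat them as a failure; the tick line still reports `gpumem 8.9`. To check a longer run, a small parser can count the warnings and pull out the latest `gpumem` value. This is a sketch assuming the exact line formats shown above; `summarize` is a hypothetical helper.

```python
# Summarise a stylegan2-ada training log: count allocator warnings,
# find the largest failed request, and keep the last reported gpumem.
# The regexes assume the line formats shown in this thread.
import re
from typing import Iterable, Optional, Tuple

WARN_RE = re.compile(r"ran out of memory trying to allocate ([\d.]+)GiB")
GPUMEM_RE = re.compile(r"\bgpumem\s+([\d.]+)")


def summarize(lines: Iterable[str]) -> Tuple[int, float, Optional[float]]:
    """Return (warning count, largest failed request in GiB, last gpumem)."""
    sizes = []
    gpumem = None
    for line in lines:
        warn = WARN_RE.search(line)
        if warn:
            sizes.append(float(warn.group(1)))
            continue
        tick = GPUMEM_RE.search(line)
        if tick:
            gpumem = float(tick.group(1))
    return len(sizes), max(sizes, default=0.0), gpumem
```

For example, passing an open file handle for a saved log (any file name) returns the warning count, the largest failed request in GiB, and the last `gpumem` value seen.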

RoyWheels / stylegan2-ada

2080 ti config profile - can we use v100_16gb? #3