Closed · zpwithme closed this 1 month ago
Hi @zpwithme, thank you for your interest in this project and for your question. You are correct, there is something a bit confusing with the way we construct batches. I will try to clarify this a bit.
Each batch is constructed from `batch_size` clouds/tiles, from which `sample_graph_k` subgraphs are randomly sampled. You can have a look at the documentation for `SampleRadiusSubgraphs` to see how this subgraph sampling is performed. For S3DIS, for instance, each batch is composed of `sample_graph_k=4` subgraphs, sampled from `batch_size=1` Areas. So, you should normally find 4 different values in `norm_index` for S3DIS.
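To make the indexing concrete, here is a toy sketch (not the library's actual code, and the point counts are made up) of why `norm_index` ends up holding `batch_size * sample_graph_k` distinct values even when `batch_size=1`:

```python
# Toy illustration: each sampled subgraph gets its own index in
# `norm_index`, and every point of that subgraph shares it.
batch_size = 1        # clouds/tiles (e.g. S3DIS Areas) read from disk
sample_graph_k = 4    # subgraphs randomly sampled from each cloud

# Hypothetical per-subgraph point counts, purely for illustration.
points_per_subgraph = [3, 2, 4, 3]

norm_index = []
for i in range(batch_size * sample_graph_k):
    norm_index += [i] * points_per_subgraph[i]

print(sorted(set(norm_index)))  # → [0, 1, 2, 3]: 4 distinct values
```

So with `batch_size=1` you would still see indices `0..3` in `norm_index`, one per sampled subgraph, not one per cloud.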
I agree this definition of `batch_size` is a bit convoluted; it is connected to the fact that we keep some preprocessing operations on the fly for efficiency (like `SampleRadiusSubgraphs`). For now, you can think of `batch_size` as the "number of files to read from disk to build the batch", while the actual batch size would rather be `batch_size * sample_graph_k`.
Hope that answers your question!
PS: since I am at it, let's make it even more confusing. If you decide to use gradient accumulation with `gradient_accumulator` (as used in the provided datamodule configs for training on 11G GPUs), your effective batch size will also change accordingly 😇
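Putting the pieces together, the effective batch size is a simple product. The sketch below uses an illustrative `gradient_accumulator` value of 2, which is an assumption, not necessarily what the shipped configs set:

```python
# Back-of-the-envelope effective batch size for this pipeline.
batch_size = 1            # clouds/tiles read from disk per batch
sample_graph_k = 4        # subgraphs sampled from each cloud
gradient_accumulator = 2  # hypothetical number of accumulation steps

# Subgraphs contributing to one optimizer step:
effective_batch_size = batch_size * sample_graph_k * gradient_accumulator
print(effective_batch_size)  # → 8
```

In other words, the optimizer sees gradients averaged over `batch_size * sample_graph_k` subgraphs per forward pass, times however many passes are accumulated before a step.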
Thank you very much for your work. In the S3DIS dataset configuration, the batch size is set to 1, but why does norm_index contain both 0 and 1? Does this require using different normalization methods?