OSGeo / grass

GRASS GIS - free and open-source geospatial processing engine
https://grass.osgeo.org
Other
791 stars 288 forks source link

r.texture: control the number of threads #3917

Open cyliang368 opened 1 week ago

cyliang368 commented 1 week ago

This PR fixes the coverity issue in #3857. The number of threads should be at least > 1.

nilason commented 1 week ago

I believe it should be enough to initialize at the declaration at line 92: int threads = 1;.

parm.nproc->answer is by default "1": https://github.com/OSGeo/grass/blob/3f30f1bb59f79d18074a3527d9101200c0975214/lib/gis/parser_standard_options.c#L757-L770 Something the static analyser apparently didn't catch.

cyliang368 commented 1 week ago

I believe it should be enough to initialize at the declaration at line 92: int threads = 1;.

parm.nproc->answer is by default "1":

https://github.com/OSGeo/grass/blob/3f30f1bb59f79d18074a3527d9101200c0975214/lib/gis/parser_standard_options.c#L757-L770

Something the static analyser apparently didn't catch.

A user can specify nprocs as a 0 or negative value. In this case, the default value will be replaced. Overall, we still need to handle this situation.

nilason commented 1 week ago

A user can specify nprocs as a 0 or negative value. In this case, the default value will be replaced. Overall, we still need to handle this situation.

Indeed, see eg. https://github.com/OSGeo/grass/blob/3f30f1bb59f79d18074a3527d9101200c0975214/raster/r.univar/r.univar_main.c#L181-L182

cyliang368 commented 1 week ago

Indeed, see eg.

https://github.com/OSGeo/grass/blob/3f30f1bb59f79d18074a3527d9101200c0975214/raster/r.univar/r.univar_main.c#L181-L182

Raising an error for nprocs = 0 is a good implementation. For nprocs < 0, in my experience, some libraries or clusters use (nprocs + 1 + max_threads_ondevice) to launch threads or distribute nodes. For example, Python joblib_ uses this approach. It is convenient to use nprocs = -1 if I want the program to run as fast as possible, no matter which machine is used.

I am open to any choices below.

  1. Raise error when nprocs < 1
  2. Raise error when nprocs = 0, and set threads = nprocs + 1 + max_threads_on_device when nprocs < 0.
  3. Set threads = 1 when nprocs = 0, set (nprocs + 1 + max_threads_on_device) when nprocs < 0.

We just need to make sure the behavior is consistent in every parallelized module. I think now is a good chance to discuss this.

nilason commented 1 week ago

I am open to any choices below.

  1. Raise error when nprocs < 1
  2. Raise error when nprocs = 0, and set threads = nprocs + 1 + max_threads_on_device when nprocs < 0.
  3. Set threads = 1 when nprocs = 0, set (nprocs + 1 + max_threads_on_device) when nprocs < 0.

We just need to make sure the behavior is consistent in every parallelized module. I think now is a good chance to discuss this.

I have no strong opinion, but I prefer the simplicity of no. 1.

anikaweinmann commented 1 week ago

I am open to any choices below.

  1. Raise error when nprocs < 1
  2. Raise error when nprocs = 0, and set threads = nprocs + 1 + max_threads_on_device when nprocs < 0.
  3. Set threads = 1 when nprocs = 0, set (nprocs + 1 + max_threads_on_device) when nprocs < 0.

We just need to make sure the behavior is consistent in every parallelized module. I think now is a good chance to discuss this.

I like to use threads = nprocs + 1 + max_threads_on_device when nprocs < 0, so that my code is independent of the environment on which I execute it.

HuidaeCho commented 1 week ago

So nprocs=-1 means max_threads_on_device not max_threads_on_device - 1? I would say counterintuitive. We need to handle this GRASS-wide, not module by module. This is my suggestion:

But again, we need to handle it in lib/gis/parser.c (something like G_option_to_nprocs(); see G_option_to_separator()) , not in this module.

echoix commented 1 week ago

So nprocs=-1 means max_threads_on_device not max_threads_on_device - 1? I would say counterintuitive. We need to handle this GRASS-wide, not module by module. This is my suggestion:

  • nprocs > 0 for min(nprocs, max_threads_on_device)
  • nprocs <= 0 for max(max_threads_on_device + nprocs, 1) (e.g., nprocs=0 for max_threads_on_device, nprocs=-1 for max_threads_on_device - 1, and nprocs=-10000 with 24 threads for 1, etc.)

But again, we need to handle it in lib/gis/parser.c (something like G_option_to_nprocs(); see G_option_to_separator()) , not in this module.

So nproc = 0 would mean something similar to make -j 0 or some other tools in dotnet (C#) world, where 0 is to be the max supported (the most possible), while 1 would limit to 1 (not parallel)?

HuidaeCho commented 1 week ago

So nprocs=-1 means max_threads_on_device not max_threads_on_device - 1? I would say counterintuitive. We need to handle this GRASS-wide, not module by module. This is my suggestion:

  • nprocs > 0 for min(nprocs, max_threads_on_device)
  • nprocs <= 0 for max(max_threads_on_device + nprocs, 1) (e.g., nprocs=0 for max_threads_on_device, nprocs=-1 for max_threads_on_device - 1, and nprocs=-10000 with 24 threads for 1, etc.)

But again, we need to handle it in lib/gis/parser.c (something like G_option_to_nprocs(); see G_option_to_separator()) , not in this module.

So nproc = 0 would mean something similar to make -j 0 or some other tools in dotnet (C#) world, where 0 is to be the max supported (the most possible), while 1 would limit to 1 (not parallel)?

Yes. I don't see a reason for two options for serial (nprocs=0 and nprocs=1 for 1).

cyliang368 commented 1 week ago

So nprocs=-1 means max_threads_on_device not max_threads_on_device - 1? I would say counterintuitive. We need to handle this GRASS-wide, not module by module. This is my suggestion:

  • nprocs > 0 for min(nprocs, max_threads_on_device)
  • nprocs <= 0 for max(max_threads_on_device + nprocs, 1) (e.g., nprocs=0 for max_threads_on_device, nprocs=-1 for max_threads_on_device - 1, and nprocs=-10000 with 24 threads for 1, etc.)

But again, we need to handle it in lib/gis/parser.c (something like G_option_to_nprocs(); see G_option_to_separator()) , not in this module.

Sounds good to me. In my experience, -1 is usually the maximum or end of an array, especially in Python. I think setting 0 for maximum is also reasonable.

I agree that this should be handled in lib/gis/parser.c so that every module works consistently and is easier to maintain.

HuidaeCho commented 1 week ago

So nprocs=-1 means max_threads_on_device not max_threads_on_device - 1? I would say counterintuitive. We need to handle this GRASS-wide, not module by module. This is my suggestion:

  • nprocs > 0 for min(nprocs, max_threads_on_device)
  • nprocs <= 0 for max(max_threads_on_device + nprocs, 1) (e.g., nprocs=0 for max_threads_on_device, nprocs=-1 for max_threads_on_device - 1, and nprocs=-10000 with 24 threads for 1, etc.)

But again, we need to handle it in lib/gis/parser.c (something like G_option_to_nprocs(); see G_option_to_separator()) , not in this module.

Sounds good to me. In my experience, -1 is usually the maximum or end of an array, especially in Python. I think setting 0 for maximum is also reasonable.

Right, if 0 has a meaning (first in 0-based indexing), but the actual number of threads can only be positive.

I agree that this should be handled in lib/gis/parser.c so that every module works consistently and is easier to maintain.