USEPA / elevatr

An R package for accessing elevation data
Other
203 stars 26 forks source link

elevatr wants to use too many cores for some systems #100

Closed courtiol closed 2 months ago

courtiol commented 2 months ago

Hi, thanks for your awesome package.

When running elevatr::get_elev_point() on a compute server, the function elevatr::get_elev_point() and perhaps others can crash under default settings and produce a rather misleading error message suggesting to restart R with options:

elev_obj <- elevatr::get_elev_point(data.frame(x = long, y = lat), prj = 4326, src = "aws", z = 8)                                           

Error: Cannot create 127 parallel PSOCK nodes. Each node needs one connection, but there are only 123 connections left out of the maximum 128 available on this R installation. To incr
ease this limit in R (>= 4.4.0), use command-line option '--max-connections=N' when launching R. 

This is caused by this: https://github.com/jhollist/elevatr/blob/041d88e5dbbc6c5157e3eef43bb7fb1a4074ee9e/R/get_elev_raster.R#L205

which is perhaps OK on a personal computer, but that seems to be too much to ask in general.

Since the option is not explicitly exposed to the higher level functions used by users, this may cause some confusion.

I understand the idea to automatically speed things up, but many would argue this is a dangerous default (in passing, I find it interesting that CRAN did not jump through the roof since I think they have a policy of 2 cores max under defaults; but perhaps that only applies to examples).

Perhaps bringing the ncpu argument up to the top-user level functions would already help a lot. Or, perhaps the default should be to be slow and to message people that speedup is possible by setting ncpu to higher value.

As a user, I would prefer the former, as a sysadmin I would prefer the latter for avoiding users to potentially use the entire cluster for themselves (although systems are often in place to avoid that to happen).

Many thanks for thinking about it (I can live with the current implementation). Alex

jhollist commented 2 months ago

@courtiol thanks for reporting this. My limited experience working on shared clusters is obviously showing!

I think a good solution is to do both. Set the default to 2 and expose the ncpu argument so that users can make that call themselves. I might be able to get time to add this in this week. Not sure when it will eventually get pushed to CRAN though!

Stay tuned.

jhollist commented 2 months ago

@courtiol ncpu is now available in the main branch for users in the get_elev_point and get_elev_raster. Default is 2 (if more than 2 cores available) and you can set it yourself as well. Let me know if this works for you and prevents elevatr from hogging all the cpus on your cluster!

courtiol commented 2 months ago

Will test ASAP and let you know!

courtiol commented 2 months ago

I tested and it works. Thanks a lot.