Parallel lapply might be interesting here

bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue

https://bcgov.github.io/bcdata

Apache License 2.0

Parallel lapply might be interesting here #251

Closed meztez closed 3 years ago

meztez commented 3 years ago

https://github.com/bcgov/bcdata/blob/b06b2c54adf7afd751947442a6cf5694655c3b25/R/utils.R#L74

This lapply could use more than one core. It would make parsing faster on larger datasets.
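As a rough illustration of the suggestion, here is a minimal sketch of swapping a serial lapply for a parallel one using only the base parallel package, so no extra dependency is needed. It is not bcdata's actual code: parse_chunk() and the inline CSV chunks are hypothetical stand-ins for whatever the real per-chunk parsing step does.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in for the per-chunk parsing step (not bcdata's function).
parse_chunk <- function(chunk) {
  read.csv(text = chunk, stringsAsFactors = FALSE)
}

# Hypothetical chunks; in practice these would be the downloaded pieces.
chunks <- c(
  "id,value\n1,10\n2,20",
  "id,value\n3,30\n4,40",
  "id,value\n5,50\n6,60"
)

# Serial baseline, as in the current code path.
serial_res <- lapply(chunks, parse_chunk)

# Parallel version: a small socket cluster, which works on all platforms.
cl <- makeCluster(2)
parallel_res <- tryCatch(
  parLapply(cl, chunks, parse_chunk),
  finally = stopCluster(cl)  # always release the workers
)

# Both approaches should yield the same parsed chunks.
combined <- do.call(rbind, parallel_res)
```

Starting a cluster has a fixed cost, so a gain is only plausible when each chunk takes noticeably longer to parse than it does to ship to a worker.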

boshek commented 3 years ago

I think this is a good idea, but I'm not sure of the best way to implement it. @ateucher implemented something similar in the bcmaps package here:

https://github.com/bcgov/bcmaps/blob/b77ce28a7ccd1c3f9e800f1098c411649c24663a/R/raster_by_poly.R#L34-L39

Perhaps that could work.

meztez commented 3 years ago

Yeah, that's basically it. Want a PR?

boshek commented 3 years ago

That would be great but I think input from @ateucher would be good first.

ateucher commented 3 years ago

Thanks @meztez - I like the idea, but I'm not totally sure this justifies the extra (relatively heavy) dependency on future. I'm not sure reading in the data is the bottleneck - read_sf is generally pretty fast. I assume the download time is much more significant, but would be happy to be proven wrong.

I actually started looking at parallel and/or async downloads here, which included the parallelization of the reading. I didn't actually implement it though as I thought it might be too hard-hitting on the server.

meztez commented 3 years ago

It can definitely kill your server. You are right about that. Getting the BEC map with a limit of 100 records per chunk, I'd say processing was about 40% of the total time.

It is a good thing to question adding dependencies. Godspeed.
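One way to check whether downloading or parsing dominates is to time the two steps separately. The sketch below does that with base R only. fetch_chunk() and parse_chunk() are hypothetical stand-ins (the first just sleeps to simulate a request), so the numbers it produces say nothing about the real server or dataset.

```r
# Hypothetical stand-in for one paginated download; sleeps to mimic latency.
fetch_chunk <- function(i) {
  Sys.sleep(0.02)
  sprintf("id,value\n%d,%d", i, i * 10L)
}

# Hypothetical stand-in for the per-chunk parsing step.
parse_chunk <- function(txt) {
  read.csv(text = txt, stringsAsFactors = FALSE)
}

n_chunks <- 5

# Time each stage on its own; "elapsed" is wall-clock seconds.
fetch_time <- system.time(
  raw <- lapply(seq_len(n_chunks), fetch_chunk)
)[["elapsed"]]

parse_time <- system.time(
  parsed <- lapply(raw, parse_chunk)
)[["elapsed"]]

# Fraction of the measured time spent parsing rather than downloading.
parse_share <- parse_time / (fetch_time + parse_time)
```

If parsing really were about 40% of the wall time and parallelized perfectly over 4 cores, the overall speedup would be at most 1 / (0.6 + 0.4 / 4), roughly 1.4x, since the download portion is unchanged.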