RMI-PACTA / pacta.data.preparation

The goal of {pacta.data.preparation} is to prepare and format all input datasets required to run the PACTA for investors tools.
https://rmi-pacta.github.io/pacta.data.preparation/

Explore multi-threading `{asset}_abcd_scenario.rds` generating function #2

Status: closed (jdhoffa closed this issue 4 months ago)

jdhoffa commented 2 years ago

With RMI-PACTA/archive.pacta.data.preparation#81 closed, we have an opportunity to use multi-threading to speed up the time- and memory-intensive processes.

In particular, we might be able to spread each process to calculate {asset}_abcd_scenario_{scenario_name}.rds across multiple CPUs.
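A minimal Python sketch of the per-scenario fan-out described above. The package itself is written in R, so this is illustrative only, and `build_abcd_scenario()` and `build_all_scenarios()` are hypothetical names, not functions in {pacta.data.preparation}. It uses worker processes rather than threads because, in the standard CPython build, the global interpreter lock keeps threads from running Python code in parallel. The executor class is a parameter so the same code can be exercised with a thread pool.

```python
import concurrent.futures as cf
from typing import List, Sequence, Type


def build_abcd_scenario(asset: str, scenario_name: str) -> str:
    """Hypothetical stand-in for the expensive per-scenario step.

    In the real pipeline this would combine the ABCD data for `asset` with
    one scenario and save `{asset}_abcd_scenario_{scenario_name}.rds`. Here it
    only returns the file name that would have been written.
    """
    return f"{asset}_abcd_scenario_{scenario_name}.rds"


def build_all_scenarios(
    asset: str,
    scenario_names: Sequence[str],
    max_workers: int = 2,
    executor_cls: Type[cf.Executor] = cf.ProcessPoolExecutor,
) -> List[str]:
    """Run one task per scenario, with at most `max_workers` running at once."""
    # Scenarios do not depend on each other, so each can run in its own
    # worker. Every busy worker holds its own copy of the inputs, so
    # max_workers also caps how many copies are in memory at the same time.
    with executor_cls(max_workers=max_workers) as pool:
        futures = [
            pool.submit(build_abcd_scenario, asset, name) for name in scenario_names
        ]
        # Collect results in the same order as scenario_names.
        return [future.result() for future in futures]
```

When using the default process pool, call `build_all_scenarios()` from a script guarded by `if __name__ == "__main__":` so that worker processes can be started safely on every platform.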

cjyetman commented 2 years ago

Be aware: this may increase memory pressure if multiple threads are each using up tons of memory.
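A rough way to see the memory concern: if every worker holds its own full copy of the inputs, peak memory grows roughly linearly with the number of workers, so the usable worker count is bounded by the RAM budget divided by the per-worker peak. The helper below is a hypothetical Python sketch of that back-of-the-envelope bound, not something measured on the actual pipeline.

```python
import math


def max_safe_workers(ram_budget_gb: float, peak_gb_per_worker: float) -> int:
    """Largest number of workers whose combined peak memory fits the budget.

    Assumes each worker holds its own full copy of the inputs, so peak memory
    grows roughly linearly: n_workers * peak_gb_per_worker.
    """
    if peak_gb_per_worker <= 0:
        raise ValueError("peak_gb_per_worker must be positive")
    return max(0, math.floor(ram_budget_gb / peak_gb_per_worker))
```

With made-up numbers, a 64 GB machine and a 24 GB per-scenario peak allow 2 workers; at a 40 GB peak only 1 fits, in which case parallelism on that machine buys nothing.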

AlexAxthelm commented 1 year ago

All of this is something that we'll have time to explore (read: offer multiple solutions) Soon™️

cjyetman commented 1 year ago

mo' memory, mo' memory

AlexAxthelm commented 1 year ago

cough externalize our data cough

cjyetman commented 4 months ago

Considering how much memory is needed/would be needed for each of these hypothetical threads, I think this would not be an advantageous thing to do. Can I close this @jdhoffa?

jdhoffa commented 4 months ago

Sounds good!

AlexAxthelm commented 4 months ago

Given that this thread is recently dead, I'd like to revive it just a bit (as a long-term improvement). I think that rather than focusing on multithreading the application code, this and `dataprep_connect_abcd_with_scenario()` (see also #7) would be amenable to parallelizing across multiple runners rather than multiple threads on the same machine.

If we split the scenarios out in such a manner that they could be handled by multiple machines (each with access to its own resources, e.g. RAM), we could keep them single-threaded (like R likes) but lift the memory constraint. There is probably some block-level rearranging involved, but I think that's less scary than trying to introduce any of the OpenMP- or futures-based paradigms into our stack.
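A minimal Python sketch of that split, for illustration only. How each runner learns its `runner_index` and `runner_count` (for example from a CI job matrix) is left open, all names here are hypothetical, and `build` stands in for whatever per-scenario step a runner would execute. Each runner derives its share of the scenarios independently and then processes them one after another on a single thread.

```python
from typing import Callable, List, Sequence


def scenarios_for_runner(
    scenario_names: Sequence[str], runner_index: int, runner_count: int
) -> List[str]:
    """Deterministically pick this runner's share of the scenarios.

    Sorting first means every runner derives the same partition on its own,
    with no coordination; a strided slice gives each scenario to exactly
    one runner.
    """
    if not 0 <= runner_index < runner_count:
        raise ValueError("need 0 <= runner_index < runner_count")
    return sorted(scenario_names)[runner_index::runner_count]


def run_shard(
    asset: str,
    scenario_names: Sequence[str],
    runner_index: int,
    runner_count: int,
    build: Callable[[str, str], str],
) -> List[str]:
    """Process this runner's scenarios sequentially (single-threaded)."""
    return [
        build(asset, name)
        for name in scenarios_for_runner(scenario_names, runner_index, runner_count)
    ]
```

A strided split balances the number of scenarios per runner, not their cost; if some scenarios need much more memory or time than others, the assignment would need to account for that.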

jdhoffa commented 4 months ago

I think perhaps that should live in its own issue?

cjyetman commented 4 months ago

This seems a bit like building a giant umbrella to cover a boat that has a hole in it. We know what the real problem is (dataprep_connect_abcd_with_scenario()). I think our efforts would be better focussed on that rather than distributing the work it does across multiple threads/computers/whatever.