PIP-Technical-Team / wbpip

Official methodological Stats of PIP
https://pip-technical-team.github.io/wbpip/
Other
3 stars 1 forks source link

Refactor consistency #236

Closed zander-prinsloo closed 7 months ago

zander-prinsloo commented 8 months ago

Specific points:

  1. export functions
  2. clean and reformat some code
  3. gd_lq_key_values - new function, used in places such as gd_estimate_dist_stats
  4. check small argument modifications on gd_compute_pov_*_lb functions
  5. review gd_compute_poverty_stats_lq_replacement.R and all associated functions
  6. Review md_compute_poverty_stats_replacement.R and all associated functions
  7. Many changes in gd_utils.R
  8. All new functions are tested
  9. A vignette added
randrescastaneda commented 8 months ago

Task linked: CU-86869wcrp {wbpip} maintenance

randrescastaneda commented 8 months ago

Task linked: CU-8686teb15 refactor for consistency

randrescastaneda commented 8 months ago

Hi @tonyfujs,

Even though this PR includes many changes, none affect the regular behavior of wbpip. As you can see below, all tests, including several new ones, are passing. The main purposes of these changes are:

  1. Expose several functions that we need in the pipster package
  2. improve the behavior of md_compute_poverty_stats() and gd_compute_poverty_stats_lq() to allow vectorization (not fully completed yet), and allow flexibility for additional functions. Also, we created several subfunctions to be used inside these two and, more importantly, to be used independently from them. That is, the user can call the new subfunctions (e.g., md_compute_headcount') without having to callmd_compute_poverty_stats. Since these new subfunctions are exposed, some checks are needed and some efficiency is lost. I tested the performance of the new and oldmdandgd` functions and the difference is negligible. For md I used 20 Million obs and you can see that though the new function is a little slower, the difference in milliseconds is irrelevant.
  3. we added more examples.
  4. Small changes (see initial comment by Zander in this PR)
  5. we vectorized some functions. In the next PR, we will vectorize more functions.

Please, let me know if you have any question.

test pass

image

performance comparison

md_compute_poverty_stats() 20M obs

image

gd_compute_poverty_stats_lq()

image

randrescastaneda commented 8 months ago

Hello - I have not done a full review, but made some comments. Please take a look, and let me know what you think. Thanks!

Some comments are about things that happen across the PR (I haven't highlighted every line of code were this happens), for instance the use of cli in low-level functions and referencing package with package_name::function_name

Hi @tonyfujs, regarding cli I think there is no cost including it and the gains in clarity are great. yet, we can remove them. I understand what you mean about low-level functions, but I think we should meet a middle ground, especially for conditions. I can compromise removing the cli calls, but I think the conditions in some of the functions should remain.

Regarding referencing package, I'll make sure reference properly, but be aware some functions from cli and all the functions from collapse have been imported, so there is no need to reference them. This is an important point. the collapse package brings a lot of benefits that can be gradually implemented in wbpip to speed up processes, that is why I imported the whole thing.

randrescastaneda commented 8 months ago

Hi @tonyfujs,

You can check the PR again. I removed all the informative cli messages that we introduced this round, but I left the ones that were already present in the package. All the tests are passing. Please, feel free to push back as much as needed.

Thanks.

image

randrescastaneda commented 7 months ago

Hi @tonyfujs ,

this PR is ready. All checks pass

image