EngineerDanny / Rperform

:bar_chart: R package for tracking performance metrics across git versions and branches.
https://analyticalmonk.github.io/Rperform
GNU General Public License v3.0

Rperform R CMD Check #1

Closed EngineerDanny closed 2 years ago

EngineerDanny commented 2 years ago

This PR creates a GitHub Action for setting up R, Rperform, and its dependencies on the remote machine. The test is carried out on the stringr package.

The minimal test is to run Rperform on this package successfully.
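
For reference, a minimal sketch of what "running Rperform on a package" looks like from R. The function and argument names follow Rperform's API as used elsewhere in this thread, but the specific test file and commit count below are assumptions for illustration:

```r
# Minimal sketch (the test file and num_commits are assumptions): from a local
# clone of the package under test, time one of its test files across commits.
library(Rperform)

setwd("stringr")  # local clone of the target package
plot_metrics(test_path   = "tests/testthat/test-dup.r",
             metric      = "time",
             num_commits = 5,
             save_data   = FALSE,
             save_plots  = TRUE)
```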

EngineerDanny commented 2 years ago

Implementation Review

The main focus is to be able to run Rperform on other packages (e.g. stringr). On this repository, there will be minimal testing of the package itself. The GitHub Action tests will be done on the repositories of other packages, which I will fork. For convenience, I will proceed with testing on the stringr package here.

EngineerDanny commented 2 years ago

The Rperform package has been failing R CMD checks.

[Screenshot: CI check results, 2022-06-22 11:44 AM]

As can be seen, all the steps passed except r-lib/actions/check-r-package@v2. When I ran the package check locally, it also failed, even though no explicit error was reported.
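
For anyone reproducing this, one way to run the same check locally is sketched below. Using rcmdcheck is my assumption for the local equivalent (the r-lib check action uses it under the hood); running `R CMD build` and `R CMD check` from a shell works just as well:

```r
# Reproduce the CI check locally from the package root (a sketch, assuming
# the rcmdcheck package; error_on = "warning" makes warnings fail the check).
install.packages("rcmdcheck")
rcmdcheck::rcmdcheck(path = ".", error_on = "warning")
```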

The only warning was:

checking Rd cross-references ... WARNING
Missing link or links in documentation object 'Rperform-package.Rd':
  ‘[git2r:git2r-package]{git2r}’

See section 'Cross-references' in the 'Writing R Extensions' manual.

I fixed the warning by inserting the official git2r GitHub link in place of the broken cross-reference.
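
A rough sketch of that documentation fix; the surrounding roxygen text in the package-level documentation is paraphrased rather than quoted, and the exact URL used is my assumption:

```r
# Before: a cross-reference that R CMD check could not resolve
#' ... built on top of \link[git2r:git2r-package]{git2r} ...

# After: link to the git2r repository directly (URL is an assumption)
#' ... built on top of git2r (\url{https://github.com/ropensci/git2r}) ...
```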

EngineerDanny commented 2 years ago

Initially, the issue was an invalid link in the test-repo-metrics.R file. I found out that the hadley/stringr repository no longer exists; when you open the link you are redirected to the actual repository, tidyverse/stringr.

My test results were inconsistent (failing most of the time and occasionally passing) due to this issue. Therefore, I changed the GitHub link to tidyverse/stringr.
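
A hedged sketch of the change in tests/testthat/test-repo-metrics.R; the surrounding test code is paraphrased, and only the URL swap reflects what actually changed:

```r
# Before: hadley/stringr no longer exists and only works via a redirect,
# which made the clone step flaky on CI.
# stringr_url <- "https://github.com/hadley/stringr"

# After: point at the repository's current home.
stringr_url <- "https://github.com/tidyverse/stringr"

target_dir <- file.path(tempdir(), "stringr")
if (!dir.exists(target_dir)) {
  git2r::clone(stringr_url, local_path = target_dir)
}
```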

EngineerDanny commented 2 years ago

Although the issue with the link was fixed and R CMD Check on my local machine passed, the GitHub CI action still failed. So I started to dig into the tidyverse/stringr package tests.

Inside the https://github.com/tidyverse/stringr/blob/main/tests/testthat/test-dup.r file, there are three tests which are run internally.

[Screenshot: stringr test-dup.r results, 2022-06-22 2:30 PM]

As shown above, all the other tests passed except the "uses tidyverse recycling rules" test. This led me to conclude that this may be an internal issue with the tidyverse/stringr package.

The first two tests, however, could be used for checking Rperform.

EngineerDanny commented 2 years ago

After more debugging and testing, I finally found the error that was causing the Rperform R CMD Check test to fail on GitHub CI. It turns out that during my explorations I mistakenly created a file on the master branch
👉 [tests/testthat/test-plot-metrics.R].

[Screenshot: R CMD check logs, 2022-06-22 8:14 PM]

As recorded in the logs above, the file fails because it calls the str_c function, which does not exist there. Therefore, I have removed this file since it is not needed.
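
For context, a paraphrased sketch of the kind of call that fails there; the exact code in the deleted file is not quoted, and namespacing the call would have been the alternative to deleting it:

```r
# Paraphrased, not the actual contents of the deleted test-plot-metrics.R.
# A bare call like this fails during R CMD check because stringr is not
# attached in that environment:
#   labels <- str_c("commit_", seq_len(5))

# If the file had been kept, either of these would have worked instead:
labels <- paste0("commit_", seq_len(5))          # base R equivalent
labels <- stringr::str_c("commit_", seq_len(5))  # explicit namespace
```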

Now, all checks have passed ✅.

tdhock commented 2 years ago

Also, I'm not sure if this is the case for the stringr that you are using, but it would be good to have a test case in which there is a clear slowdown/speedup that can be observed. For example, this branch/PR probably has a slowdown: https://github.com/tdhock/binsegRcpp/pull/10

EngineerDanny commented 2 years ago

My new commits to this repo fix most of the issues raised.

  1. Indentation inconsistency in the R files has been fixed by using RStudio's built-in indentation. (1)
  2. Various binary files and directories have been added to the .gitignore file. (2)
  3. A named-reference approach has been adopted to reference list members by name instead of by numeric index (see the sketch after this list). (3)
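
A small sketch of the named-reference change described in point 3; the list and element names here are made up for illustration:

```r
# Hypothetical list of per-test results (names invented for illustration).
result <- list(test_name  = "one data point has zero loss",
               metric_val = 0.0093,
               commit_msg = "before PR")

# Before: positional indexing, which breaks silently if the layout changes.
# val <- result[[2]]

# After: reference list members by name.
val <- result[["metric_val"]]
```
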
EngineerDanny commented 2 years ago

@tdhock My recent tests with Rperform on that PR/branch show that your assumption is right: merging the PR made the code consistently slower.

Since the two versions were on different branches, I ran plot_branchmetrics(test_path = "tests/testthat/test-CRAN.R", metric = "time", branch1 = "l1loss", branch2 = "master", save_data = T, save_plots = F) and observed the results.

I noticed that the result was not depicted clearly on the graph (#4). However, the time_frame data set that was saved had 254 observations and 8 variables. This is the data 👉 time_frame.csv
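
Written out for readability, the comparison call quoted above, followed by a minimal sketch for loading the saved data; the exact columns of time_frame.csv are not spelled out here, so treat the inspection step as a sketch:

```r
library(Rperform)

# The branch comparison run described above.
plot_branchmetrics(test_path  = "tests/testthat/test-CRAN.R",
                   metric     = "time",
                   branch1    = "l1loss",
                   branch2    = "master",
                   save_data  = TRUE,
                   save_plots = FALSE)

# Inspect the saved timing data (254 obs. of 8 variables in this run).
time_frame <- read.csv("time_frame.csv")
str(time_frame)
```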

EngineerDanny commented 2 years ago

BREAKING DOWN THE RESULTS

The results were consistent: after the PR was merged, there was a clear slowdown.

1. The first test was "one data point has zero loss"

[Screenshot: timing plot for "one data point has zero loss", 2022-06-23 4:34 PM]

Average time taken before the PR: 0.009315 seconds. After merging the PR: 0.0180096 seconds (approximately a 2x increase).

2. "equal split cost is ok" test

[Screenshot: timing plot for "equal split cost is ok", 2022-06-23 4:35 PM]

Average time taken before the PR: 0.0351191 seconds. After merging the PR: 0.057926 seconds (approximately a 1.6x increase).

3. "error for 0 data" test

[Screenshot: timing plot for "error for 0 data", 2022-06-23 4:36 PM]

Average time taken before the PR: 0.0072106 seconds. After merging the PR: 0.0352905 seconds (approximately a 5x increase).

The list goes on and on, showing a consistent increase in time after merging the PR.
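
For reference, the slowdown ratios implied by the averages above:

```r
before <- c(0.009315, 0.0351191, 0.0072106)   # seconds, before the PR
after  <- c(0.0180096, 0.057926, 0.0352905)   # seconds, after merging
round(after / before, 1)
#> [1] 1.9 1.6 4.9
```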