Hemken / Statamarkdown

Functions to write Stata documentation with knitr

Error: node stack overflow #30

Closed nfparsons closed 2 years ago

nfparsons commented 2 years ago

I'm attempting to use a .dta file downloaded from Google Drive. I am running both Stata and R code on it, hence Statamarkdown. However, I get "Error: node stack overflow" whenever I try to run any Stata code after the download.

Here's my sessionInfo:

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4 readr_2.1.2 tidyr_1.2.1 tibble_3.1.8
[8] ggplot2_3.3.6 tidyverse_1.3.2 gtsummary_1.6.1 labelled_2.10.0 knitr_1.40 googledrive_2.0.0 rio_0.5.29
[15] conflicted_1.1.0 here_1.0.1 Statamarkdown_0.7.1 pacman_0.5.1

loaded via a namespace (and not attached):
[1] httr_1.4.4 jsonlite_1.8.0 foreach_1.5.2 modelr_0.1.9 assertthat_0.2.1 googlesheets4_1.0.1 cellranger_1.1.0
[8] yaml_2.3.5 pillar_1.8.1 backports_1.4.1 glue_1.6.2 digest_0.6.29 rvest_1.0.3 colorspace_2.0-3
[15] htmltools_0.5.3 pkgconfig_2.0.3 broom_1.0.1 haven_2.5.1 scales_1.2.1 openxlsx_4.2.5 tzdb_0.3.0
[22] generics_0.1.3 ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.4.0 magrittr_2.0.3 crayon_1.5.1
[29] readxl_1.4.1 evaluate_0.16 memoise_2.0.1 fs_1.5.2 fansi_1.0.3 doParallel_1.0.17 broom.helpers_1.8.0
[36] xml2_1.3.3 foreign_0.8-82 tools_4.1.2 data.table_1.14.2 hms_1.1.2 gargle_1.2.1 lifecycle_1.0.2
[43] munsell_0.5.0 reprex_2.0.2 zip_2.2.1 compiler_4.1.2 rlang_1.0.5 grid_4.1.2 gt_0.7.0
[50] iterators_1.0.14 rstudioapi_0.14 rmarkdown_2.16 gtable_0.3.1 codetools_0.2-18 DBI_1.1.3 curl_4.3.2
[57] R6_2.5.1 lubridate_1.8.0 fastmap_1.1.0 utf8_1.2.2 rprojroot_2.0.3 stringi_1.7.8 parallel_4.1.2
[64] Rcpp_1.0.9 vctrs_0.4.1 dbplyr_2.2.1 tidyselect_1.1.2 xfun_0.33

Here's the code that seems to be giving me trouble:

{r}
#| label: r: download data from repo

googledrive::drive_download(
  "https://drive.google.com/file/d/1vsSGvMdsn92-vhYvKM-o2e3Q6nywVCtr/view?usp=sharing", 
  path = "temp_repo/adTurn_survey.dta", 
  overwrite = TRUE
)

googledrive::drive_download(
  "https://drive.google.com/file/d/1LwLAVyogYsDwd_4B8zpQmJ9ONjdgRvO6/view?usp=sharing", 
  path = "temp_repo/adTurn_list.dta", 
  overwrite = TRUE
)
{stata stata: prep data, collectcode=TRUE}
clear all

// load data
use temp_repo/adTurn_survey

// remove duplicate variables 
drop Check1* Q R S T

// merge with Administrator data
merge 1:1 ccmu1 firstname lastname using temp_repo/adTurn_list
save "temp_repo/adTurn_raw_merge", replace

Further: If I skip the R download chunk, the Stata chunk does not throw 'Error: node stack overflow'. I don't understand enough about node stacks to know what to do here. Is there a way to reset or clear them?

Further further: OK, running any R chunk triggers 'Error: node stack overflow' when I subsequently run a Stata chunk.

remlapmot commented 2 years ago

My guess is that this is not related to the Statamarkdown package.

Given that the error only happens when you run the download chunk, it is more likely a problem with the googledrive package (if any package).

I have seen examples online where this error occurs in RStudio either with massive datasets (i.e., you need more RAM) or when code accidentally enters an infinite loop.

Do you know if the dataset is very large?

You seem to be on a Mac. The M1 Macs don't have that much RAM, so running out of RAM could be a plausible answer. I'd try closing all other apps and rerunning, or rerunning on a machine with more RAM.
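To answer the dataset-size question, a quick check along these lines could be run after the download chunk. This is only a sketch: `report_file_sizes` is a hypothetical helper (not part of Statamarkdown or googledrive), and the paths are the ones from the example earlier in this thread.

```r
# Hypothetical helper: report the on-disk size, in megabytes,
# of each file in `paths`. Uses base R only.
report_file_sizes <- function(paths) {
  data.frame(
    path    = paths,
    exists  = file.exists(paths),
    size_mb = round(file.size(paths) / 1024^2, 2)  # NA if the file is missing
  )
}

# Paths taken from the download chunk in this thread.
report_file_sizes(c(
  "temp_repo/adTurn_survey.dta",
  "temp_repo/adTurn_list.dta"
))
```

If both files turn out to be small, running out of RAM becomes a less likely explanation.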

Hemken commented 2 years ago

Your minimal example runs just fine for me (on Windows, I don’t have access to a Mac).

I agree with Tom, the error message suggests a memory issue.

From: nfparsons
Sent: Friday, September 23, 2022 2:00 PM
To: Hemken/Statamarkdown
Subject: [Hemken/Statamarkdown] Error: node stack overflow (Issue #30)

I am suddenly unable to run any Stata code at all. All I get back is 'Error: node stack overflow'. I've reinstalled the package and still nothing. Any advice would be very welcome.

Sys.info()
sysname: "Darwin"
release: "21.6.0"
version: "Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000"
nodename: "Nates-MacBook-Pro-M1.local"
machine: "x86_64"

Rmarkdown code:

knitr::opts_chunk$set(echo = TRUE)

library(Statamarkdown)

cd

Error: node stack overflow


Hemken commented 2 years ago

If I try to run the code in your example here (below), I obviously can't actually download the files, but the Stata code to merge the files responds as expected, with a Stata error that it can't find the files. Stata responds both before and after loading the package.

So, simply loading the googledrive package is not the issue (on my computer).

{r}
#| label: r: download data from repo

googledrive::drive_download(
  "https://drive.google.com/file/d/1vsSGvMdsn92-vhYvKM-o2e3Q6nywVCtr/view?usp=sharing", 
  path = "temp_repo/adTurn_survey.dta", 
  overwrite = TRUE
)

googledrive::drive_download(
  "https://drive.google.com/file/d/1LwLAVyogYsDwd_4B8zpQmJ9ONjdgRvO6/view?usp=sharing", 
  path = "temp_repo/adTurn_list.dta", 
  overwrite = TRUE
)
{stata stata: prep data, collectcode=TRUE}
clear all

// load data
use temp_repo/adTurn_survey

// remove duplicate variables 
drop Check1* Q R S T

// merge with Administrator data
merge 1:1 ccmu1 firstname lastname using temp_repo/adTurn_list
save "temp_repo/adTurn_raw_merge", replace

nfparsons commented 2 years ago

Thank you so much for all of your thoughts on this, and my apologies for not providing data: it's under restriction and I didn't have time to gin up some fake data.

I actually seem to have solved the problem by splitting the R chunks into smaller chunks. Not sure how that helped, but everything seems to be working like clockwork now. Amazing package, and thank you so so so much for allowing my research team to 'speak' to one another across platforms.

Hemken commented 2 years ago

I wonder if breaking up the googledrive chunks triggered "garbage collection"?

By the way, are you running this as a Quarto document? Just curious.
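One way to test the garbage-collection guess would be to force a collection at the end of the R download chunk, before the first Stata chunk runs. The sketch below uses only base R's `gc()`. Whether this has any effect on 'Error: node stack overflow' is an untested assumption, not a confirmed fix.

```r
# Run at the end of the R chunk, before the first Stata chunk.
# gc() forces a garbage collection and returns a matrix of memory use.
mem <- gc()

# Column 2 of that matrix holds the megabytes currently in use
# (one row for cons cells, one for vectors).
used_mb <- sum(mem[, 2])
message("R memory in use after gc(): ", used_mb, " Mb")
```

If the error disappears with this call in place and returns without it, that would support the garbage-collection explanation.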

nfparsons commented 2 years ago

I ran it initially as a Quarto document, but also tried it as a regular .Rmd.

— Nathan Parsons, B.SC, M.Sc, G.C.

Ph.D. Candidate, Dept. of Sociology, Portland State University Adjunct Professor, Dept. of Sociology, Washington State University

Recent work (https://www.researchgate.net/profile/Nathan_Parsons3/publications) Schedule an appointment (https://calendly.com/nate-parsons)
