UBC-MDS / speed_dating_analysis

This repo is for a group project for analyzing speed dating data set for MDS-522
Other
0 stars 0 forks source link

Need to install packages in `renv` in docker image #59

Closed monazhu closed 10 months ago

monazhu commented 10 months ago

I was trying to run everything on docker to make sure our commands would run properly. Rstudio Server times out when I tried to run renv::restore() on it, so I think we would need to run renv::restore() directly in the dockerfile and built those packages into the image for reproducibility. I'm trying to see if I can figure this out in the next hour or so...

monazhu commented 10 months ago

Sorry guys - you probably got bombarded with a bunch of messages about failed tests - I tried to update the dockerfile but I can't seem to properly be able to install the renv in the docker image.

As a quick summary of what I tried to do and where things are at:

  1. the scripts in the bash file can be run on a local terminal no problem
  2. when I tried running the bash file through the containerized Rstudio Server, I can't seem to run the script at all, and updating the renv via renv::snapshot() produces an error (in particular, this has to do with some packages that we don't seem to use at all in our current project, e.g., the airport package)
  3. when I removed those packages and tried to build the dockerfile on disc, it seems to be building fine, but this fails when I pushed those changes to the repo (the many error messages you saw)
  4. I restored everything to when GH actions completed without error so that the changes I made to the README can be incorporated into the main branch at the very least

At this point I'm kind of at my wit's end and I'm not sure how best to remedy the situation...

mishelly-h commented 10 months ago

I will quickly submit all my other stuff and will then work on this. Thank you for the details. I will try to figure it out!

rorywhite200 commented 10 months ago

Hi @monazhu thank you for looking at this, I am having a look now but maybe need @wenyunie and @mishelly-h's help. So it seems like the error in the actions is being reported as syntax error in the renv.lock file. I guess we also need to add the new packages to the renv.lock file too like knittr and book down too? I know @wenyunie was working on this.

wenyunie commented 10 months ago

Hey guys I'm on this now @monazhu. Mona, just to confirm, there are two separate ways to run our analysis: 1) open the folder in container, and directly run analysis or render report without using .Rproj. 2) open the .Rproj in plain host environment, and renv:restore() to run analysis or render report.

Trying to renv:restore() inside the container might exceeds the limits of memory or disk usage, I guess that could be the reason. When we already used Docker container to satisfy the dependency, it is kinda redundant and bug-introducing to use renv::restore() together with it right?

I will make sure Path 1) and Path 2) both work, and specify in the Readme not to combine the two.

Does that sould OK? @monazhu

wenyunie commented 10 months ago

@monazhu Thanks for mentioning this https://rstudio.github.io/renv/articles/docker.html. My understanding is, this introduces a way to use the dependency information recorded in the renv file when we are trying to use the alternative way of Dockerization for dependency control. This happens during the Docker Image building Stage, and with this done we run the container, the container should be self-contained and we do not need to depend on the renv file inside the container during the Container Running Stage.

Running the renv::restore() in the .Rproj file inside the container will create a new dependency environment inside the container environment, which might be unnecessary complication.

wenyunie commented 10 months ago

@monazhu Thank you Mona for looking into this so late last night. I am sorry I should have mentioned earlier I planned to have this dependency work done on Saturday (I figure I should also update the step-by-step instruction for recreating the environments in the Readme.md, that's why I thought, OK, let's do it on Saturday~)

wenyunie commented 10 months ago

I will have a quick brunch first, and begin to work on this.

monazhu commented 10 months ago

Thanks for the explanation Wenyu! I was updating the README and just wanted to make sure I could run the bash script on the container, but was unable to do so, so I kind of got myself in a bit of a rabbit hole trying to figure it out 😂

On Sat, Dec 2, 2023, 11:21 a.m. wenyunie @.***> wrote:

I will have a quick brunch first, and begin to work on this.

— Reply to this email directly, view it on GitHub https://github.com/wenyunie/speed_dating_analysis/issues/59#issuecomment-1837233658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2DL3Z4LK6VPAYKHS2ITI3YHN5SXAVCNFSM6AAAAABADY2666VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZXGIZTGNRVHA . You are receiving this because you were mentioned.Message ID: @.***>

wenyunie commented 10 months ago

@monazhu Indeed, you are right. It is impossible write clear instructions in README.md without sorting out how different paths of dependency control work and how they should be separated from each other. I will take the downstream task of adding dependency instruction and step-by-step reproductivity instruction.

I see the bash file is fucking with us at this moment. This is probably because when there is a .Rproj in the folder, the bashfile/commandline in the terminal will try to run rscript with the environment of the .Rproj instead of the plain environment. This is more troublesome in our container where we already have everything ready in the plain (container) environment, but not the .Rproj environment. I will see if this theory is correct.

But anyway, I will take over from here. Please do not let this bother you anymore for this lovely Saturday and enjoy San Franscisco~

monazhu commented 10 months ago

Thank you so much Wenyu!! 🙏💖

On Sat, Dec 2, 2023, 11:46 a.m. wenyunie @.***> wrote:

@monazhu https://github.com/monazhu Indeed, you are right. It is impossible write clear instructions in README.md without sorting out how different paths of dependency control work and how they should be separated from each other. I will take the downstream task of adding dependency instruction and step-by-step reproductivity instruction.

I see the bash file is fucking with us at this moment. This is probably because when there is a .Rproj in the folder, the bashfile/commandline in the terminal will try to run rscript with the environment of the .Rproj instead of the plain environment. This is more troublesome in our container where we already have everything ready in the plain (container) environment, but not the .Rproj environment. I will see if this theory is correct.

But anyway, I will take over from here. Please do not let this bother you anymore for this lovely Saturday and enjoy San Franscisco~

— Reply to this email directly, view it on GitHub https://github.com/wenyunie/speed_dating_analysis/issues/59#issuecomment-1837238298, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2DL33G2BCCBT6KKHG3QYTYHOAQ3AVCNFSM6AAAAABADY2666VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZXGIZTQMRZHA . You are receiving this because you were mentioned.Message ID: @.***>

wenyunie commented 10 months ago

@monazhu Mona, do you run into any issues if you run the project with the updated main? (Both methods for reproduction)