StirlingCodingClub / studyGroup

Gather together a group to skill-share, co-work, and create community
http://StirlingCodingClub.github.io/studyGroup/

Initialise a GitHub repository in RStudio #7

Open bradduthie opened 6 years ago

bradduthie commented 6 years ago

A good question raised by @mattnuttall00 is how to initialise a new GitHub repository for an existing folder on your local computer. @jejoenje knows how to do this, can you help?

bradduthie commented 6 years ago

An example of inserted code.

test <- runif(5)  # five random numbers between 0 and 1
test

Check.

jejoenje commented 6 years ago

I lied! This is not quite as straightforward as I thought it was. Adding local version control (git) to an existing RStudio project folder is quite easy, but linking that to a GitHub repository isn't built in, I don't think... I will construct some examples.

jejoenje commented 6 years ago

Ha, well, no need to reinvent the wheel. If you've set up git to work in RStudio as per here, and you have an existing project you want to link to a new GitHub repository, you can follow the steps here, under "Push an existing RStudio project to GitHub". Essentially, the steps are: (1) in the RStudio project, make sure you enable git version control (Tools > Version Control > Project Setup); (2) go to your GitHub account and make a new repository with the same name as your RStudio project; (3) copy the two lines of code that are provided under "Push an existing repository from the command line"; (4) open a new terminal (command line) in RStudio (Tools > Shell), and paste the two commands. This should set the local repository up to 'track' a remote repository on GitHub. Now, when you commit and push changes using the buttons in the Git tab in RStudio, the changes should get pushed to the remote repository on GitHub.
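For anyone curious what those two pasted lines do, here is a minimal sketch of the whole sequence from a terminal. It assumes a local bare repository (`remote.git` in a temporary folder) standing in for the new, empty GitHub repository, so it runs offline; with a real repository you would use the GitHub URL instead. The folder `my_project`, the file `analysis.R` and the commit identity are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: link an existing project folder to a new, empty remote and push.
# ASSUMPTION: a local bare repo stands in for the empty GitHub repository.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for "create a new repository on GitHub" (empty, no README).
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# An existing project folder with some work in it (hypothetical names).
mkdir "$work/my_project"
cd "$work/my_project"
echo 'x <- runif(10)' > analysis.R

# Step 1, roughly what Tools > Version Control > Project Setup does.
git init -q
git symbolic-ref HEAD refs/heads/master     # use the branch name seen in this thread
git config user.name  "Example User"        # hypothetical identity for the commit
git config user.email "user@example.com"
git add analysis.R
git commit -q -m "Initial commit"

# Steps 3-4: the two lines GitHub offers, here with the stand-in URL.
git remote add origin "$work/remote.git"
git push -q -u origin master

# The local branch now tracks origin/master.
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')
echo "master is tracking: $upstream"
```

The `-u` flag is what sets up the 'tracking' mentioned above: afterwards, a plain `git push` or `git pull` knows which remote branch to use.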

jejoenje commented 6 years ago

If you want to start an entirely new RStudio project, and link this to a GitHub repository, it's even easier: (1) start a new repository on GitHub; (2) copy the link to the GitHub repository; (3) in RStudio, create a new project, choose 'Version Control', 'Git', paste the repository link, and click 'Create Project'. You should now have a new project with Git version control, tracking the new GitHub repository.
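Roughly speaking, RStudio's 'Version Control > Git' option clones the repository into a new project folder (and adds its own `.Rproj` file). A minimal offline sketch of the cloning part, assuming a local bare repository (hypothetical `new_repo.git`) in place of the GitHub link, seeded with a README the way GitHub can create one:

```shell
#!/usr/bin/env bash
# Sketch: start a new project by cloning a (stand-in) GitHub repository.
# ASSUMPTION: a local bare repo replaces https://github.com/<user>/<repo>.git
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for "start a new repository on GitHub" with a README.
git init -q --bare "$work/new_repo.git"
git --git-dir="$work/new_repo.git" symbolic-ref HEAD refs/heads/master
git clone -q "$work/new_repo.git" "$work/seed" 2>/dev/null
cd "$work/seed"
git config user.name  "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "# new_repo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin HEAD:master
cd "$work"

# The part RStudio does for you: clone the repository link into a new folder.
git clone -q "$work/new_repo.git" "$work/new_repo"
cd "$work/new_repo"
git remote -v
ls
```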

mattnuttall00 commented 6 years ago

Thanks @jejoenje! So does that mean that you do all of your editing in RStudio, and the GitHub repo is essentially just a storage space?

jejoenje commented 6 years ago

@mattnuttall00 Yes, in a way, although you can still edit docs on GitHub (in the browser) as well, like @bradduthie was talking about earlier. You would then need to "pull" to bring the changes made online into your local repository, though.
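To see what 'pulling' means in practice, here is a minimal sketch that simulates an edit made in the browser. As an assumption for the sake of an offline example, a second clone (`browser`) stands in for GitHub's web editor and a local bare repository stands in for GitHub; the 'local' copy only sees the change after `git pull`. File names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a change made "online" only appears locally after git pull.
# ASSUMPTION: a bare repo plays GitHub, a second clone plays the web editor.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway commit identity (hypothetical).
set_identity() { git config user.name "Example User"; git config user.email "user@example.com"; }

git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Local (RStudio) copy with one committed and pushed file.
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
set_identity
echo "Draft notes" > notes.md
git add notes.md
git commit -q -m "Add notes"
git remote add origin "$work/remote.git"
git push -q -u origin master

# "Edit on GitHub in the browser", simulated by another clone.
git clone -q "$work/remote.git" "$work/browser"
cd "$work/browser"
set_identity
echo "Edited online" >> notes.md
git commit -q -am "Update notes.md"
git push -q

# Back on the local machine the change is not there yet...
cd "$work/local"
before=$(grep -c "Edited online" notes.md || true)
# ...until you pull it down.
git pull -q
after=$(grep -c "Edited online" notes.md)
echo "matches before pull: $before, after pull: $after"
```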

rosemckeon commented 6 years ago

@mattnuttall00 as well as being basically a backup storage space, having a remote repository set up, instead of just tracking changes locally, enables you to collaborate too. You need a remote repo in order to merge changes made by different team members into one cohesive set of files.

It's also really handy for visualising the changes you've made. You can easily look at the repository on GitHub and click through the commit history to see your source code at different stages. You can easily link to your files as well, even to specific lines, so it makes sharing snippets or asking advice on issues easy too.
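The same history you click through on GitHub is also available locally. A minimal sketch, using a throwaway repository with a hypothetical `analysis.R` and two made-up commits, that lists the commits and shows the file as it was one commit ago:

```shell
#!/usr/bin/env bash
# Sketch: browse commit history and an older version of a file locally.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q
git config user.name  "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "First version"
echo 'x <- 2' > analysis.R
git commit -q -am "Change x"

# One line per commit, newest first (like the commit list on GitHub).
git log --oneline

# The file as it looked one commit before HEAD.
old_version=$(git show HEAD~1:analysis.R)
echo "previous version: $old_version"
```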

mattnuttall00 commented 6 years ago

Many thanks everyone! I'll have a crack at getting it all set up.

bradduthie commented 6 years ago

@mattnuttall00 @jejoenje @rozeykex If anyone would benefit from a quick guide to pushing and pulling from the command line, I'd be happy to write some quick steps below -- unfortunately, I don't think that this would apply to Windows users (only Mac and Linux).

Another GUI-based way of visualising commit histories that links fairly seamlessly with GitHub is GitKraken, which is free (for open use). I still use the command line out of sheer force of habit, but the setup is actually quite useful. I suspect that if you signed into GitHub from within GitKraken, you would be able to sync things locally pretty easily, and push and pull with a couple of easy-to-identify buttons.

Actually, having opened GitKraken on my desktop for the first time in ages and signed in with GitHub, it has somehow not only recognised all of my new GitHub repositories, but actually figured out where they are located on my desktop. If anyone is keen, I'll start a new issue pointing everyone to GitKraken as a convenient way to link their GitHub with their local machine.

rosemckeon commented 6 years ago

@bradduthie GitKraken sounds useful! For Windows users, when they install Git, they will also get git-bash, so they'll have a unix style terminal that will run all the same base commands.

bradduthie commented 6 years ago

@rozeykex Awesome! Maybe a session on the logic of git and the basic git commands in bash (init, add, commit, push, pull), followed by how something like GitKraken or RStudio can do all of it for you with point-and-click tools, would be useful.

mattnuttall00 commented 6 years ago

Hi all, quick further question regarding all of the above. Because I work from home a lot, I have all of my work on Box, and use Box Sync on my university computer and my personal laptop so that I can be working on either machine and it all syncs between machines. So if I have an RStudio project in my Box files, with git version control, and linked to a GitHub repo, presumably I will need Git installed on both machines? Does anyone foresee any issues with this set up? Thanks :)

jejoenje commented 6 years ago

Yes, you do need it installed on both machines, and you would push changes you make on either machine to the remote repo. I actually have a very similar setup but using Resilio Sync (was BitTorrent Sync) between my work machine, my home server and my laptop. The only thing you have to be careful about is that the sync has definitely finished when you switch between machines and start pushing/pulling to the remote. If the machine you're working on isn't up to date and you try to push to the remote repo (which was already updated to a later version from the other machine), git will reject the push until you pull the newer commits, and that pull can give you conflicts if both machines changed the same lines. But nothing like that has happened with my stuff for a while.
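A minimal sketch of that failure mode, assuming two clones (`office` and `laptop`, hypothetical names) of one stand-in remote (a local bare repository instead of GitHub). The office machine pushes first, so the laptop's push is rejected until it pulls. The sketch uses `git pull --rebase` to replay the laptop's commit on top; a plain merge works too. The two commits touch different files, so there is no conflict here; if both had edited the same lines, git would stop and ask you to resolve them.

```shell
#!/usr/bin/env bash
# Sketch: a push from an out-of-date machine is rejected until you pull.
# ASSUMPTION: a local bare repo stands in for GitHub.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

set_identity() { git config user.name "Example User"; git config user.email "user@example.com"; }

# Remote with one commit on master.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
set_identity
echo "start" > README.md
git add README.md
git commit -q -m "Start"
git remote add origin "$work/remote.git"
git push -q -u origin master

# Two machines, both up to date.
git clone -q "$work/remote.git" "$work/office"
git clone -q "$work/remote.git" "$work/laptop"

# The office machine pushes a change first.
cd "$work/office"
set_identity
echo "office work" > office.R
git add office.R
git commit -q -m "Office change"
git push -q

# The laptop commits without pulling first, so its push is rejected.
cd "$work/laptop"
set_identity
echo "laptop work" > laptop.R
git add laptop.R
git commit -q -m "Laptop change"
if git push -q 2>/dev/null; then first_push=accepted; else first_push=rejected; fi
echo "first push: $first_push"

# Catch up with the remote, then push again.
git pull -q --rebase
git push -q
echo "remote now has $(git rev-list --count origin/master) commits"
```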

mattnuttall00 commented 6 years ago

Great, thanks @jejoenje. Yes, sometimes it takes a wee while for either machine to catch up with the syncing if I've been moving between them quickly. So that's a very good bit of advice, thanks.

mattnuttall00 commented 6 years ago

Sorry, me again.... I am running into issues trying to link an RStudio project to a GitHub repo. I have followed the instructions from various tutorials, but I think the fact that I am on a University machine may be causing an issue. In my RStudio project, I have Git version control enabled (and have managed to commit initial files to the local repo). I have created an SSH key in RStudio and added it to my GitHub account (the key appears to be: mnn1@STRF46D042C37A2). I have then created a new GitHub repo, and copied the two command lines:

git remote add origin https://github.com/mattnuttall00/PhD_Objective1.git
git push -u origin master

But I am getting the following errors:

$ git push -u origin master
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.

Any ideas?

jmcvw commented 6 years ago

@jejoenje, @mattnuttall00 Maybe a discussion on Git workflows could be interesting.

E.g. How do people work on multiple machines? I used to save my code on Box too, but not since I started using Git.

How do people manage their data across multiple machines? This is something I seem to wrangle with constantly. Currently I just source scripts on each machine that create the data when needed. This works OK, but I think it's definitely wrong!

I'd also be interested to hear about other ideas and issues that people have.

mattnuttall00 commented 6 years ago

@jmcvw I think that's a great idea. I'd love to hear about how different people manage their workflows and data management between machines etc.

bradduthie commented 6 years ago

@jmcvw The easiest (though somewhat expensive -- ca 10-12 GBP per month) way that this might be done is through a paid Dropbox account. Since this comes with over a terabyte of storage (apparently 3 TB now?), it's more than most of us could ever need, so projects could simply be saved on Dropbox and synced automatically. This would remove the need to think about managing data on multiple machines (or fear of data loss). Of course, for huge data sets, this might not be practical if you're switching between computers constantly. Definitely would be curious to hear how other people manage this though!

@mattnuttall00 Your command line code looks right, I think. I'll fiddle around with using git in RStudio, but in the meantime, can you run the command below inside your repository:

git remote -v

Does it show the origin at the correct address?
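A quick way to read that output: an address starting `git@github.com:` means git talks to GitHub over SSH (so your SSH key must be set up), while `https://github.com/...` means it uses HTTPS with a username and password or token. A minimal sketch that classifies the `origin` URL; the helper `url_kind` is hypothetical, and the URL is the one from this thread, added to a throwaway repository (no network needed):

```shell
#!/usr/bin/env bash
# Sketch: check which transport (SSH or HTTPS) the origin remote uses.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q

# Hypothetical helper: classify a remote URL by the transport it implies.
url_kind() {
  case "$1" in
    git@*:*|ssh://*) echo ssh ;;
    https://*)       echo https ;;
    *)               echo other ;;
  esac
}

git remote add origin git@github.com:mattnuttall00/PhD_Objective1.git
git remote -v
kind=$(url_kind "$(git remote get-url origin)")
echo "origin uses: $kind"
```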

mattnuttall00 commented 6 years ago

@bradduthie you mean in the command line of RStudio, or the GitHub repository? I'm not sure how to access a command line in a GitHub repo...

If I run that command in the RStudio shell I get:

origin  git@github.com:mattnuttall00/PhD_Objective1.git (fetch)
origin  git@github.com:mattnuttall00/PhD_Objective1.git (push)

That appears to be the correct address.

bradduthie commented 6 years ago

@mattnuttall00 I'll look at how this works in RStudio. On the command line, this is what gets returned for me when I view the remote repository:

brad@duthie-pc:~/Dropbox/projects/gmse$ git remote -v
origin  https://github.com/bradduthie/gmse.git (fetch)
origin  https://github.com/bradduthie/gmse.git (push)

If you can get it in that format instead, maybe it will ask you for your username and password?
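Getting it into that format only means changing the URL stored for `origin`; `git remote set-url` does that without touching anything else in the repository. A minimal sketch on a throwaway repository, using the URLs quoted in this thread:

```shell
#!/usr/bin/env bash
# Sketch: switch an existing origin from the SSH form to the HTTPS form.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q
git remote add origin git@github.com:mattnuttall00/PhD_Objective1.git

# Replace the URL; both fetch and push follow it (no separate push URL is set).
git remote set-url origin https://github.com/mattnuttall00/PhD_Objective1.git
git remote -v
fetch_url=$(git remote get-url origin)
push_url=$(git remote get-url --push origin)
```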

rosemckeon commented 6 years ago

I don't really see the point in syncing git repos with cloud tools like Dropbox/Drive/Box or whatever. It's like buttering your bread twice. I use those for general files and photos, but anything code/data related I just sync with git. Push to a remote, clone and pull on any computer you need it on. Push back. It's less automatic, but once a habit has formed it's second nature and there's no risk of losing your work/data. You also don't have to worry that automated syncing hasn't completed, and there's no way it can give you conflicts when you edit partially synced files or lose connection to the cloud service.

bradduthie commented 6 years ago

@rozeykex Yeah, I agree. I think you're syncing things the best way possible for git repositories (I'm just too lazy to separate git repositories from other folders, so everything syncs automatically and I just try to make a habit of waiting patiently when necessary).

jejoenje commented 6 years ago

@jmcvw

@jejoenje, @mattnuttall00 Maybe a discussion on Git workflows could be interesting.

E.g. How do people work on multiple machines? I used to save my code on Box too, but not since I started using Git.

How do people manage their data across multiple machines? This is something I seem to wrangle with constantly. Currently I just source scripts on each machine that create the data when needed. This works OK, but I think it's definitely wrong!

  • I used to use the Uni H drive, which can be accessed from home, but it is too small and was often slow for spatial data.
  • Maybe I should go back to using Box for data, but for spatial stuff I think it can be slow.
  • Or a syncing app like Resilio Sync, but I kind of have "sign up fatigue".

I'd also be interested to hear about other ideas and issues that people have.

Big fat yes, very good idea, particularly given @rozeykex's and @bradduthie's views on this above.

Suggest it here?

jejoenje commented 6 years ago

@mattnuttall00

I'm afraid I'm not sure, @mattnuttall00... I think it's unlikely it's because you're on a uni machine, as it's worked fine for me, presumably with the same constraints... May have to see your setup to help...

Could you try adding --verbose to the git push command to get a bit more info? I'll be setting up RStudio/Git/GitHub in a new Linux install shortly, so I'll let you know how that goes.

jejoenje commented 6 years ago

@rozeykex @bradduthie

Well, I do rather like extra butter with everything... :) Anyway, I see your point - in part. I think my personal view is just that while Git/Github are pretty specific tools for version control / collaborative work, proper sync/backup solutions are needed on top of that... and I prefer to backup/sync EVERYthing, including folders that I also use version control for...

rosemckeon commented 6 years ago

@jejoenje butter is pretty darn good, I'll give you that!

@mattnuttall00 I've used git on the uni machines with RStudio, so it definitely should work. That key looks way too short. Did you copy the public RSA key from ~/.ssh/?

jejoenje commented 6 years ago

@mattnuttall00 I've just had another look at this while moving my entire laptop (MacBook Pro) to run Ubuntu as the main operating system, so I'm needing to reinstall everything, including Git. Because I'm starting afresh, I decided to make sure Git and SSH are working by themselves first (i.e. ignoring RStudio). I followed these:

  • Checking for existing SSH keys
  • Generating a new SSH key and adding it to the ssh-agent
  • Adding a new SSH key to your GitHub account
  • Switching remote URLs from HTTPS to SSH

This worked for me, in that the test repo can now be pushed/pulled without password authentication. I'll next check whether I can also get this to work in RStudio, but it might be worth trying to replicate the above to check whether your issue is with the Git/SSH setup or with RStudio. Looking at the error message you posted though, I agree with @RosieMangan that your issue might be the SSH key - it looks hella short...

jejoenje commented 6 years ago

Well, this is interesting. After I changed the repo to use SSH instead of HTTPS (as per here), RStudio now throws an error when trying to "push" to the remote:

git push origin refs/heads/master
ssh_askpass: exec(/usr/bin/ssh-askpass): No such file or directory
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.

However, when manually pushing in a terminal, all works fine:

git push origin master
Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 323 bytes | 323.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:jejoenje/BFRMPROC.git
   d9baf1c..37bc9cb  master -> master

... seems that RStudio isn't using/recognising the SSH keys properly. Not sure why, I will do some more digging later.

jmcvw commented 6 years ago

@mattnuttall00 Just a thought, but I don't think mnn1@STRF46D042C37A2 is trying to be an SSH key. I think it might be your office computer name.
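That fits how SSH public keys look: one line made of the key type, a long base64 blob, and a trailing comment, which `ssh-keygen` fills with `user@hostname` by default. A minimal sketch, using a throwaway key in a temporary folder with no passphrase and the comment borrowed from this thread for illustration, that splits the line into its parts. The whole line, starting `ssh-rsa`, is what goes into GitHub's SSH key settings.

```shell
#!/usr/bin/env bash
# Sketch: what a public key line contains (type, key material, comment).
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway key pair; -N "" means no passphrase, -C sets the comment.
ssh-keygen -q -t rsa -b 4096 -N "" -C "mnn1@STRF46D042C37A2" -f "$work/id_rsa"

pub_line=$(cat "$work/id_rsa.pub")
key_type=$(cut -d' ' -f1 <<< "$pub_line")
key_blob=$(cut -d' ' -f2 <<< "$pub_line")
comment=$(cut -d' ' -f3- <<< "$pub_line")

echo "type:    $key_type"
echo "key:     ${key_blob:0:20}... (${#key_blob} characters)"
echo "comment: $comment"
```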

rosemckeon commented 6 years ago

@jejoenje If you check in RStudio > Tools > Global options > Git/SVN there is a settings panel which lets you define the path to your public key and view the key which RStudio is using.

jejoenje commented 6 years ago

Thanks @rozeykex. Yep, that appears to be set correctly and pointing to the right public key; that's what confuses me. Just double-checked, and I get the error as above. The bit that puzzles me in particular is the first line:

ssh_askpass: exec(/usr/bin/ssh-askpass): No such file or directory

I guess I'm just not understanding what RStudio is trying to do here... I certainly have the right access rights and the (remote) repo exists, because I can do this manually (in terminal) fine...
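One likely reading of that first line: when `ssh` needs to ask something (usually the key's passphrase) but has no terminal to ask in, as when RStudio runs git in the background, it tries an external `ssh-askpass` helper, and that program isn't installed. A common workaround, offered here as an assumption rather than a confirmed fix for this case, is to load the key into `ssh-agent` from a terminal so ssh never has to prompt; whether RStudio then uses the agent depends on it inheriting the `SSH_AUTH_SOCK` environment variable. A minimal sketch with a throwaway, passphrase-free key (hypothetical file name and comment):

```shell
#!/usr/bin/env bash
# Sketch: load a key into ssh-agent so ssh does not need to prompt.
set -euo pipefail

work=$(mktemp -d)

# Start a private agent for this script; stop it and clean up on exit.
eval "$(ssh-agent -s)" > /dev/null
trap 'ssh-agent -k > /dev/null; rm -rf "$work"' EXIT

# Throwaway key without a passphrase so the sketch runs unattended.
# With a real passphrase-protected key, ssh-add asks for it once, here.
ssh-keygen -q -t ed25519 -N "" -C "example@host" -f "$work/id_ed25519"
ssh-add "$work/id_ed25519" 2>/dev/null

# Keys the agent now holds; ssh (and git over SSH) can use these without prompting.
ssh-add -l
loaded=$(ssh-add -l | wc -l | tr -d ' ')
```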

jejoenje commented 6 years ago

I should add that I created the key I'm trying to use manually, i.e. not using RStudio. I'm wondering whether that's confused matters... I'm logging off for the day, trying again tomorrow...

jejoenje commented 6 years ago

OK, it's definitely an issue with RStudio and the SSH keys specifically. I just did some work on one of my other repos, which is still set to use HTTPS authentication. This pushes fine through the RStudio interface.

I'm wondering whether I should/could get RStudio to use a separate SSH key and add that one to GitHub too, but that seems counterintuitive and potentially confusing. Anyway, if I click 'Create RSA Key' in RStudio's Git/SVN settings, it defaults to ~/.ssh/id_rsa, which clearly already exists... it doesn't seem to let me change that name (i.e. create a new key pair), and I don't really want to overwrite it...
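On the separate-key idea: `ssh-keygen -f` can write a key pair under any file name, and a `Host` entry in the SSH config file tells ssh which key to offer to github.com. A minimal sketch, assuming a hypothetical key name `id_rsa_github` and a throwaway config file; `ssh -G` prints the settings ssh would use without connecting, so the check runs offline. In real use the `Host` block would go in `~/.ssh/config` and the new `.pub` would be added to GitHub as a second key; that ssh launched from RStudio reads the same file is an assumption worth testing.

```shell
#!/usr/bin/env bash
# Sketch: a second, separately named key used only for github.com.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A new key pair with its own (hypothetical) name, so ~/.ssh/id_rsa is untouched.
ssh-keygen -q -t rsa -b 4096 -N "" -C "rstudio-github" -f "$work/id_rsa_github"

# Throwaway config: offer only this key when connecting to github.com.
cat > "$work/config" <<EOF
Host github.com
    User git
    IdentityFile $work/id_rsa_github
    IdentitiesOnly yes
EOF

# Print the identity file ssh would use for github.com, without connecting.
resolved=$(ssh -G -F "$work/config" github.com | grep -i '^identityfile')
echo "$resolved"
```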

mattnuttall00 commented 6 years ago

I'm afraid this is all beyond my understanding now, and I am thoroughly confused :) If I can set the starting bribe at 1 x pint, could I ask for someone to come and sit with me for 10 mins or so to go through this all and explain it to me? No rush, any day/time that is convenient

jejoenje commented 6 years ago

@mattnuttall00 Can I suggest that in the meantime you just use HTTPS authentication? To do this, in a terminal window in RStudio type:

git remote set-url origin https://github.com/mattnuttall00/PhD_Objective1.git

This should set the repo to use HTTPS authentication. If you now make a change, save, commit, and push using the RStudio buttons, it should just ask for your username and password, and push OK...?

This is basically what I've had to do, for now, to get RStudio to push to GitHub OK... I will keep looking for a better solution but it may be a while - it might have something to do with a passphrase set for the SSH key...

But I'll have that bribe, please! :)

jejoenje commented 6 years ago

Out of interest @mattnuttall00, when you created the SSH key in RStudio, did you set a passphrase for the key?

mattnuttall00 commented 6 years ago

Thanks @jejoenje, that seemed to work - I appear to be able to push up from RStudio to GitHub, and just have to enter my credentials. From what I can remember, no, I didn't set a passphrase when creating a key in RStudio.

anna-deasey commented 6 years ago

@mattnuttall00 so is this working for you now? I just set up by caching HTTPS credentials and didn't bother with an SSH key. Does this matter? Should I be using one over the other? ...so far I've only tried very basic things, but it seems to be working OK.

mattnuttall00 commented 6 years ago

@anna-deasey yes, it seems to be working now between one RStudio project and one GitHub repo. I briefly tried to set it up between a second project and a second repo but got some error messages, but I was rushing and haven't had a chance to fiddle around with it yet. Regarding your question about whether it matters that we're not using an SSH key....I haven't a clue :)

jejoenje commented 6 years ago

@mattnuttall00 @anna-deasey You can use either authentication method. Horses for courses. SSH just makes life a bit easier in that you won't have to type in a username/password quite as much. It basically works by generating a pair of "keys": a private key that stays on your computer and a public key that you add to GitHub, which together provide the secured access. Have a look at this if you feel like some geeky reading.
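For the HTTPS route, typing credentials less often is what a git credential helper is for. A minimal sketch, assuming Linux or macOS with git's built-in `cache` helper (Git for Windows usually ships its own credential manager instead); the one-hour timeout is an arbitrary choice, and `--local` limits the setting to one throwaway repository. Note that GitHub stopped accepting account passwords for git over HTTPS in 2021, so the 'password' it asks for is now a personal access token.

```shell
#!/usr/bin/env bash
# Sketch: cache HTTPS credentials in memory for an hour after first use.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q

# 3600 seconds = 1 hour; applies to this repository only (--local).
git config --local credential.helper 'cache --timeout=3600'
helper=$(git config --local --get credential.helper)
echo "credential.helper = $helper"
```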

adamaki commented 6 years ago

Yeeahh I got my first repository uploaded onto GitHub from RStudio! I followed all the instructions in this thread and got it working eventually but it certainly wasn't simple!

One question for you GitHub veterans - how do I manage my datasets between different work stations, e.g. if I want to work on the same project at work and at home? If I have my datasets on my home machine, the folder path will be different to my office machine, so the path in the code to load them will be different...

bradduthie commented 6 years ago

@adamaki I think that this is possible with GitKraken (rather, it does this seamlessly, I suspect), which I'll chat about tomorrow -- though I confess I'm not entirely sure. I think in the command line you should just be able to go to your local repository path and do the below.

git remote add origin https://github.com/user_name/GitHub_repository.git

But I'm not sure what this translates to in RStudio. By 'path in the code to load them', I assume you mean push and pull from GitHub, or do you mean something else?

Really awesome that everyone is getting RStudio linked with GitHub. I've still not attempted this, so if anyone wants to add a section to the version control notes, please feel free!

jmcvw commented 6 years ago

@adamaki I'm wondering if you are thinking about the same things I was getting at here and here. I have gone through a few different systems, with varying satisfaction - hence my interest in hearing what others do.

My current approach is to put all my data on both machines and make sure it resides at the same directory location. It is a bit of hassle if you have to reorganize your file structure, though, so there are other things I tried before doing this which were ok too.

adamaki commented 6 years ago

@bradduthie thanks for the advice. I'm probably not using the right terminology, but I meant if I have local versions of the same dataset on each workstation, how do I switch between datasets when working on different machines as the folder path will be different?

I should've read the thread properly first as I just noticed it's already been raised as an issue. Thinking about it, I guess a simple way would be to have a separate line of code to load the dataset from each machine and just run whichever one I'm working on at the time to load my data. But then there's the problem of syncing my data between machines if I change it in any way. Maybe Dropbox is the way to go as mentioned above, although my datasets are usually 10GB+ so I'd need to pay for storage...

@jmcvw yes that's exactly what I'm talking about, thanks!

bradduthie commented 6 years ago

@adamaki Ah! Sorry, I think I see what you mean now. I was assuming that the datasets were located in the same path within the repository (i.e., repository_folder/data/datafile_1.csv, etc.), but that repositories were located in different locations on different computers (which shouldn't cause a problem). Instead, you're saying that the data files themselves are outside the repository, so the path to accessing them needs to be different in different machines?

jmcvw commented 6 years ago

@adamaki What I used to do then is set the appropriate path using this line at the top of my scripts:

# choose the data directory depending on which user (machine) runs the script
dsn <- if (Sys.info()['user'] == 'USERNAME') 'D:/Office/Data/Dir' else 'C:/Home/Data/Dir'

You can then specify a particular data file via:

# full path to one data file inside that directory
dat <- file.path(dsn, 'DATA.csv')

adamaki commented 6 years ago

@bradduthie yes exactly! As I'm new to GitHub, I'm not sure what the convention is. Should I be keeping my datasets in the repository with my scripts? I'm not sure I want to share my datasets publicly tbh!

@jmcvw that's a neat way to get round the problem! I guess using the ifelse function would work here nicely.

bradduthie commented 6 years ago

@adamaki It might be useful to keep the data sets with the scripts, but one way of avoiding having them put up on GitHub publicly is to use a .gitignore file in your git repository (see this .gitignore file in the GMSE repository as an example). Git should just ignore any files listed in the .gitignore that you specify in the repository, which should stop them from being pushed to and pulled from GitHub.

So, for example, you could have a folder in your repository called data, then write this line in a .gitignore file:

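data/

The trailing slash tells git to ignore the folder called data and everything in it. (A pattern like data* would also catch the folder, but it would additionally ignore any other file whose name starts with 'data', such as data_cleaning.R.)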
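A minimal sketch of the data-folder approach, assuming a folder named `data` and made-up file names. The sketch uses the pattern `data/`, which matches a directory named data and everything inside it. `git check-ignore` reports whether a path would be ignored, which is a handy way to test a pattern. One caveat: `.gitignore` only affects untracked files, so if the data were already committed you would also need `git rm -r --cached data` to stop tracking them (the files stay on disk, but earlier commits, including any already pushed to GitHub, still contain them).

```shell
#!/usr/bin/env bash
# Sketch: keep a data folder out of git (and therefore off GitHub).
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q

mkdir data
echo "id,value" > data/datafile_1.csv
echo 'dat <- read.csv("data/datafile_1.csv")' > analysis.R

# Ignore the whole data folder (trailing slash = match a directory).
echo "data/" > .gitignore

# check-ignore exits 0 (and prints the path) only for ignored paths.
git check-ignore data/datafile_1.csv
if git check-ignore -q analysis.R; then script_ignored=yes; else script_ignored=no; fi

# Only the script and .gitignore show up as untracked; the data folder does not.
untracked=$(git status --porcelain)
echo "$untracked"
```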

adamaki commented 6 years ago

So I thought I was doing well, as I set up version control in RStudio and committed my scripts to GitHub, but today I made some changes and committed them to GitHub and nothing has appeared on GitHub. In RStudio, the history shows the changes have been committed.

Any suggestions?

jmcvw commented 6 years ago

@adamaki Could you have committed, but not pushed? In RStudio you can push by clicking the green up arrow in the Git tab. Committing records the changes in your local repository, i.e. you can use git for version control without ever joining GitHub. To use GitHub, you have to push your commits from your local repo.
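To check from a terminal whether some commits exist only locally, compare your branch with its remote-tracking branch. A minimal sketch with a stand-in remote (a local bare repository instead of GitHub; paths and file names are hypothetical): after a commit, `git status -sb` reports the branch as ahead, and `git log origin/master..HEAD` would list exactly the unpushed commits; the sketch counts them with `git rev-list --count`.

```shell
#!/usr/bin/env bash
# Sketch: spot commits that have been made locally but not pushed.
# ASSUMPTION: a local bare repo stands in for GitHub.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q --bare "$work/remote.git"
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master
git config user.name  "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "First version"
git push -q -u origin master

# Commit a change but do not push it yet.
echo 'x <- 2' > analysis.R
git commit -q -am "Change x"
git status -sb | head -n 1          # branch line, showing [ahead 1]
unpushed_before=$(git rev-list --count origin/master..HEAD)

# Pushing (RStudio's green up arrow) sends it to the remote.
git push -q
unpushed_after=$(git rev-list --count origin/master..HEAD)
echo "unpushed commits: $unpushed_before before push, $unpushed_after after"
```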