Open surchs opened 3 years ago
> move your folder with the correctly renamed intermediate files into the git repo (and make sure the folder is called data). Be careful not to add the folder or its contents to git though

How do I add the data folder to the git repo if I told git to ignore it with a .gitignore file? I'm confused.
There is a section in https://www.atlassian.com/git/tutorials/saving-changes/gitignore on committing an ignored file. Would this help? I haven't done this myself before, so I will defer to Sebastian.
Hey everyone. @corinnerobert: great question and important conceptual thing about git! I don't have a great tutorial to link here (this is complete but hard to follow, I find) and this is a bit tricky - I'll find something or make something.
The short answer is that when you have a local git repository that lives on your hard drive under a folder, there are three things that you need to distinguish:

1. Your local working directory. This is what you see when you run `ls`. All the happy files in there are on your hard drive. But, and that's the key part to understand, they aren't necessarily tracked by git just because they are in this folder.
2. Your git staging area. You put a file there with `git add that_file_name.txt`. This is kinda like your packaging area: you put all the stuff you want git to track (e.g. new files or files that have changed) here.
3. Your git commit. Once the package is ready, you run `git commit` to tie it up, and then enter a little message to explain what's in the package (aka commit). This is the final step and makes git keep track forever of the stuff in this commit.

So coming back to your question:
> How do I add the data folder to the git repo if I told git to ignore it with a .gitignore file? I'm confused

Perfectly reasonable. The answer is: you put the folder in your local working directory, but because you tell git to ignore it (with the `.gitignore` file), git will never ask you whether you want to move that folder to the staging area (unless you do some of the black magic @raihaan pointed out) or even whether you want to commit it.
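If it helps to see this in action, here is a minimal sketch that walks a throwaway repo through the three areas with a `data/` folder ignored. It drives git from Python, so it assumes `git` is installed and on your path. The file names (`notes.txt`, `data/big_matrix.mat`), the commit identity and the helper names are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout


def demo_three_areas():
    """Return the `git status --porcelain` lines after each step."""
    if shutil.which("git") is None:
        raise RuntimeError("this sketch needs git on the PATH")
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        # 1. Working directory: the files exist on disk, git does not track them yet.
        (repo / "notes.txt").write_text("hello\n")
        (repo / "data").mkdir()
        (repo / "data" / "big_matrix.mat").write_text("pretend this is large\n")
        (repo / ".gitignore").write_text("data/\n")  # ignore the whole data folder
        untracked = git(repo, "status", "--porcelain").splitlines()
        # 2. Staging area: `git add .` stages everything that is not ignored.
        git(repo, "add", ".")
        staged = git(repo, "status", "--porcelain").splitlines()
        # 3. Commit: the staged package is recorded in the history.
        git(
            repo,
            "-c", "user.name=Example",  # hypothetical identity for the demo
            "-c", "user.email=example@example.com",
            "-c", "commit.gpgsign=false",
            "commit", "-m", "Add notes and ignore rule",
        )
        committed = git(repo, "status", "--porcelain").splitlines()
        return untracked, staged, committed
```

Calling `demo_three_areas()` should show `notes.txt` and `.gitignore` going from `??` (untracked) to `A` (staged) to nothing left to report after the commit, while `data/` never shows up at any step.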
I like to think of it as an office:

| My office | Git |
|---|---|
| My messy office floor | My local working directory |
| My organized desktop where I prepare a package to send to a friend who runs a beautiful archive of things | My git staging area where I collect things I want to commit to my repo |
| My beautifully organized and tied package to my friend who now files it into a beautiful archive | My beautifully commented git commit that is now tracked by git and inside my repo |
So basically, copying the folder into your local working directory is like keeping a grey box of stuff you sometimes need on the floor of your office. But you don't want to send this box to your archivist friend because it's heavy and messy and you don't want it archived. So you leave a little note on your desktop that says: "ignore grey boxes of stuff when tying packages". Hope that clarifies it.
edit: maybe this tutorial is a bit more clear on this than the previous link
This is super clear and it answers my question, thank you!
Hi, I realized I have some scripts that need some individual maps (for instance script 5) and I'm not sure how to deal with that.
@corinnerobert: can you describe this problem in some more detail e.g.
I believe this will clarify the problem for both of us. Based on this we'll just update our plan for the "general" and "paper related" data releases and then see if that creates any new problems. It may be useful to go back to the "data-flow chart" we created in the beginning to map out the inputs and outputs of each step.
It is the script `5_sample_nmf_to_nii.py`. As inputs it uses:
These files are part of the "general data release plan", only they are not in the data/ folder
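Also, as we are using some python or matlab packages, more specifically:

* [TractRec package](https://github.com/CoBrALab/TractREC/tree/master/TractREC) in script 5
* [Brainlets package](https://github.com/asotiras/brainparts) in scripts 2
* [PLS package](http://pls.rotman-baycrest.on.ca/UserGuide.htm) in scripts 6
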
Can we just say in the documentation to download those packages and tell the user to adjust the paths to those packages in the scripts?
Ah, OK. If it's in the general data release then I think we're good. I would suggest something like this:

* Put these files into `/data`. Maybe you can come up with some good names for each of them.
* Point the script at the `data/` folder and make sure the paths resolve correctly.

> Also, as we are using some python or matlab packages, more specifically:
>
> * [TractRec package](https://github.com/CoBrALab/TractREC/tree/master/TractREC) in script 5
> * [Brainlets package](https://github.com/asotiras/brainparts) in scripts 2
> * [PLS package](http://pls.rotman-baycrest.on.ca/UserGuide.htm) in scripts 6
>
> Can we just say in the documentation to download those packages and tell the user to adjust the paths to those packages in the scripts?

Yeah, that's totally fine. For the python case, you can just add the packages to your `requirements.txt` if they are published on pypi.org. Your docs should have some kind of "how to set up the compute / processing environment" section (sometimes just called "Installation") where you can list the software requirements as links to the github repos. You are not required to provide a working environment for the reader, but it's nice to make sure that they can follow along in your footsteps and, with some reasonable work on their own, get your code to run.
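One optional extra: a script can check up front that the packages listed in the installation section are importable and fail with a readable message instead of a traceback halfway through. A rough sketch using only the standard library. The helper names are made up, the package names you pass in are up to you, and it only handles top-level package names.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names):
    """Raise a readable error that points the reader at the installation docs."""
    missing = missing_packages(names)
    if missing:
        raise ImportError(
            "Missing packages: " + ", ".join(missing)
            + ". See the Installation section of the docs."
        )
```

You would call `check_environment([...])` at the top of a script with whatever that script imports.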
> Also, as we are using some python or matlab packages, more specifically:
>
> * [TractRec package](https://github.com/CoBrALab/TractREC/tree/master/TractREC) in script 5
> * [Brainlets package](https://github.com/asotiras/brainparts) in scripts 2
> * [PLS package](http://pls.rotman-baycrest.on.ca/UserGuide.htm) in scripts 6
>
> Can we just say in the documentation to download those packages and tell the user to adjust the paths to those packages in the scripts?

Ah, OK. I see what the issue is here. None of these are "installable" in the sense that you can just run a command to have them in your path (edit: what I mean is, you cannot resolve these dependencies with dependency management like pip or Pipenv). For the matlab packages, that's clear and expected. You'll only need to point to their installable files / git repos (and maybe remind readers of the `addpath(genpath(..))` stuff to add them). For the `TractRec` python scripts, you could do one of two things:

1. [...]
2. Add the `TractRec` repo as a git submodule to your `lib` folder.

The second option has two possible advantages:

* Readers get the `TractRec` scripts through the submodule directly, provided they know to run `git clone` with the `--recursive` flag. That's not guaranteed; you'll probably have to document this very well, since git submodules aren't very accessible for git beginners. Importantly, this may also be true for yourself, so it's perfectly reasonable for you to decide that this isn't worth your time right now.
* [...]
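Whichever way the code ends up under `lib`, the python scripts still have to find it. A minimal sketch of one way to do that without hardcoding absolute paths. It assumes (hypothetically) that the package folder sits at `lib/TractREC` next to the scripts folder, and the helper name `add_lib_to_path` is made up.

```python
import sys
from pathlib import Path


def add_lib_to_path(lib_dir):
    """Put `lib_dir` at the front of the import search path, once."""
    lib_dir = Path(lib_dir).resolve()
    if not lib_dir.is_dir():
        # An empty or missing folder usually means the submodule was not fetched.
        raise FileNotFoundError(
            f"{lib_dir} not found. Did you clone with --recursive "
            "or download the package?"
        )
    if str(lib_dir) not in sys.path:
        sys.path.insert(0, str(lib_dir))
    return lib_dir
```

Inside a script this could be called as `add_lib_to_path(Path(__file__).resolve().parent.parent / "lib" / "TractREC")` before importing anything from the package. The error message doubles as a reminder for readers who forgot the recursive clone.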
We now have a data sharing plan. Now the code can be edited so all the hardcoded paths point to the file names and locations defined in the data sharing plan.
Let's say the data will live under the root of this repository in a folder called `data` (i.e. `/data`). Then a script in the `/scripts_folder` pointing to the fractional anisotropy input matrices should use the relative path `../data/{leftstri or rightstri}_fa_input_matrix.mat`, and so on. So to complete this issue:
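* Edit the scripts so that all hardcoded paths become relative paths (relative to `/scripts_folder`) and use the new file names defined in the data sharing plan.
* Run `git fetch` to get the updated git history from github (but not yet download any files) and then `git status` to learn what, if anything, has changed remotely or locally. If there aren't any conflicts you can then run `git pull` to download the new or updated files to your local repo copy.
* Move your folder with the correctly renamed intermediate files into the git repo (and make sure the folder is called `data`). Be careful not to add the folder or its contents to git though, because we don't want to track the data. A good way to tell git to just ignore the entire `./data/` directory is to use a `.gitignore` file. This is just a file called `.gitignore` (no file ending). In it you can write `data/` to ignore the data directory. You can take a look at some common templates too.

This might take some time to do. Let me know if you encounter any questions or difficulties.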
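For the relative paths, here is a minimal sketch of how a script could build them, assuming the layout described above (a scripts folder and a `data` folder side by side under the repo root). The function names are made up. Note one deliberate difference from writing `../data/...` literally: the path is anchored at the script's own location, so it resolves the same way no matter which folder the script is launched from.

```python
from pathlib import Path


def data_dir(script_file):
    """Return the data folder that sits next to the folder holding the script."""
    # <repo>/scripts_folder/some_script.py -> <repo>/data
    return Path(script_file).resolve().parent.parent / "data"


def fa_input_matrix(script_file, side):
    """Build the path to the fractional anisotropy input matrix for one side."""
    if side not in ("leftstri", "rightstri"):
        raise ValueError(f"unknown side: {side!r}")
    return data_dir(script_file) / f"{side}_fa_input_matrix.mat"
```

In a script you would call `fa_input_matrix(__file__, "leftstri")` and pass the result to whatever loads the `.mat` file.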
to ignore the data directory. You can take a look at some common templates too .This might take some time to do. Let me know if you encounter any questions or difficulties.