corinnerobert / striatum_micro_nmf

Edit scripts to correspond to the data sharing plan and test scripts #6

Open surchs opened 3 years ago

surchs commented 3 years ago

We now have a data sharing plan. Now the code can be edited so all the hardcoded paths point to the file names and locations defined in the data sharing plan.

Let's say the data will live under the root of this repository in a folder called data (i.e. /data). Then a script in the /scripts_folder pointing to the fractional anisotropy input matrices should use the relative path:

../data/{leftstri or rightstri}_fa_input_matrix.mat

and so on. So to complete this issue:

This might take some time to do. Let me know if you encounter any questions or difficulties.
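
To make the relative-path convention above concrete, here is a minimal Python sketch. It assumes the script sits in a folder one level below the repository root, next to the data folder. The helper name `fa_input_matrix` is made up for illustration and is not part of the existing scripts.

```python
from pathlib import Path


def fa_input_matrix(script_file, hemisphere):
    """Return the path to one hemisphere's FA input matrix in <repo root>/data.

    Assumption: script_file lives one folder below the repository root,
    so the data folder is found at ../data relative to the script.
    """
    if hemisphere not in ("leftstri", "rightstri"):
        raise ValueError("hemisphere must be 'leftstri' or 'rightstri'")
    # Resolve against the script's own location, not the current working directory.
    data_dir = Path(script_file).resolve().parent.parent / "data"
    return data_dir / f"{hemisphere}_fa_input_matrix.mat"
```

Inside a script this would be called as `fa_input_matrix(__file__, "leftstri")`. Anchoring the path to the script file rather than writing a bare `../data/...` string means the script finds the same file no matter which folder it is launched from.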

corinnerobert commented 3 years ago

move your folder with the correctly renamed intermediate files into the git repo (and make sure the folder is called data). Be careful not to add the folder or its contents to git though

How do I add the data folder to the git repo if I told git to ignore it with a .gitignore file? I'm confused

raihaan commented 3 years ago

There is a section in https://www.atlassian.com/git/tutorials/saving-changes/gitignore on committing an ignored file. Would this help? I haven't done this myself before, so I will defer to Sebastian.

surchs commented 3 years ago

Hey everyone. @corinnerobert: great question and important conceptual thing about git! I don't have a great tutorial to link here (this is complete but hard to follow, I find) and this is a bit tricky - I'll find something or make something.

The short answer is that when you have a local git repository that lives on your hard drive under a folder, there are three things that you need to distinguish:

  1. Your local working directory. That is just the folder as you can see it in a file explorer or look at in a command line with e.g. ls. All the happy files in there are on your hard drive. But, and that's the key part to understand, they aren't necessarily tracked by git just because they are in this folder.
  2. Let's say you have a new file in the local folder and this file isn't tracked by git. You think: this is a good file, I'd like to put this under version control. Then you run git add that_file_name.txt. This puts the file in the git staging area. This is kinda like your packaging area. You put all the stuff you want git to track (e.g. new files or files that have changed) here.
  3. Now when you have everything you want in your staging area, you tie your package and put it inside git. This is called a commit. You use git commit to do that, and then enter a little message to explain what's in the package (aka commit). This is the final step and makes git keep track forever of the stuff in this commit.

So coming back to your question:

> How do I add the data folder to the git repo if I told git to ignore it with a .gitignore file? I'm confused

Perfectly reasonable. The answer is: you put the folder in your local working directory, but because you tell git to ignore it (with the .gitignore file), git will never ask you whether you want to move that folder to the staging area (unless you do some of the black magic @raihaan pointed out) or even whether you want to commit it.

I like to think of it as an office:

| My office | Git |
| --- | --- |
| My messy office floor | My local working directory |
| My organized desktop where I prepare a package to send to a friend who runs a beautiful archive of things | My git staging area where I collect things I want to commit to my repo |
| My beautifully organized and tied package to my friend who now files it into a beautiful archive | My beautifully commented git commit that is now tracked by git and inside my repo |

So basically, copying the folder into the local working directory is like keeping a grey box of stuff you sometimes need on the floor of your office. But you don't want to send this box to your archivist friend because it's heavy and messy and you don't want it archived. So you make a little note on your desktop that says: "ignore grey boxes of stuff when tying packages". Hope that clarifies it.

edit: maybe this tutorial is a bit more clear on this than the previous link

corinnerobert commented 3 years ago

This is super clear and it answers my question thank you!

corinnerobert commented 3 years ago

Hi, I realized I have some scripts that need some individual maps (for instance script 5) and I'm not sure how to deal with that

surchs commented 3 years ago

@corinnerobert: can you describe this problem in some more detail e.g.

I believe this will clarify the problem for both of us. Based on this we'll just update our plan for the "general" and "paper related" data releases and then see if that creates any new problems. It may be useful to go back to the "data-flow chart" we created in the beginning to map out the inputs and outputs of each step.

corinnerobert commented 3 years ago

It is the script 5_sample_nmf_to_nii.py. As inputs it uses:

These files are part of the "general data release plan", only they are not in the data/ folder

corinnerobert commented 3 years ago

Also, as we are using some python or matlab packages, more specifically:

Can we just say in the documentation to download those packages and tell the user to adjust the paths to those packages in the scripts?

surchs commented 3 years ago

ah ok. if it's in the general data release then I think we're good. I would suggest something like this:

surchs commented 3 years ago

> Also, as we are using some python or matlab packages, more specifically:
>
> * [TractRec package](https://github.com/CoBrALab/TractREC/tree/master/TractREC) in script 5
> * [Brainlets package](https://github.com/asotiras/brainparts) in scripts 2
> * [PLS package](http://pls.rotman-baycrest.on.ca/UserGuide.htm) in scripts 6
>
> Can we just say in the documentation to download those packages and tell the user to adjust the paths to those packages in the scripts?

Yeah, that's totally fine. For the python case, you can just add the packages to your requirements.txt if they are published on pypi.org. Your docs should have some kind of "how to set up the compute / processing environment" section (sometimes just called "Installation") where you can list the software requirements as links to the github repos. You are not required to provide a working environment for the reader, but it's nice to make sure that they can follow in your footsteps and, with some reasonable work on their own, get your code to run.

surchs commented 3 years ago

> Also, as we are using some python or matlab packages, more specifically:
>
> * [TractRec package](https://github.com/CoBrALab/TractREC/tree/master/TractREC) in script 5
> * [Brainlets package](https://github.com/asotiras/brainparts) in scripts 2
> * [PLS package](http://pls.rotman-baycrest.on.ca/UserGuide.htm) in scripts 6
>
> Can we just say in the documentation to download those packages and tell the user to adjust the paths to those packages in the scripts?

Ah, OK. I see what the issue is here. None of these are "installable" in the sense that you can just run a command to have them in your path (edit: what I mean is, you cannot resolve these dependencies with dependency management like pip or Pipenv). For the matlab packages, that's clear and expected. You'll only need to point to their installable files / git repos (and maybe remind readers of the addpath(genpath(..)) stuff to add them). For the TractRec python scripts, you could do one of two things:

  1. Also just tell people to clone the scripts locally and then manually add them to the python path as you have done
  2. You could add the TractRec repo as a git submodule to your lib folder.

The second option has two possible advantages:

  1. A reader can just clone your repo and also get the TractRec scripts directly through the submodule, provided they know to run git clone with the recursive flag. That's not guaranteed, so you'll probably have to document this very well - git submodules aren't very accessible for git beginners. Importantly, this may also be true for yourself, so it's perfectly reasonable for you to decide that this isn't worth your time right now.
  2. The other advantage would be that you know where the scripts will live relative to your path so you can just hard-code them into your scripts and the reader won't have to change anything (again, provided they know how to get the submodules, because by default they aren't cloned).
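
As a rough illustration of that second advantage, here is a minimal Python sketch of a script locating the TractRec code at a fixed place relative to itself. It assumes the submodule is checked out at `lib/TractREC` under the repository root and that the script sits one folder below the root. Both the folder layout and the helper name `add_tractrec_to_path` are assumptions for this example, not something the repo already contains.

```python
import sys
from pathlib import Path


def add_tractrec_to_path(script_file, lib_dir_name="lib", repo_dir_name="TractREC"):
    """Put <repo root>/lib/TractREC at the front of sys.path and return it.

    Assumption: script_file lives one folder below the repository root and
    the TractRec repo is checked out (e.g. as a submodule) under lib/.
    """
    repo_root = Path(script_file).resolve().parent.parent
    tractrec_dir = repo_root / lib_dir_name / repo_dir_name
    if not tractrec_dir.is_dir():
        # Submodules are not fetched by a plain clone, so fail with a hint.
        raise FileNotFoundError(
            f"{tractrec_dir} not found; if it is a git submodule, "
            "try running: git submodule update --init"
        )
    if str(tractrec_dir) not in sys.path:
        sys.path.insert(0, str(tractrec_dir))
    return tractrec_dir
```

A script would call `add_tractrec_to_path(__file__)` before importing anything from TractRec. The explicit check is there because a reader who cloned without the recursive flag will have an empty or missing folder, and a clear error message is easier to act on than a bare import failure.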