Get some code in here - GitHub Issues

surchs commented 3 years ago

Let's start by uploading the existing project code as it is, bugs and everything. What we want is for

[x] all scripts necessary to get from the input data (group avg maps) to final results (statistical results / figures)
[x] all documentation (dedicated READMEs, notes to self..., notebooks) for these scripts
[x] none of the data (that'll live somewhere else) as that would clutter the git history

to live inside this repository. For now we could go for a structure like this:

folder with functions or libraries or packages written for this project /
scripts folder /
- subfolder for every separate analysis (e.g. stability analysis, association analysis, ...) /
- script file with a number prefix according to the order we need to run it (e.g. 1_my_cool_script.m)
- 2_another_script.m
- ...
notebooks folder /
- notebook files, run once sequentially (so each code cell is numbered n+1 relative to the previous one) and confirmed to roughly do what is expected.
README.txt (if it exists, otherwise leave out)
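A number prefix only gives the right run order if it is compared as a number. Sorted alphabetically, `10_...` comes before `2_...`, and fractional steps such as 5.5 need the same care. Below is a minimal Python sketch of listing one analysis subfolder in run order. It assumes every script name starts with a number followed by an underscore. `run_order` and `scripts_in` are hypothetical helper names and are not part of the project:

```python
import re
from pathlib import Path

# Matches a leading step number such as "1", "10" or "5.5", followed by "_".
# Assumption: every script in an analysis subfolder follows this naming scheme.
PREFIX = re.compile(r"^(\d+(?:\.\d+)?)_")


def run_order(filenames):
    """Return script names sorted by their numeric prefix, not alphabetically."""
    numbered = []
    for name in filenames:
        match = PREFIX.match(name)
        if match is None:
            raise ValueError(f"script without a numeric prefix: {name}")
        numbered.append((float(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def scripts_in(folder, suffix=".m"):
    """List the scripts of one analysis subfolder in the order they should run."""
    return run_order(p.name for p in Path(folder).glob(f"*{suffix}"))
```

For example, `scripts_in("scripts/stability_analysis")` (a hypothetical path) would return that subfolder's `.m` files in the order they are meant to be run. Zero-padding the prefixes (`01_`, `02_`, ...) is a simpler alternative, as long as fractional steps are avoided.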

On the github side, you'll need to

[x] locally clone this repository,
[x] add the local files to the staging area
[x] commit them with a commit message
[x] push the commits to this github repository
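The four GitHub steps above map onto four git commands. Below is a minimal Python sketch that builds and runs them with `subprocess`. Its main purpose is to show the order and what each step does. The repository URL, folder names and commit message are placeholders, and typing the same commands in a terminal works just as well:

```python
import subprocess


def clone_command(repo_url, clone_dir):
    # Step 1: make a local copy of the GitHub repository.
    return ["git", "clone", repo_url, clone_dir]


def share_commands(clone_dir, paths, message):
    """Steps 2-4, run inside the local clone (git -C <dir> acts as if started there)."""
    return [
        # Step 2: stage the files. Nothing is recorded in the history yet.
        ["git", "-C", clone_dir, "add", *paths],
        # Step 3: record everything that is staged as one commit (still only local).
        ["git", "-C", clone_dir, "commit", "-m", message],
        # Step 4: upload the new commit(s) of the current branch to GitHub.
        ["git", "-C", clone_dir, "push", "origin", "HEAD"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Between step 1 and step 2, the project files have to be copied into the cloned folder. The paths passed to `share_commands` are relative to that folder, e.g. `run(share_commands("striatum_micro_nmf", ["scripts", "README.txt"], "Add existing project code"), dry_run=False)`. If step 2 is skipped, `git commit` will typically report that there is nothing to commit.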

Please keep track of your questions

This may be new stuff. In fact, it'd be great if you had never done this before, because that would help us understand how to support this. So even though it might be more work for you, please keep track of the questions you have along the way (e.g. "what the hell is a git commit", "how do I ..."), regardless of whether you need help answering them. These questions are super valuable!

If you get stuck (e.g. googled twice and spent more than 30 minutes), please reply here with what you're trying to do, what you tried, and so on. We'll figure it out from there.

corinnerobert commented 3 years ago

I've now uploaded the code to make the groups for the stability analysis. The second step would be to run opnmf on every split using the code here: https://github.com/asotiras/brainparts. However, we made some slight modifications to the opnmf_mem.m function (see "/data/chamal/projects/corinne/projects/Striatal_NMF/analysis/stability/brainlets/opnmf_mem_rai.m"). So I'm not sure whether we were allowed to do that, and whether we're allowed to share it, if that makes sense.

surchs commented 3 years ago

OK, that's great.

"...however we did some slight modification of the opnmf_mem.m function [...] So I'm not sure if we were allowed to do that and if we're allowed to share"

That's a good point. Legally, the answer lies in the original license of the package you altered. Part B there explicitly allows you to

"use, reproduce, make derivative works of, display and distribute the Software,"

provided that you abide by some additional requirements. So yes, you are allowed to make modifications and to share them. We might have to take a closer look at that license again to make sure the code you are about to share can be released under a more common license, so that whoever builds on your code has less of a headache in the future.

There is also a practical answer to this question that goes roughly like this: it's always better to reuse existing software that is maintained and supported by someone else than to reinvent your own thing, which will then probably just sit around unmaintained after publication and become unusable. But it's also important to make practical decisions and spend your energy where it makes a difference. Here is a cool blog post on this. I think for your project, adjusting the existing function and re-sharing it here makes perfect sense.

corinnerobert commented 3 years ago

Also, a lot of the code was originally written by Raihaan Patel in our lab. I used some of his code as is and sometimes slightly modified it. Again, how can I share this code?

surchs commented 3 years ago

That depends. Is Raihaan's code available as an installable, maintained library like numpy? Then you add it to your dependencies (in the case of Python, e.g. to a requirements.txt file). Or is Raihaan's code just a bunch of scripts, some of which you use as is and some of which you have edited? Then you include all of these scripts here and acknowledge Raihaan as an author of these scripts (in the project README or another place). You usually don't write every piece of code alone, and I'm assuming that Raihaan is going to be a co-author on this paper. So that's all good.

Your question is important and it's great that you asked it. We'll have to provide some pointers to guidelines and best practices. At the end of the day, we want things to be as simple as possible, but not simpler. Here is an article on "good enough" practices for reproducible research that I think has good ideas: https://doi.org/10.1371/journal.pcbi.1005510

corinnerobert commented 3 years ago

So I'm trying to add/commit/push the library folders with the modified opnmf functions, but for some reason it's not adding anything.

surchs commented 3 years ago

Hmm. I'm not sure exactly what that means. A couple of thoughts:

- If you are new to git / GitHub, it may be worth going through one of the tutorials. These are pretty good.
- If you are relatively new to using the terminal / command line, it might be easier to use a graphical interface for git. There are many options. I've heard good things about GitKraken, Sourcetree, and Tower, and I think GitHub has its own desktop app. If you use an editor like VS Code, there are also git plugins for that. Long term, the terminal is probably the better interface.
- It's often tricky to understand or fix tech problems without sitting in front of the same machine. There are some good recommendations and best practices out there that can serve as a guideline for what to include in a question. I think this one on reproducible examples is a good starting point. Asking questions that get answered is a super useful skill long term, so it's also worth spending some time on.

Let me know if any of these get you further. And, if you resolved your issue, how you did that.

surchs commented 3 years ago

@corinnerobert: I saw that you added some additional scripts and libraries. How are things going? Are all your scripts here, should we start thinking about our next steps? You can check off the checkboxes in the initial issue description to keep track of the progress if you want.

corinnerobert commented 3 years ago

So the GitHub tutorials were really helpful. I realized that my previous problem had to do with the difference between the staging area and a commit.

I think I've submitted all the scripts that I need. I'm going through the documentation that I have, and I think the next step would be to make the scripts more generalizable and change all the path variables. For instance, the scripts for steps 0 and 1 only work for the left hemisphere, whereas some scripts work for both (steps 5.5 and 7) and sometimes I have one script for each hemisphere (steps 4 and 6), so I'd like to clean that up.
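One common way to make a step work for either hemisphere without hard-coded paths is to pass the hemisphere and the input/output folders in as arguments. The project's scripts are MATLAB, so the following is only a minimal Python sketch of that pattern. The option names (`--data-dir`, `--out-dir`, `--hemi`) and the per-hemisphere file names are hypothetical:

```python
import argparse
from pathlib import Path

HEMISPHERES = ("left", "right")


def parse_args(argv=None):
    """Read the settings that would otherwise be hard-coded at the top of a script."""
    parser = argparse.ArgumentParser(description="Run one step for one or both hemispheres.")
    parser.add_argument("--data-dir", type=Path, required=True,
                        help="folder with the input data (kept outside the repository)")
    parser.add_argument("--out-dir", type=Path, required=True,
                        help="folder where this step writes its outputs")
    parser.add_argument("--hemi", choices=(*HEMISPHERES, "both"), default="both",
                        help="which hemisphere(s) to process")
    return parser.parse_args(argv)


def hemispheres(choice):
    """Expand 'both' into the list of hemispheres to loop over."""
    return list(HEMISPHERES) if choice == "both" else [choice]


def main(argv=None):
    args = parse_args(argv)
    for hemi in hemispheres(args.hemi):
        # Hypothetical naming scheme: one input and one output file per hemisphere.
        in_file = args.data_dir / f"{hemi}_input.mat"
        out_file = args.out_dir / f"{hemi}_output.mat"
        print(f"{hemi}: {in_file} -> {out_file}")
```

A step would then be run as, e.g., `python 4_some_step.py --data-dir /path/to/data --out-dir /path/to/results --hemi left` (hypothetical file name). One script replaces the separate left/right versions, and machine-specific paths stay out of the code.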

surchs commented 3 years ago

OK, that's fantastic. Congrats! I would say: add the documentation that you have as is (we don't need to keep the repo's history once we share it), and then we can close this issue and open a new one for organizing the existing code.

corinnerobert commented 3 years ago

So I added the documentation that I think is useful (other than that I only have documentation on the organization of my files on the CIC and not about the steps/scripts specifically). I just noticed the notebook folder you mentioned, but I'm not sure what it is and what I'm supposed to put in it.

surchs commented 3 years ago

There sometimes isn't a clear distinction between "scripts" and "notebooks", in the sense that both are used to produce some intermediate data output that is then fed to another step (script or notebook). There are some good arguments for using scripts and notebooks for different things:

- Use notebooks for quick, initial exploration and for communicating your results (code, figures, and accompanying text in one document).
- As you develop ideas, turn them into well-documented and tested code in the form of scripts or libraries. Code that produces (intermediate) outputs should be in scripts, not notebooks.

Some of the more important reasons why notebooks aren't great for writing maintainable code are the same things that make them great for exploration. For example, the fact that you can execute code cells in any order you choose (not just sequentially) is great for exploration but very bad for predictable outputs. Similarly, the fact that you can include figures, text, code, and videos all in one document is great for exploration and communication, but makes version control a pretty nightmarish scenario (there are some fixes). Case in point: this little guy here, scripts_folder/0_make_concatenated_input_matrices.ipynb, is 75 MB in size, probably because of a lot of cool visual things stored in it. But this size and complexity make it hard to inspect, even just viewing it here on GitHub.
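Notebook files are plain JSON. Each code cell stores its execution number and its outputs (figures included), and those outputs are where most of the file size tends to come from. Below is a minimal Python sketch, using only the standard library, that checks whether a notebook was run once from top to bottom and then strips the stored outputs before committing. The function names are hypothetical, and dedicated tools such as nbstripout do the stripping more thoroughly:

```python
import json
from pathlib import Path


def code_cells(nb):
    """Return only the code cells (markdown cells have no outputs or counts)."""
    return [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]


def ran_top_to_bottom(nb):
    """True if every code cell was executed exactly once, in order: 1, 2, 3, ..."""
    counts = [c.get("execution_count") for c in code_cells(nb)]
    return counts == list(range(1, len(counts) + 1))


def strip_outputs(nb):
    """Remove stored outputs (figures, tables, printouts) and execution counts."""
    for cell in code_cells(nb):
        cell["outputs"] = []
        cell["execution_count"] = None
    return nb


def strip_file(src, dst):
    """Write a copy of the notebook at src without outputs to dst."""
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    text = json.dumps(strip_outputs(nb), indent=1, ensure_ascii=False)
    Path(dst).write_text(text + "\n", encoding="utf-8")
```

Call `ran_top_to_bottom` before stripping, because stripping also clears the execution numbers it relies on.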

So, long story short: in an ideal world, you have notebooks and scripts / libraries, and they do very different things. Have a look here for a nice general template for research code projects (I am very loosely basing my recommendations on it). For now, keep things as they are. We'll see how much time we want to invest to make things "ideal".

Closing issue since all points are completed!

corinnerobert / striatum_micro_nmf

Get some code in here #1
