gwaybio opened this issue 5 years ago.
Also note that I am brainstorming this general idea in this specific repository, which relates to the STARR grant, b/c it is open source.
Thanks for leading this discussion!
I like this idea because of
So the profile-generation repo would capture everything that we do in the profiling handbook, and nothing else. Ideally, at some point, this repo would only contain the WDL workflow (or equivalent) used to process the data.
The automation question merits a separate discussion and is out of scope for now. The processing is relatively consistent, so automation is indeed possible, but it needs a lot more work before it is fully automated. Still, that's another reason to consider this option.
What's next? Do you want to try this out on this project? @gwaygenomics
There are many different analyses one could do, given the same dataset.
Yeah definitely! Also, depending on the size of the profiles specifically, GitHub can handle data versioning. Will BBBC store the raw images?
Ideally, at some point, this repo would only contain the WDL workflow (or equivalent) used to process the data.
Depending on the size of the data, I think it could also store processed profiles. Data versioning FTW :tada:
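Whether processed profiles can live in the repo mostly comes down to file size. As a rough sketch (not part of any existing tooling in this project), the Python snippet below flags files against GitHub's documented per-file limits: a warning above 50 MiB and a hard block above 100 MiB at the time of writing, so double-check the current GitHub docs. The `profiles` directory and `*.csv.gz` pattern are hypothetical placeholders.

```python
from pathlib import Path

# Thresholds based on GitHub's documented per-file limits (verify against current docs):
# files over 50 MiB trigger a warning on push; files over 100 MiB are rejected.
WARN_BYTES = 50 * 1024 * 1024
BLOCK_BYTES = 100 * 1024 * 1024


def classify_size(n_bytes: int) -> str:
    """Return 'ok', 'warn', or 'blocked' for a single file size in bytes."""
    if n_bytes > BLOCK_BYTES:
        return "blocked"
    if n_bytes > WARN_BYTES:
        return "warn"
    return "ok"


def check_profiles(profile_dir: str, pattern: str = "*.csv.gz") -> dict[str, str]:
    """Map each matching file under profile_dir (searched recursively) to its size category.

    profile_dir and pattern are hypothetical; point them at wherever the
    processed profiles actually live. A missing directory yields an empty dict.
    """
    return {
        str(path): classify_size(path.stat().st_size)
        for path in sorted(Path(profile_dir).rglob(pattern))
        if path.is_file()
    }


if __name__ == "__main__":
    for name, status in check_profiles("profiles").items():
        print(f"{status:8} {name}")
```

Files marked `blocked` would need Git LFS or external storage (for example, wherever the raw images end up) rather than a plain commit.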
What's next? Do you want to try this out on this project?
yes, let's try it out! Currently, I don't think the profile processing lives here (do we know where it lives?). So it will be natural to use this strategy here.
Another thing to consider is if the analysis should live in the broadinstitute org or the carpenterlab org. I am thinking carpenterlab since (presumably) we have more control over it and it gives the lab more visibility. (There is also a new and nifty issue-transfer feature on GitHub, so an ownership transfer should be relatively painless.)
BBBC will store the raw images?
Not sure yet; ideally IDR, but it isn't easy to access images there directly.
yes, let's try it out! Currently, I don't think the profile processing lives here (do we know where it lives?). So it will be natural to use this strategy here.
Indeed, I don't see any profile processing notes; you'd need to check with Beth.
Another thing to consider is if the analysis should live in the broadinstitute org or the carpenterlab org.
broadinstitute works well, especially for collaborative projects.
Note that I transferred this issue over from https://github.com/broadinstitute/profiling-resistance-mechanisms. This repo currently has the closest workflow to what is described above.
@shntnu @jccaicedo @MarziehHaghighi
I was thinking a bit more about our recent discussion on GitHub reproducibility. I am wondering about different potential workflows and can think of a few additional setups that might help.
First Setup
The first setup is as I described on Friday:
… (0.generate-profiles) that stores code, QC, and profile results.
… 0.generate-profiles.
Potential Alternative?
Perhaps a second setup could separate the processing code and downstream analysis into two distinct repositories. This setup could work well for a couple of reasons.
… profiling init (and then bash scripts would be auto-populated).
Of course, every project is different, and individual decisions are required. (The same goes for storing the profiles in the actual repo, and for the public/private repo debate too!)
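The `profiling init` idea could be sketched roughly as follows. This is a hypothetical Python illustration, not an existing command: it assumes the processing code lives in a `0.generate-profiles` folder, and the step names, script naming scheme, and template contents are all placeholders. It writes one executable bash stub per step and skips scripts that already exist, so re-running it does not overwrite hand edits.

```python
from collections.abc import Sequence
from pathlib import Path

# Hypothetical step names; a real project would take these from the
# profiling handbook or a config file.
DEFAULT_STEPS = ("aggregate", "annotate", "normalize", "feature_select")

# Doubled braces survive str.format() as literal bash braces.
SCRIPT_TEMPLATE = """#!/usr/bin/env bash
# Auto-generated by a hypothetical `profiling init` for step: {step}
set -euo pipefail

PROJECT="{project}"
BATCH="${{1:?usage: $0 BATCH}}"

echo "Running {step} for $PROJECT batch $BATCH"
# TODO: call the actual processing command for this step
"""


def init_project(root: str, project: str, steps: Sequence[str] = DEFAULT_STEPS) -> list[Path]:
    """Create <root>/0.generate-profiles/scripts with one bash stub per step.

    Existing scripts are left untouched, so re-running is safe.
    Returns the paths of the scripts that were newly written.
    """
    script_dir = Path(root) / "0.generate-profiles" / "scripts"
    script_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for i, step in enumerate(steps, start=1):
        path = script_dir / f"{i}.{step}.sh"
        if path.exists():
            continue  # keep any manual edits from a previous run
        path.write_text(SCRIPT_TEMPLATE.format(step=step, project=project))
        path.chmod(0o755)  # make the stub executable
        written.append(path)
    return written
```

For example, `init_project(".", "my-project")` (with a made-up project name) would write `0.generate-profiles/scripts/1.aggregate.sh` through `4.feature_select.sh`. Keeping the scaffold idempotent means individual projects can still customize their scripts after generation.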