Closed F-said closed 1 year ago
@georgebuzzell @jessb0t What are some priorities for our HPC datasets? I assume we have the following needs.
And I assume these are our wants:
Is there anything I am missing?
@F-said How are you defining "shareability" in this context?
@jessb0t shareable across our HPC and since datalad supports it, the internet
@F-said Great question. Here are the priorities (please feel free to ask clarification questions or challenge if some/all are really priorities):
Highest priority, "must haves"
"very nice to have, but not must haves"
@F-said @jessb0t comments/questions welcome
@F-said : 3, 4, 5, and part of 8 are squarely on my plate. As discussed, we will work together so that I create these in lockstep with your technical work. @georgebuzzell Having played around with the data dictionary and central tracker, conceptually, for existing studies, I honestly do not think that we should take any time to create manual checks and updates to the tracker. I feel that it would take as much work to create manual processes (which are not scalable) as to create semi-automated processes (that is, basic script + protocol for the study lead to run it). The reason I feel this way is that, for example, the online studies have ~30 surveys, each of which has 10+ questions. Verifying that a given survey/score is available for a given participant is visually very difficult, and a manual process would be time-consuming and error-prone. Within the same project time-scale, I think we can get simple, semi-automated processes in place that prevent data monitoring from being a burden on study leads (and ensuring higher-quality trackers). I suspect that you and I are on the same page here (namely, utilize scripting in the immediacy to create a simple system that can be refined, improved, and made even more automated over time), but I wanted to be clear that the protocols I have started developing require some level of scripting (that is, there is no "fully manual" option because existing studies are so complex as to make such an option seemingly nonviable to me).
Purpose Why bother?
Make research go faster -> More papers -> Profit???Encompasses Issues
190
189
186
185
182
181
180
Informal UML![10,000_feet drawio](https://user-images.githubusercontent.com/26397102/155395321-02c6e413-2acc-4ffd-97c8-3a37a4bd7cf8.png)
Ideas