Open Lestropie opened 1 month ago
As noted in https://github.com/Lestropie/neurocontainers/pull/1 I've been looking a little into DataLad. It's something I hear mentioned all the time but have never fully wrapped my head around or gotten my hands dirty with.
Essentially, it's a `git` wrapper that is more tailored to data than to source code.
Here's how I would currently envisage this working, based on preliminary reading:
- [...] `pip`.
- [...] `dwifslpreproc`, FBA) would each be stored as a DataLad dataset on an appropriate remote server.
- [...] `datalad get` call at the start of the session, which downloads the actual content of those files that need to be present for execution of the interactive content of that session.
- [...] `git reset`. So while this doesn't provide the benefit of OverlayFS in terms of not necessitating use of the `-force` option to overwrite the original dataset file, it does provide the ability to restore the original content. (Hypothetically, we could detect that a file is under DataLad version control, and use that to disable necessity of the `-force` option; but not sure if we want to jump in that deep.)
- [...] `datalad run`, which commits both all file changes and provenance information about what was executed.
- [...] `git annex`, which means DataLad datasets can be stored there (our workshop data is probably a bit too large to be putting on GitHub). There's even an extension to expedite this.

So from what I've read I think DataLad is worth pursuing. Hypothetically, anyone could still use Neurodesk to access and interact with the workshop content if they wanted to. But given they would need to install this extra dependency, and also clone the dataset onto a location on the host system, it doesn't provide that much, at least currently. Maybe if it were to offer an installer that installs DataLad, clones the dataset in some default location on the host mount, and fully clones the slide contents, leaving it up to the user to manually clone further content as they encounter it, it may provide some benefit.
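To make the per-session workflow in the list above a little more concrete, here is a rough Python sketch that only assembles the relevant command lines (`datalad clone`, `datalad get`, `datalad run`, `git reset`) and prints them in dry-run mode. The dataset URL, file names, and the `dwifslpreproc` invocation are placeholders invented for illustration, and `git reset --hard` is just one assumed way of restoring the original content. None of this is a tested setup.

```python
import shlex
import subprocess
from typing import List, Optional


def clone_cmd(url: str, dest: str) -> List[str]:
    """Clone a dataset; file content is not downloaded at this point."""
    return ["datalad", "clone", url, dest]


def get_cmd(paths: List[str]) -> List[str]:
    """Download the actual content of the files a session needs."""
    return ["datalad", "get", *paths]


def run_cmd(message: str, command: str) -> List[str]:
    """Execute a command and commit its file changes plus provenance."""
    return ["datalad", "run", "-m", message, command]


def restore_cmd() -> List[str]:
    """Assumed restore step: discards all uncommitted changes."""
    return ["git", "reset", "--hard"]


def execute(argv: List[str], cwd: Optional[str] = None, dry_run: bool = True) -> str:
    """Print the command (dry run) or run it inside `cwd`; return the shell line."""
    line = shlex.join(argv)
    if dry_run:
        print(line)
    else:
        subprocess.run(argv, cwd=cwd, check=True)
    return line


if __name__ == "__main__":
    # Hypothetical URL and file names, for illustration only.
    url = "https://example.org/workshop/dwifslpreproc"
    dest = "dwifslpreproc"
    execute(clone_cmd(url, dest))
    execute(get_cmd(["dwi.mif"]), cwd=dest)
    execute(
        run_cmd(
            "Preprocess DWI",
            "dwifslpreproc dwi.mif dwi_preproc.mif -rpe_none -pe_dir AP",
        ),
        cwd=dest,
    )
    execute(restore_cmd(), cwd=dest)
```

Passing `dry_run=False` would actually run each command, which requires DataLad and git to be installed and a real dataset URL.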
Agreed that DataLad would be a great solution for this :) DataLad is already integrated in Neurodesk, so it would be easy for users to download the workshop data.
Following some discussion with @stebo85 in https://github.com/Lestropie/neurocontainers/pull/1, I want to evaluate the prospect of open access workshop configuration being primarily an MRtrix3 functionality rather than a Neurodesk one. The issue with integrating into the Neurodesk official image list, IIUC, is that that content would then need to be downloaded for anybody installing the Neurodesk app, regardless of whether they have any intention of running an MRtrix3 container. For the size of data we're dealing with, I don't think that's reasonable.
What I'm contemplating instead is a script that would potentially be a part of the MRtrix3 main repository itself. Upon execution, that script would:
This would make the workshop environment accessible both within Neurodesktop (hopefully) and within any other environment where MRtrix3 has been installed and configured appropriately.
Open to alternative suggestions.