CodyCBakerPhD opened 3 months ago
Less about code, more about capabilities that were added (and point to documentation)
In the last year, NeuroConv has developed fully automated processes for building and deploying Docker images of the central package, as well as of tangential data-transfer utilities for use in cloud environments. These workflows are triggered through free-to-use GitHub Actions on every official release, as well as daily for development branches. All Dockerfiles can be found in the public open-source repository under the /neuroconv/dockerfiles folder.
Additionally, a number of helper functions have been added to the neuroconv.tools.aws submodule, such as an API function for automatically setting up an entire AWS Batch infrastructure backed by EC2, including all related resources: compute environments, job queues, and job definitions. This tooling is then leveraged to launch containers of the aforementioned images on an on-demand EC2 instance in a two-step process: (i) Rclone transfers data from a remote cloud storage source (such as Google Drive or Dropbox) onto the EC2 instance, where it is then (ii) converted to NWB format via a YAML specification file and uploaded directly to the DANDI archive. When all tasks are complete, all requested resources are spun down and cleaned up, minimizing costs to the user.
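For illustration, the resources that such a setup step provisions can be sketched as the request payloads that boto3's Batch client receives. This is a rough sketch, not NeuroConv's actual helper; the function and resource names here are hypothetical placeholders, and real usage should follow the documented `neuroconv.tools.aws` API.

```python
def build_batch_infrastructure_requests(name="neuroconv-demo"):
    """Assemble the three linked AWS Batch resource definitions.

    Illustrative only: mirrors the kind of payloads the real helper
    would pass to boto3's Batch client.
    """
    compute_environment = {
        "computeEnvironmentName": f"{name}-compute-environment",
        "type": "MANAGED",
        "computeResources": {
            "type": "EC2",   # on-demand EC2 instances
            "minvCpus": 0,   # scale to zero when idle to minimize cost
            "maxvCpus": 8,
            "instanceTypes": ["optimal"],
        },
    }
    job_queue = {
        "jobQueueName": f"{name}-job-queue",
        "priority": 1,
        # The queue dispatches jobs onto the compute environment above.
        "computeEnvironmentOrder": [
            {
                "order": 1,
                "computeEnvironment": compute_environment["computeEnvironmentName"],
            }
        ],
    }
    job_definition = {
        "jobDefinitionName": f"{name}-job-definition",
        "type": "container",
        "containerProperties": {
            # Container image published to the GitHub Container Registry.
            "image": "ghcr.io/catalystneuro/neuroconv:latest",
            "vcpus": 4,
            "memory": 16000,  # MiB
        },
    }
    return {
        "compute_environment": compute_environment,
        "job_queue": job_queue,
        "job_definition": job_definition,
    }

# In practice these payloads would be submitted with boto3, e.g.:
#   import boto3
#   batch = boto3.client("batch")
#   requests = build_batch_infrastructure_requests()
#   batch.create_compute_environment(**requests["compute_environment"])
#   batch.create_job_queue(**requests["job_queue"])
#   batch.register_job_definition(**requests["job_definition"])
```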
To ensure this pipeline continues to work far into the future, every step, from the Docker images to the helper functions, is tested via pytest in continuous integration.
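To give a sense of the YAML specification file driving step (ii), here is a rough sketch of its shape. The exact keys and interface names should be taken from the NeuroConv YAML conversion specification documentation; the paths and metadata values below are placeholders.

```yaml
metadata:
  NWBFile:
    lab: Example Lab

data_interfaces:
  ap: SpikeGLXRecordingInterface
  sorting: PhySortingInterface

experiments:
  example_experiment:
    sessions:
      - nwbfile_name: example_session.nwb
        source_data:
          ap:
            file_path: /data/example_session_g0_t0.imec0.ap.bin
          sorting:
            folder_path: /data/phy_output/
```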
While individual batch job statuses can be tracked from the AWS dashboard, the entire workflow also sends status updates to a central DynamoDB table, with plans to further improve the resolution and provenance of the tracking in the future.
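The kind of status record such a workflow might write can be sketched as below. The attribute names are illustrative, not the schema the testing table actually uses; in practice the write would target whichever table the user configures.

```python
import time
import uuid


def build_status_item(job_name, status):
    """Build a DynamoDB item (in the low-level attribute-value format)
    recording the current status of a batch job.

    Illustrative schema only; not NeuroConv's actual table layout.
    """
    return {
        "id": {"S": str(uuid.uuid4())},       # unique record key
        "job_name": {"S": job_name},
        "status": {"S": status},              # e.g. SUBMITTED, RUNNING, SUCCEEDED
        "timestamp": {"N": str(int(time.time()))},
    }


# In practice the item would be written with boto3 to a user-specified table:
#   import boto3
#   dynamodb = boto3.client("dynamodb")
#   dynamodb.put_item(
#       TableName="my-status-table",
#       Item=build_status_item("demo-job", "SUBMITTED"),
#   )
```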
All usage instructions may be found in the official NeuroConv documentation, in particular:
Thanks @CodyCBakerPhD for the helpful summary. A couple of quick questions:

1. "data transfer in AWS tests" and "data conversion in AWS test" are missing their links; could you please add those?
2. I assume the Docker images are the packages in the CatalystNeuro GitHub organization at https://github.com/orgs/catalystneuro/packages?repo_name=neuroconv and are registered with the GitHub Container Registry so that they can be installed via something like `docker pull ghcr.io/catalystneuro/neuroconv:latest`. Is that how this works, or am I missing something important?

Sorry, I should have indicated this was still WIP - I was going to ping you once it's ready.
Got it. Sorry for being eager with questions. This is very cool stuff
@oruebel OK, the rest has been filled in
Though some PRs are still under review, you will want to update the links for things after those get merged. Those sections that are not yet merged are
> OK, the rest has been filled in
Thanks for the helpful summary!
> sends status updates to a central DynamoDB table
Is this table public, and if yes, could you add the URL? If it is internal, is this accessible to the CN team?
> some PRs are still under review, you will want to update the links for things after those get merged.
Thanks for the heads up. Will do.
> Is this table public, and if yes, could you add the URL?
Nope, since all access to/from is metered and charged
> If it is internal, is this accessible to the CN team?
Yes, but there is nothing particularly special about this table aside from the fact that it is the one used by the testing suite.
The general idea is that the process can use DynamoDB to track status updates in any such table you want to specify. So if you used the tools yourself (including the demo) you would get your own table for your own use, or you could make a public one for your team and everyone could then use it, etc.
That said, there is nothing terribly special about DynamoDB in that respect (we could send status updates to any external target, like how we handle progress updates on NWB GUIDE); it is just adjacent to all the other AWS entities and so feels like a natural go-to for this kind of thing.
> So if you used the tools yourself (including demo) you would get your own table for your own use,
Thanks for the clarification. That makes sense. My impression was that this linkage to the table may be hard-coded, but having it configurable to the user makes sense.
For NIH report