carshadi closed this issue 11 months ago
Will generate benchmark of dense SWCs on Janelia instance to compare import load times.
Importing a set of 2355 neurons into HortaCloud took 2 minutes 15 seconds. The same set imported into the Janelia instance took 16 seconds, about 8 times faster.
I ran a test on HortaCloud with the same set of neurons, but loaded from local disk instead. It took about 20-21 seconds, still a bit slower than the Janelia instance, but clearly s3fs is responsible for most of the slowdown in the SWC import.
The running time of the SWCImport service on /data/s3/janelia-mouselight-imagery/reconstructions/2018-10-01/build-brain-output/frags-with-5-or-more-nodes/as-swcs.tar, which contains ~110K SWC entries, was ~10s from the time the service was queued to the time it completed. This was done using Postman, not from the workstation.
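Since that set ships as a single tar archive, one option is to pull the archive down once and unpack it on the instance's local disk rather than reading files individually. This is not from the workstation codebase, just a minimal stdlib sketch of listing the `.swc` members of an in-memory tar (a real run would call `tar.extractall()` to a local destination):

```python
import io
import tarfile

def swc_members(tar_bytes: bytes):
    """Return the names of .swc files inside an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        return [m.name for m in tar if m.isfile() and m.name.endswith(".swc")]

# Build a tiny archive in memory just to demonstrate; in practice the
# bytes would come from a single GET of the tar on S3.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.swc", "b.swc", "notes.txt"):
        data = b"1 1 0 0 0 1 -1\n"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

print(swc_members(buf.getvalue()))  # ['a.swc', 'b.swc']
```

One large sequential read of a tar avoids the per-file request latency that dominates when fetching ~110K small objects individually.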
Using the "Load Linux SWC Folder into New Workspace on Sample" option, loading 2,354 SWC files from an S3 bucket takes ~2 min. Assuming linear scaling, loading 2 million SWCs would take ~33 hr. Loading data from S3 is of course expected to be slower than from a local or network drive, but it would be great if there were ways to speed it up (parallelize somehow?). It also seems tricky to get that many files onto the EC2 instance's local disk via Temporary Files, OneDrive, or Google Drive. Perhaps a way to unzip archives within the AppStream instance would help?
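On the "parallelize somehow?" idea: S3 GETs are I/O-bound, so overlapping many requests with a thread pool typically recovers most of the per-object latency. A minimal sketch of the pattern using only `concurrent.futures`; here `fetch_swc` is a placeholder standing in for a real S3 read (e.g. boto3's `get_object`), not anything the workstation currently does:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_swc(key: str) -> bytes:
    # Placeholder for a real S3 GET, e.g.:
    #   s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return b"# SWC contents of " + key.encode()

def fetch_all(keys, max_workers=32):
    # Threads overlap network latency; 32 concurrent requests is a
    # reasonable starting point to tune against S3 throughput.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(keys, pool.map(fetch_swc, keys)))

keys = [f"neuron_{i:04d}.swc" for i in range(2354)]
swcs = fetch_all(keys)
print(len(swcs))  # 2354
```

With serial reads the total time is roughly (per-object latency x file count), whereas with N workers it divides by roughly N until bandwidth becomes the bottleneck, which is why the 2-minute figure could plausibly drop substantially.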
The SWCs I used are here:
s3://aind-msma-morphology-data/test/from_google_exaSPIM_609281_2022-11-03_13-49-18-training-data_n5_whole-brain_consensus_1000/