dandi / dandisets-healthstatus

Healthchecks of dandisets and support libraries (pynwb and matnwb)

Remove parallel processing? #6

Closed jwodder closed 1 week ago

jwodder commented 1 year ago

Seeing as all operations through datalad-fuse and/or fsspec will end up synchronized by a thread lock, trying to process files in parallel will likely not accomplish anything other than making it more likely we encounter an error.

@yarikoptic Thoughts?
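
To illustrate the concern, here is a minimal sketch, not the project's actual code. It assumes every file read has to hold one shared lock; `io_lock`, `read_file`, and the file names are hypothetical stand-ins for whatever datalad-fuse/fsspec do internally. Under that assumption, a pool of 4 workers never has more than one read in flight.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the lock that serializes filesystem access.
io_lock = threading.Lock()

# Bookkeeping to record how many reads are ever in flight at once.
counter_lock = threading.Lock()
active = 0
peak = 0


def read_file(path):
    """Simulate a read that must hold the shared lock for its whole duration."""
    global active, peak
    with io_lock:
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # simulated I/O
        with counter_lock:
            active -= 1
    return path


paths = [f"file-{i}.nwb" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_file, paths))

print(f"workers=4, peak concurrent reads={peak}")
```

Any work a test does outside the lock (for example CPU-bound traversal of data that has already been read) could still overlap, so the sketch only covers the locked portion.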

yarikoptic commented 1 year ago

Well, I still think there might be some benefit from parallelization in some tests, which would do all kinds of CPU-intensive traversal of the file etc., while fsspec could actually return the desired data (from cache in particular) quite fast. The last time I looked at drogon while it was running the script, I saw quite a large load and lots of busy Python processes. Maybe that was before the locking was added? Right now I do not see much being done among those MATLAB and Python processes, so indeed they look mostly "locked", I guess, and PID 3433101, which is the datalad fusefs, is the only somewhat busy one. But I think let's keep it this way for a bit at least: drogon is quite busy with IO at the moment due to a backup and me also moving some data at the LVM level to replace some drives. Maybe once IO frees up we will see some parallel MATLAB/Python processes appearing.

yarikoptic commented 1 year ago

I think "parallel processing" works even within each dandiset, but we need to somehow limit how many files can be processed in parallel within each dandiset, since we seem to just cause too much competition. E.g., right now there are over 50 MATLAB processes:

(base) dandi@drogon:/mnt/backup/dandi/heroku-logs/dandi-api$ ps auxw | grep MATLAB | nl | tail
    51  dandi     169458  7.5  0.8 6088640 558264 pts/15 Sl+  17:37   0:04 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264942/sub-anm00264942_ses-20170627T094013_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    52  dandi     170208  7.0  0.8 6079504 553000 pts/15 Sl+  17:37   0:04 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094016_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    53  dandi     170681  5.0  0.6 2055996 432176 pts/15 Sl+  17:37   0:03 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094019_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    54  dandi     171216  7.6  0.8 6081756 555276 pts/15 Sl+  17:37   0:04 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094028_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    55  dandi     176707  6.0  0.6 2055996 425156 pts/15 Sl+  17:37   0:02 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094033_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    56  dandi     180884  5.8  0.4 1513180 294652 pts/15 Sl+  17:37   0:02 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094037_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    57  dandi     185485  4.6  0.2 681132 194548 pts/15  Sl+  17:37   0:01 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094041_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    58  dandi     193908  2.3  0.1 187080 92528 pts/15   R+   17:38   0:00 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094049_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    59  dandi     195594  1.5  0.1 164444 70400 pts/15   R+   17:38   0:00 /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000009/sub-anm00264943/sub-anm00264943_ses-20170627T094051_ecephys+ogen.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
    60  dandi     201415  0.0  0.0   6316   732 pts/4    S+   17:38   0:00 grep MATLAB

If we parallelize across dandisets (#9), we should have e.g. no more than 5 files being processed in parallel per dandiset.
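
One possible way to enforce such a cap, sketched here with `asyncio.Semaphore` from the standard library. This is an assumption about the approach, not the repository's implementation; `check_file`, `check_dandiset`, `MAX_PER_DANDISET`, and the file names are hypothetical, and the `sleep` stands in for launching a MATLAB or Python test process.

```python
import asyncio

MAX_PER_DANDISET = 5  # the per-dandiset cap suggested above


async def check_file(path, limiter, stats):
    """Run one (simulated) file check while holding a slot in the limiter."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for running a test subprocess
        stats["active"] -= 1
    return path


async def check_dandiset(paths):
    """Check all files of one dandiset, at most MAX_PER_DANDISET at a time."""
    limiter = asyncio.Semaphore(MAX_PER_DANDISET)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(check_file(p, limiter, stats) for p in paths)
    )
    return results, stats["peak"]


results, peak = asyncio.run(
    check_dandiset([f"sub-{i}.nwb" for i in range(20)])
)
print(f"checked {len(results)} files, peak concurrency {peak}")
```

Each dandiset gets its own semaphore, so processing several dandisets at once would multiply the total; a second, global limiter would be needed to bound the overall process count.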

jwodder commented 1 week ago

As of #77, we no longer use datalad-fuse.