asmacdo opened 4 months ago
@satra do you see a way to turn on hourly accounting in
It seems that with the current approach to transferring the HOMEs we are running "hot" in terms of :dollar:
I can't seem to get hourly accounting activated with the account we have. Let's hope the cost goes down after the transfer is done. But the new EFS seems to have more data than the old EFS :)
The consumption might go down slightly.
The new EFS having more data than the old one is (probably) caused by an error in my script (I missed a trailing slash).
We've corrected it, and hopefully the two will be back in sync (==) tomorrow.
I think I have nailed down one of the major contributors to the discrepancy: sparsely allocated files in the NWB caches. A typical example:
```
[ec2-user@ip-10-1-2-62 efs]$ ls -ld /mnt/{legacy-,}efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/*
-rw-r--r--. 1 ec2-user users 4966709395 Mar 9 2023 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
-rw-r--r--. 1 ec2-user users 46367374013 Mar 9 2023 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
-rw-------. 1 ec2-user users 546 Mar 9 2023 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
-rw-r--r--. 1 ec2-user users 4966709395 Mar 9 2023 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
-rw-r--r--. 1 ec2-user users 46367374013 Mar 9 2023 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
-rw-------. 1 ec2-user users 546 Mar 9 2023 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
[ec2-user@ip-10-1-2-62 efs]$ du -scm /mnt/{legacy-,}efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/*
48 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
400 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
1 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
4737 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
44220 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
1 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
49405 total
```
But in some cases the factor might be x1000! Looking across similarly placed nwb-cache locations, we get:
```
[ec2-user@ip-10-1-2-62 efs]$ du -scm /mnt/legacy-efs/home/*/dandi-notebooks/*/*/nwb-cache/ > /tmp/du-scm-nwb-cache-legacy.out
[ec2-user@ip-10-1-2-62 efs]$ du -scm /mnt/efs/home/*/dandi-notebooks/*/*/nwb-cache/ > /tmp/du-scm-nwb-cache-new.out
[ec2-user@ip-10-1-2-62 efs]$ tail -n 1 /tmp/du-scm-nwb-cache*out
==> /tmp/du-scm-nwb-cache-legacy.out <==
9233 total
==> /tmp/du-scm-nwb-cache-new.out <==
1937116 total
```
So that is about 2 TB of waste (1937116 MiB vs 9233 MiB, as reported by du -m). That doesn't fully account for the difference, but I guess there might be more such locations.
Overall, with rsync we may indeed have caused that x2 of unnecessary transfer by de-sparsifying files like these. fsspec caches might suffer from the same curse.
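A quick local way to see the de-sparsifying effect: a minimal sketch assuming GNU coreutils on a filesystem that supports sparse files (the function name `sparse_copy_demo` is hypothetical). `dd` without `conv=sparse` writes every zero byte, much like a plain copy, while `cp --sparse=always` turns runs of zeros back into holes. rsync's `-S`/`--sparse` option is meant to do the latter; we have not checked which flags the migration script actually used.

```shell
# Hypothetical demo (GNU coreutils assumed): copy one sparse file two ways
# and compare apparent size with allocated size.
sparse_copy_demo() {
    local dir f
    dir=$(mktemp -d)
    # 10 MiB apparent size with no data written: a sparse file.
    truncate -s 10M "$dir/sparse.bin"
    # dd without conv=sparse writes every zero byte: a de-sparsified copy.
    dd if="$dir/sparse.bin" of="$dir/dense-copy.bin" bs=1M status=none
    # cp --sparse=always turns runs of zeros back into holes.
    cp --sparse=always "$dir/sparse.bin" "$dir/sparse-copy.bin"
    # Print: name, apparent size in bytes, allocated size in KiB.
    for f in sparse.bin dense-copy.bin sparse-copy.bin; do
        printf '%s\t%s\t%s\n' "$f" "$(stat -c %s "$dir/$f")" \
            "$(du -k -- "$dir/$f" | cut -f1)"
    done
    rm -rf -- "$dir"
}
```

All three files should report the same apparent size, with only `dense-copy.bin` occupying roughly 10 MiB on disk.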
IMHO we should remove all `~/.cache`, `__pycache__`, and `nwb-cache` folders we find in the target (new EFS) location. But first we should also identify where else similar gotchas could have happened.
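Before deleting anything, the candidates can be listed for review. A minimal sketch, assuming home directories live directly under one root such as `/mnt/efs/home` (the helper name `list_cache_dirs` is hypothetical); it only prints paths and deletes nothing.

```shell
# Hypothetical helper: print candidate cache directories under a root of home
# directories (e.g. /mnt/efs/home) for review. Nothing is deleted.
list_cache_dirs() {
    local root=$1
    # Per-user ~/.cache, i.e. exactly <root>/<user>/.cache.
    find "$root" -mindepth 2 -maxdepth 2 -type d -name .cache
    # __pycache__ and nwb-cache at any depth; -prune stops find from
    # descending into a match, and the explicit -print lists only matches.
    find "$root" -type d \( -name __pycache__ -o -name nwb-cache \) -prune -print
}
```

For example, `list_cache_dirs /mnt/efs/home > /tmp/cache-dirs.txt` produces a list to inspect; a `__pycache__` inside an already-listed `~/.cache` is listed in addition to its parent.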
WDYT?
For the future: we should really look into the possibility of a "device-level filesystem transfer".
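FWIW, I am now running:
```
# -prune skips *.zarr trees; the explicit -print keeps find from also
# printing the pruned *.zarr directories themselves.
find /mnt/legacy-efs -type d -name '*.zarr' -prune -o -type d -name 'nwb-cache' -print | tee /tmp/nwb-caches.log
```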
I wish there were a cheap way to find all sparse files, but I am afraid it might be a bit heavy. Still, even find can report sparseness (GNU find's `%S` is allocated size divided by apparent size), so a crude example could be:
```
[ec2-user@ip-10-1-2-62 efs]$ ( cd /mnt/legacy-efs/home/rly/ ; find . -type f -printf '%p: %S\n' ) | grep ': 0\.[0-9]*$'
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/1336d2b3ea1474e9217330275511283c9f2197ac714be026f30e84589428938d: 0.009927
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/dbbed295182242d37398c451d612386ed9aca59978a88f95338fdb343819a2a7: 0.00387646
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/92161f095431a20a952d1f7f144fa51e65299ee407023876346ec901bcaf058f: 0.170661
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/bb197b7cc1074b666c92f1505d370f272891869234fcd9e527678ea12078b446: 0.00737148
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/eb731d5c786e2c57f64fbc36247d900dbe7971fa15eef6cc850cdf831e635278: 0.00289249
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/e4938c5b4770637a98caf9e4efda860864293574e14a37e2e5666536716f1a06: 0.000825281
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/aad9874d99089db033ca649d36340dfaea63f929adb685ff3ac4d468a6144b6d: 0.00754308
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/730d8f45e188bb778c96b949ea35e50bbbd103176e0ebf25b932f3f348626043: 0.00511873
```
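Building on the same GNU find `-printf` idea, a sketch that lists only files with fewer allocated bytes than apparent bytes and totals both, which estimates how much a tree would grow if every file were de-sparsified. The name `sparse_report` is hypothetical; it assumes GNU find and awk, and paths containing tabs or newlines will confuse the tab-separated output.

```shell
# Hypothetical helper (GNU find assumed). For each regular file under DIR:
#   %s = apparent size in bytes, %b = disk usage in 512-byte blocks, %p = path.
# Prints files whose allocated bytes < apparent bytes, then a TOTAL line.
sparse_report() {
    find "$1" -type f -printf '%s\t%b\t%p\n' |
        awk -F '\t' '
            {
                apparent = $1; allocated = $2 * 512
                total_apparent += apparent; total_allocated += allocated
                if (allocated < apparent)
                    printf "%.0f\t%.0f\t%s\n", apparent, allocated, $3
            }
            END { printf "TOTAL\t%.0f\t%.0f\n", total_apparent, total_allocated }'
}
```

Columns are apparent bytes, allocated bytes, path; for example `sparse_report /mnt/legacy-efs/home/rly > /tmp/sparse-rly.tsv`.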
Sparse files can show different sizes depending on filesystem internals: by default du reports allocated disk space, while du --apparent-size reports file lengths, which mitigates this:
```
[root@ip-10-1-2-62 migration-scripts]# du --apparent-size -sh /mnt/efs/shared/catalystneuro/ephy_testing_data/
901M /mnt/efs/shared/catalystneuro/ephy_testing_data/
[root@ip-10-1-2-62 migration-scripts]# du --apparent-size -sh /mnt/legacy-efs/shared/catalystneuro/ephy_testing_data/
901M /mnt/legacy-efs/shared/catalystneuro/ephy_testing_data/
```
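To compare two trees on both measures at once, a small sketch (hypothetical name `compare_usage`, GNU du assumed) prints allocated and apparent usage in MiB per path. A large gap between the two columns on one side but not the other would point at lost sparseness.

```shell
# Hypothetical helper (GNU du assumed): allocated vs. apparent usage per path,
# in whole MiB as reported by du -m.
compare_usage() {
    local p allocated apparent
    printf 'allocated_MiB\tapparent_MiB\tpath\n'
    for p in "$@"; do
        allocated=$(du -sm -- "$p" | cut -f1)
        apparent=$(du -sm --apparent-size -- "$p" | cut -f1)
        printf '%s\t%s\t%s\n' "$allocated" "$apparent" "$p"
    done
}
```

For example: `compare_usage /mnt/{legacy-,}efs/shared/catalystneuro/ephy_testing_data/`.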
> @satra do you see a way to turn on hourly accounting in
I have sent an email to MIT IS&T requesting this feature.
There is an additional cost for hourly granularity; see the AWS Cost Explorer hourly granularity docs. MIT IS&T can enable this feature if we would like.
Spherical-cow math, sloppily typed on my phone after skimming @kabilar's link (someone please double-check):
50 objects x 24 hours x 14 days (weird to re-query every hour, for 2 weeks, data we have already retrieved, but OK)
comes to $0.005/day or $0.16/month.
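To help with the double-check, a sketch that reproduces the arithmetic. The rate is an assumption to verify against the AWS pricing docs: we take hourly granularity to be billed at $0.01 per 1,000 usage records per month and treat each object-hour as one usage record. The helper name is hypothetical.

```shell
# Hypothetical helper: estimate the monthly charge for hourly-granularity data.
# ASSUMPTIONS: one usage record per object per hour, billed at USD_PER_1000
# dollars per 1,000 records per month (check current AWS pricing).
hourly_records_cost() {
    local objects=$1 hours_per_day=$2 days=$3 usd_per_1000=$4
    awk -v o="$objects" -v h="$hours_per_day" -v d="$days" -v r="$usd_per_1000" \
        'BEGIN { n = o * h * d; printf "records=%d monthly_usd=%.3f\n", n, n / 1000 * r }'
}
```

Under those assumptions, `hourly_records_cost 50 24 14 0.01` prints `records=16800 monthly_usd=0.168`, in line with the roughly $0.16/month estimate (about $0.0056/day over a 30-day month).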
I say let's risk it
Cost Explorer in the console updates once a day, which isn't very helpful when doing something new that might be expensive.
IIUC, API users can get hourly updates, but that probably has its own cost too.
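For the API route, a sketch of pulling recent hourly cost with the AWS CLI's Cost Explorer command. It assumes AWS CLI credentials for the account, hourly granularity already enabled (which MIT IS&T would have to do), and GNU date. Our understanding is that Cost Explorer API requests are billed per request, so this should not be polled aggressively. The function name is hypothetical.

```shell
# Hypothetical helper: unblended cost for the last N whole hours (default 24).
# ASSUMPTIONS: AWS CLI configured, hourly granularity enabled, GNU date.
cost_last_hours() {
    local hours=${1:-24} now end_epoch start_epoch start end
    now=$(date -u +%s)
    # Align the window to the top of the hour, in UTC.
    end_epoch=$(( now - now % 3600 ))
    start_epoch=$(( end_epoch - hours * 3600 ))
    # Hourly queries take full timestamps rather than plain dates
    # (our reading of the Cost Explorer docs).
    start=$(date -u -d "@$start_epoch" +%Y-%m-%dT%H:%M:%SZ)
    end=$(date -u -d "@$end_epoch" +%Y-%m-%dT%H:%M:%SZ)
    aws ce get-cost-and-usage \
        --time-period "Start=$start,End=$end" \
        --granularity HOURLY \
        --metrics UnblendedCost
}
```

For example, `cost_last_hours 6` would request the last six complete hours.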