dandi / dandi-hub

Infrastructure and code for the dandihub
https://hub.dandiarchive.org

Investigate Cost Explorer API #176

Open asmacdo opened 4 months ago

asmacdo commented 4 months ago

Cost Explorer in the console updates once a day, which isn't very helpful when we are doing something new that might be expensive.

IIUC, API users can get hourly updates, but that probably has its own cost too.
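
A minimal sketch of what an hourly query could look like through boto3's Cost Explorer client (`boto3.client("ce").get_cost_and_usage`). Assumptions: hourly granularity has been enabled for the account (per this thread, it is not yet), hourly queries take `yyyy-MM-ddThh:mm:ssZ` timestamps, and `UnblendedCost` grouped by `SERVICE` is the breakdown we want. The helper names are hypothetical, pagination via `NextPageToken` is not handled, and each API request is billed, so this is not something to poll in a tight loop.

```python
from datetime import datetime, timedelta, timezone


def hourly_cost_request(hours_back=24, now=None):
    """Build kwargs for GetCostAndUsage at HOURLY granularity, per service.

    Hypothetical helper; assumes hourly granularity is enabled on the account.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Truncate to the hour; hourly queries use timestamps rather than dates.
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=hours_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "TimePeriod": {"Start": start.strftime(fmt), "End": end.strftime(fmt)},
        "Granularity": "HOURLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


def fetch_hourly_costs(hours_back=24):
    """Call the API; needs boto3 and AWS credentials. Each call is billed."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("ce")
    return client.get_cost_and_usage(**hourly_cost_request(hours_back))
```

Keeping the request builder separate from the call makes it easy to inspect what would be sent before paying for a request.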

yarikoptic commented 4 months ago

@satra, do you see the ability to turn on hourly accounting in

[screenshot]

It seems that with the current approach to transferring the HOMEs, we are running "hot" in terms of 💵.

satra commented 4 months ago

I can't seem to get hourly accounting activated with the account we have. Let's hope costs go down after the transfer is done. But the new EFS seems to have more data than the old EFS :)

yarikoptic commented 4 months ago

The consumption might go down slightly.

asmacdo commented 4 months ago

The new EFS having more data than the old one is (probably) caused by an error in my script (I missed a trailing slash).

But we've corrected it, and hopefully the sizes will be equal (==) again by tomorrow.

yarikoptic commented 4 months ago

I think I have nailed down one of the major contributors to the discrepancy: sparsely allocated files in nwb caches. A typical example:

[ec2-user@ip-10-1-2-62 efs]$ ls -ld /mnt/{legacy-,}efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/*
-rw-r--r--. 1 ec2-user users  4966709395 Mar  9  2023 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
-rw-r--r--. 1 ec2-user users 46367374013 Mar  9  2023 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
-rw-------. 1 ec2-user users         546 Mar  9  2023 /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
-rw-r--r--. 1 ec2-user users  4966709395 Mar  9  2023 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
-rw-r--r--. 1 ec2-user users 46367374013 Mar  9  2023 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
-rw-------. 1 ec2-user users         546 Mar  9  2023 /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
[ec2-user@ip-10-1-2-62 efs]$ du -scm /mnt/{legacy-,}efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/*
48      /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
400     /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
1       /mnt/legacy-efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
4737    /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/2fdedac0b7dc61ef8b3e36dfaf0cc00f5475eb0ffe0ebcefb9bcb95a09dbbf3e
44220   /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/378e4baa2dd5db3035bffa4d261efc9a76c2fd400bbb51bf8d245f687cf15b6a
1       /mnt/efs/home/ayuno-2dn/dandi-notebooks/tutorials/cosyne_2023/nwb-cache/cache
49405   total

Here the new copies use roughly 100x more disk space, but in some cases the factor might be x1000! Looking across similarly placed nwb-cache locations, we get:

[ec2-user@ip-10-1-2-62 efs]$ du -scm /mnt/legacy-efs/home/*/dandi-notebooks/*/*/nwb-cache/ > /tmp/du-scm-nwb-cache-legacy.out
[ec2-user@ip-10-1-2-62 efs]$ du -scm /mnt/efs/home/*/dandi-notebooks/*/*/nwb-cache/ > /tmp/du-scm-nwb-cache-new.out
[ec2-user@ip-10-1-2-62 efs]$ tail -n 1 /tmp/du-scm-nwb-cache*out
==> /tmp/du-scm-nwb-cache-legacy.out <==
9233    total

==> /tmp/du-scm-nwb-cache-new.out <==
1937116 total
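
A quick check of those two totals. `du -m` counts in MiB (1024² bytes), so the difference converts to bytes before expressing it in decimal TB and binary TiB; the function name is just for illustration.

```python
def summarize(legacy_mib, new_mib):
    """Compare two `du -scm` totals; du -m units are MiB (1024**2 bytes)."""
    ratio = new_mib / legacy_mib
    extra_bytes = (new_mib - legacy_mib) * 1024**2
    return ratio, extra_bytes / 1e12, extra_bytes / 1024**4


ratio, extra_tb, extra_tib = summarize(9233, 1937116)
print(f"new/legacy = {ratio:.1f}x, extra = {extra_tb:.2f} TB ({extra_tib:.2f} TiB)")
# prints: new/legacy = 209.8x, extra = 2.02 TB (1.84 TiB)
```

So the nwb-cache directories alone take about 210x more space on the new EFS, and the extra roughly 2 TB is consistent with the estimate in the next comment.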

So that is about 2 TB of waste (not a full accounting; there might be more such locations, I guess). Overall, with rsync we indeed might have caused that x2 of unnecessary transfer by de-sparsifying such files. fsspec caches might suffer from the same curse.
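
To illustrate the de-sparsifying problem: a copy that writes out zeros allocates real blocks, while a copy that seeks over runs of zeros leaves holes. Below is a minimal sketch of the latter. It is the same idea as rsync's `--sparse` (`-S`) option or `cp --sparse=always`, not their actual code. Assumptions: only chunk-aligned, fully zero chunks become holes, and whether holes are actually created depends on the target filesystem supporting them. `copy_sparse` is a hypothetical name.

```python
import os
import tempfile


def copy_sparse(src, dst, chunk=1 << 20):
    """Copy src to dst, turning chunk-aligned all-zero chunks into holes."""
    zero = bytes(chunk)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf == zero[: len(buf)]:
                fout.seek(len(buf), os.SEEK_CUR)  # skip: becomes a hole
            else:
                fout.write(buf)
        # If the file ends in zeros we only seeked; set the length explicitly.
        fout.truncate()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"head" + bytes(8 << 20) + b"tail")  # 8 MiB of zeros inside
        copy_sparse(src, dst)
        st = os.stat(dst)
        # With hole support, allocated bytes are far below the apparent size.
        print("apparent:", st.st_size, "allocated:", st.st_blocks * 512)
```

Note that `src` itself is written densely here; only `dst` gets holes, which is exactly the asymmetry between a plain copy and a sparse-aware one.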

IMHO we should remove all ~/.cache, __pycache__ and nwb-cache folders we find in the new target EFS location. But first we should also identify where else similar gotchas could have happened.
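
A minimal sketch of that cleanup with a dry-run default, so the list can be reviewed before anything is deleted. Assumptions: the directory names are matched exactly and at any depth, which is broader than just `~/.cache` in each HOME, and matched directories are not descended into (like `find ... -prune`). The function names and `CACHE_NAMES` are hypothetical.

```python
import os
import shutil
import tempfile

# Hypothetical policy: the names proposed above, matched at any depth.
CACHE_NAMES = frozenset({".cache", "__pycache__", "nwb-cache"})


def find_cache_dirs(root, names=CACHE_NAMES):
    """Yield matching directories; do not descend into them (like -prune)."""
    for dirpath, dirnames, _filenames in os.walk(root):
        keep = []
        for name in dirnames:
            if name in names:
                yield os.path.join(dirpath, name)
            else:
                keep.append(name)
        dirnames[:] = keep  # os.walk will only descend into non-matches


def remove_cache_dirs(root, dry_run=True):
    """Print what would be removed (default) or actually remove it."""
    found = sorted(find_cache_dirs(root))
    for path in found:
        if dry_run:
            print("would remove", path)
        elif os.path.islink(path):
            os.unlink(path)  # remove the link itself, never its target
        else:
            shutil.rmtree(path)
    return found


if __name__ == "__main__":
    # Tiny demo tree in a temporary directory.
    with tempfile.TemporaryDirectory() as top:
        for sub in ("u1/.cache/pip", "u1/nb/nwb-cache", "u1/src/__pycache__", "u1/data"):
            os.makedirs(os.path.join(top, sub))
        remove_cache_dirs(top)                 # dry run: only prints
        remove_cache_dirs(top, dry_run=False)  # actually deletes
        print(sorted(os.listdir(os.path.join(top, "u1"))))  # ['data', 'nb', 'src']
```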

WDYT?

For the future: we should really look into the possibility of a "device-level filesystem transfer".

yarikoptic commented 4 months ago

FWIW running now

find /mnt/legacy-efs -type d -name '*.zarr' -prune -o -type d -name 'nwb-cache' -print | tee /tmp/nwb-caches.log

and I wish there were a cheap-ish way to find all sparse files, but I am afraid it might be a bit heavy. That said, even find can report sparseness (its %S format), so a crude example could be:

[ec2-user@ip-10-1-2-62 efs]$ ( cd /mnt/legacy-efs/home/rly/ ; find . -type f -printf '%p: %S\n' ) | grep ': 0\.[0-9]*$'
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/1336d2b3ea1474e9217330275511283c9f2197ac714be026f30e84589428938d: 0.009927
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/dbbed295182242d37398c451d612386ed9aca59978a88f95338fdb343819a2a7: 0.00387646
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/92161f095431a20a952d1f7f144fa51e65299ee407023876346ec901bcaf058f: 0.170661
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/bb197b7cc1074b666c92f1505d370f272891869234fcd9e527678ea12078b446: 0.00737148
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/eb731d5c786e2c57f64fbc36247d900dbe7971fa15eef6cc850cdf831e635278: 0.00289249
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/e4938c5b4770637a98caf9e4efda860864293574e14a37e2e5666536716f1a06: 0.000825281
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/aad9874d99089db033ca649d36340dfaea63f929adb685ff3ac4d468a6144b6d: 0.00754308
./dandi-notebooks/tutorials/cosyne_2023/nwb-cache/730d8f45e188bb778c96b949ea35e50bbbd103176e0ebf25b932f3f348626043: 0.00511873
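
The same crude scan in Python, mirroring find's `%S` (512 × `st_blocks` / `st_size`) and reporting regular files below 1.0, which is what the grep above selects. It also skips `*.zarr` directories, as in the `-prune` used for the nwb-cache search, since they hold many small files. Treating empty files as 1.0 is an assumption to avoid division by zero; the function names are hypothetical.

```python
import os
import stat


def sparseness(size, blocks):
    """Mirror find's %S: 512 * st_blocks / st_size (< 1.0 suggests holes)."""
    return 512 * blocks / size if size else 1.0


def sparse_files(root, threshold=1.0):
    """Yield (path, sparseness) for regular files below `threshold`.

    Symlinks are not followed, and *.zarr directories are not entered.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if not d.endswith(".zarr")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                s = sparseness(st.st_size, st.st_blocks)
                if s < threshold:
                    yield path, s
```

Usage would be something like `for path, s in sparse_files("/mnt/legacy-efs/home/rly"): print(f"{path}: {s:g}")`. It still stats every file, so it is no cheaper than find on a large tree.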

asmacdo commented 4 months ago

Sparse files can show different disk usage depending on filesystem internals. Using du --apparent-size, which reports the logical file size instead of the allocated blocks, mitigates this:

[root@ip-10-1-2-62 migration-scripts]# du --apparent-size -sh /mnt/efs/shared/catalystneuro/ephy_testing_data/
901M    /mnt/efs/shared/catalystneuro/ephy_testing_data/
[root@ip-10-1-2-62 migration-scripts]# du --apparent-size -sh /mnt/legacy-efs/shared/catalystneuro/ephy_testing_data/
901M    /mnt/legacy-efs/shared/catalystneuro/ephy_testing_data/
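
The same comparison per tree in Python, summing `st_size` (what `du --apparent-size` reports) and `st_blocks * 512` (roughly what plain `du` reports). Assumptions about matching du: hard-linked files are counted once, as du does by default, and the directories' own blocks are left out, so the allocated total will come out slightly below du's. If apparent totals match between legacy and new EFS while allocated totals differ, the gap is lost sparseness rather than missing or extra data. The function names are hypothetical.

```python
import os


def tree_usage(root):
    """Return (apparent_bytes, allocated_bytes) for non-directory entries."""
    seen = set()  # (device, inode) pairs, so hard links count once
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated


def compare(old_root, new_root):
    """Print apparent vs allocated MiB for two copies of the same tree."""
    for label, root in (("old", old_root), ("new", new_root)):
        apparent, allocated = tree_usage(root)
        print(f"{label}: apparent {apparent / 2**20:.0f} MiB, "
              f"allocated {allocated / 2**20:.0f} MiB  {root}")
```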

kabilar commented 3 months ago

> @satra do you see ability to turn on hourly accounting in

I have sent an email to MIT IS&T requesting this feature.

kabilar commented 3 months ago

There is an additional cost for hourly granularity. See the AWS Cost Explorer hourly granularity docs. MIT IS&T can enable this feature if we would like.

asmacdo commented 3 months ago

Spherical cow math, sloppily typed on my phone after skimming @kabilar's link (someone please double-check):

50 objects × 24 hours × 14 days = 16,800 (weird to query for the same data we have already retrieved every hour for 2 weeks, but ok)

comes to $0.005/day or $0.16/month
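
A re-check of the numbers above, assuming the linked page prices hourly granularity at $0.01 per 1,000 usage records per month and a month is 30 days (both assumptions; verify against the current pricing). Under those assumptions, 16,800 records come to about $0.17/month, or roughly half a cent per day, in line with the estimate.

```python
# Assumption: $0.01 per 1,000 usage records per month for hourly granularity;
# check the current AWS Cost Explorer pricing before relying on it.
RATE_PER_RECORD_MONTH = 0.01 / 1000


def hourly_granularity_cost(objects=50, hours=24, days=14,
                            rate=RATE_PER_RECORD_MONTH):
    """Return (records, $/month, $/day) for hourly data kept `days` days."""
    records = objects * hours * days
    monthly = records * rate
    return records, monthly, monthly / 30  # assumes a 30-day month


records, monthly, daily = hourly_granularity_cost()
print(f"{records} records -> ${monthly:.3f}/month, ${daily:.4f}/day")
# prints: 16800 records -> $0.168/month, $0.0056/day
```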

I say let's risk it