BrianClement / Demo2

Test repository for syncing with local VS repository

Enable running on Umu_W #58

Open BrianClement opened 7 years ago

BrianClement commented 7 years ago

[Issue originally created 2016-Oct-03 00:00:15 UTC by rgommers]

BrianClement commented 7 years ago

[Comment originally created 2016-Oct-07 04:20:36 UTC by rgommers]

In 7b488caa81 I've added some details on how to access Umu and mount the DiscBot share with HDF5 files there. At the moment nothing except Python itself and Luigi is installed, though.

@hammockman I propose to install Miniconda 3.5 on Umu, and in the root env install a full set of packages we need (h5py, pytables, ipython, etc.). Any objections or better ideas?

BrianClement commented 7 years ago

[Comment originally created 2016-Oct-07 05:26:59 UTC by hammockman]

Miniconda sounds fine.

I'm still mostly working on my old box (haggis). At the rate I'm going it'll be a while till I migrate to umu.

Previously, cifs mounts have been system-wide (using my credentials in /etc/fstab) and attached to /media/Q (\\cifs5200\Data), /media/Z (\\cifs5200\FFR), etc. to match the standard Windows mappings. This lets me change paths in scripts quickly/safely, but isn't pretty.
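A system-wide CIFS mount of that shape might look like the following in /etc/fstab (a hedged sketch: the server name and mount points come from the comment above, but the credentials-file path and mount options are assumptions):

```
# Sketch of /etc/fstab entries mapping the standard Windows drive letters.
# The credentials file and uid are illustrative, not the actual config.
//cifs5200/Data  /media/Q  cifs  credentials=/etc/cifs-credentials,uid=1000,iocharset=utf8  0  0
//cifs5200/FFR   /media/Z  cifs  credentials=/etc/cifs-credentials,uid=1000,iocharset=utf8  0  0
```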

While cifs is perhaps our best (only?) option for accessing the h5 files, I started doing some performance testing last month and it was pretty dire. If you know of any good tools, particularly for measuring random access times (which is one of the promised pay-offs of HDF5) you might like to take a look also.

BrianClement commented 7 years ago

[Comment originally created 2016-Oct-11 10:07:52 UTC by rgommers]

> While cifs is perhaps our best (only?) option for accessing the h5 files, I started doing some performance testing last month and it was pretty dire.

Yeah, not sure we have much choice here. NFS would be worse.

> If you know of any good tools, particularly for measuring random access times (which is one of the promised pay-offs of HDF5) you might like to take a look also.

A simple %timeit should do, I guess. I can have a look at whether it's much worse from a Linux box than from a Windows one. For more systematic benchmarks I like https://spacetelescope.github.io/asv/, but it's a bit too early to think about systematic benchmarking.
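A minimal stand-in for that %timeit check, using only the standard library (plain file reads rather than h5py, and a local temp file standing in for the share; the paths and sizes here are illustrative):

```python
import os
import tempfile
import timeit

# Create a 4 MiB scratch file; in practice this would be an HDF5 file,
# read once from local disk and once from the CIFS mount.
path = os.path.join(tempfile.mkdtemp(), "scratch.bin")
with open(path, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))

def random_read(offset, size=4096):
    """Seek to `offset` and read `size` bytes (a crude random-access probe)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# Mean time per 4 KiB read at a fixed offset; re-run with `path` pointing
# at the mounted share to get the network-vs-local comparison.
mean_s = timeit.timeit(lambda: random_read(1024 * 1024), number=100) / 100
print(f"mean random 4 KiB read: {mean_s * 1e6:.1f} microseconds")
```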

BrianClement commented 7 years ago

[Comment originally created 2016-Oct-11 10:34:06 UTC by hammockman]

Definitely too soon for asv! I think the key issue for HDF5 access performance testing is how much slower it is to extract a dataset from an HDF5 file stored on the SAN compared to extracting the same block from an HDF5 file stored locally, and how the relative speed changes with dataset size. If the time required to extract small chunks of data over the network is >> local, and the speed ratio improves as the chunk size goes up, then we should look hard at the transport protocol. Otherwise it's network stuff that needs tweaking (or maybe just keep a local mirror. uuuuggghhh)
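The comparison described above could be sketched like this (standard-library only; both files are local temp files in this sketch, so the printed ratio will be near 1, whereas in real use `remote` would sit on the CIFS-mounted SAN):

```python
import os
import tempfile
import time

def make_file(size):
    """Write `size` random bytes to a temp file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path

def chunk_read_time(path, size, repeats=5):
    """Best-of-`repeats` wall time to read the first `size` bytes of `path`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read(size)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-ins: in real use `remote` would be a path on the CIFS mount.
local = make_file(1 << 22)    # 4 MiB
remote = make_file(1 << 22)

ratios = {}
for size in (1 << 12, 1 << 16, 1 << 20):  # 4 KiB, 64 KiB, 1 MiB chunks
    ratios[size] = chunk_read_time(remote, size) / chunk_read_time(local, size)
    print(f"{size:>8} B chunk  remote/local time ratio: {ratios[size]:.2f}")
```

If the ratio is large for small chunks and shrinks as chunks grow, per-request overhead (i.e. the transport protocol) dominates, which matches the reasoning above.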

BrianClement commented 7 years ago

[Comment originally created 2016-Oct-11 10:44:41 UTC by rgommers]

> if the time required to extract small chunks of data over network >> local

I expect that to be the case.

> if the speed ratio improves as the chunk size goes up, then we should look hard at the transport protocol.

Yes, that's a good point; I can measure that. Hopefully it's not the case. The smallest chunks are 1 image I guess, which is already not that small.