HDFGroup / datacontainer

Data Container Study

Verify Pluggable compression filters and h5py #25

Closed jreadey closed 8 years ago

jreadey commented 8 years ago

Verify h5py can read datasets that are compressed with Mafisc

hyoklee commented 8 years ago

I was able to run h5py with the filters and read szip/mafisc/blosc-compressed data. However, that was with the python under anaconda/bin/, which is version 2.7. Can IPython parallel run on 2.7, or does it require 3.4?

ubuntu@test2:~/anaconda/bin$ python --version
Python 2.7.10 :: Anaconda 2.3.0 (64-bit)

ghost commented 8 years ago

@jreadey installed Anaconda Python 3.4 & friends. Where does Anaconda Python 2.7 come from?

@hyoklee What snapshot did you use for this?

hyoklee commented 8 years ago

@ajelenak-thg I'm using the issue10 snapshot. The origin of this instance is py34. I can see python3 at /usr/bin/python3 and under ~/anaconda/envs/py34/bin.

jreadey commented 8 years ago

@hyoklee - it sounds like you are running the system python. If you do: $ which python you should get: /home/ubuntu/anaconda/envs/py34/bin/python for the snapshot images.
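
The same check can be made from inside Python, which avoids any ambiguity about what the shell's PATH resolves to. This is a minimal sketch using only the standard library; the helper name `interpreter_in_env` is hypothetical, and the prefix is the path quoted above.

```python
import os
import sys


def interpreter_in_env(executable, env_prefix):
    """Return True if `executable` lives under `env_prefix`.

    Both arguments are plain paths; nothing conda-specific is assumed.
    """
    exe = os.path.realpath(executable)
    prefix = os.path.realpath(env_prefix)
    # commonpath avoids false matches such as ".../py34" vs ".../py34-old".
    return os.path.commonpath([exe, prefix]) == prefix


if __name__ == "__main__":
    prefix = "/home/ubuntu/anaconda/envs/py34"  # path taken from this thread
    print(sys.executable, sys.version.split()[0])
    print("running from py34 env:", interpreter_in_env(sys.executable, prefix))
```

Run with whichever `python` the shell picks up; if the second line prints False, the py34 environment is not the one in use.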

hyoklee commented 8 years ago

@jreadey I intentionally turned off py34 in .bashrc

# activate Conda Env
# source activate py34

earlier, because I installed a custom hdf5 library with filters under ~/anaconda/lib/.

Now I can run h5py with py34 by copying the following filters & hdf5 libraries from ~/anaconda/lib into ~/anaconda/envs/py34/lib/:

  -rwxrwxr-x  1 ubuntu ubuntu    89933 Nov  9 22:09 libmafisc.so.1
  -rw-r--r--  1 ubuntu ubuntu   217328 Nov  9 20:35 libhdf5_hl.a
  -rwxr-xr-x  1 ubuntu ubuntu     1075 Nov  9 20:35 libhdf5_hl.la
  -rwxr-xr-x  2 ubuntu ubuntu   148174 Nov  9 20:35 libhdf5_hl.so.10.0.1
  -rw-r--r--  1 ubuntu ubuntu     3114 Nov  9 20:35 libhdf5.settings
  -rw-r--r--  1 ubuntu ubuntu  6029000 Nov  9 20:35 libhdf5.a
  -rwxr-xr-x  1 ubuntu ubuntu     1017 Nov  9 20:35 libhdf5.la
  -rwxr-xr-x  2 ubuntu ubuntu  3247608 Nov  9 20:35 libhdf5.so.10.0.1
  -rw-r--r--  1 ubuntu ubuntu    58962 Nov  9 19:55 libsz.a
  -rwxr-xr-x  1 ubuntu ubuntu      802 Nov  9 19:55 libsz.la
  -rwxr-xr-x  1 ubuntu ubuntu    52384 Nov  9 19:55 libsz.so.2.0.0
  -rwxrwxr-x  1 ubuntu ubuntu     1055 Oct 22 00:03 libzmq.la

Yet there's still a mystery. Everything works well with py34 if I set:

export HDF5_PLUGIN_PATH=/home/ubuntu/anaconda/lib

But h5py gives segmentation fault if I set:

export HDF5_PLUGIN_PATH=/home/ubuntu/anaconda//envs/py34/lib/

I will investigate further which library causes the segmentation fault.
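
When comparing the two settings, it can help to see exactly which shared objects each directory exposes, since HDF5 looks in the directories named by HDF5_PLUGIN_PATH for dynamically loaded filters. A directory that also holds unrelated libraries is a plausible, though unconfirmed, source of trouble. The sketch below only lists files and loads nothing; the function name and the colon separator (the Unix convention) are assumptions for illustration.

```python
import os


def list_shared_objects(plugin_path, sep=":"):
    """Map each directory in a plugin-path string to its shared-object files.

    `plugin_path` has the same form as HDF5_PLUGIN_PATH: one or more
    directories joined by `sep`. Unreadable or missing directories map
    to an empty list.
    """
    result = {}
    for directory in plugin_path.split(sep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        try:
            names = os.listdir(directory)
        except OSError:
            names = []
        # Match "libfoo.so" as well as versioned names like "libfoo.so.1".
        result[directory] = sorted(
            n for n in names if n.endswith(".so") or ".so." in n
        )
    return result


if __name__ == "__main__":
    path = os.environ.get("HDF5_PLUGIN_PATH", "")
    for directory, libs in list_shared_objects(path).items():
        print(directory)
        for lib in libs:
            print("   ", lib)
```

Running it once with each of the two exports above gives two listings that can be compared side by side.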

ghost commented 8 years ago

Do we need a new snapshot, or is issue10 recent enough to use?

hyoklee commented 8 years ago

Can the ncep job script and run_engine use a different snapshot name, like "issue10", instead of "ipengine"? I can see the ipengine name is hard-coded.

jreadey commented 8 years ago

I'll create new snapshots once my PR is merged.

jreadey commented 8 years ago

I'll delete the old ipengine and create a new snapshot with the same name.

hyoklee commented 8 years ago

There are many differences between the py34 lib and ~/anaconda/lib. blosc.so exists in py34. Below are candidates for what causes the segmentation fault in py34 when HDF5_PLUGIN_PATH is changed:

(py34)ubuntu@test2:~/anaconda/envs/py34/lib$ diff -r . ~/anaconda/lib/
Only in .: blosc.so
Only in /home/ubuntu/anaconda/lib/: cairo
Only in /home/ubuntu/anaconda/lib/cmake: libxml2
Only in ./cmake: openblas
...
Only in .: libhdf5_cpp.la
Only in .: libhdf5_cpp.so
Only in .: libhdf5_cpp.so.10
Only in .: libhdf5_cpp.so.10.0.1
Only in .: libhdf5_hl_cpp.la
Only in .: libhdf5_hl_cpp.so
Only in .: libhdf5_hl_cpp.so.10
Only in .: libhdf5_hl_cpp.so.10.0.1
...
Only in .: liblzma.a
Only in .: liblzma.la
Only in .: liblzma.so
Only in .: liblzma.so.5
Only in .: liblzma.so.5.0.5
...
Binary files ./libssl.a and /home/ubuntu/anaconda/lib/libssl.a differ
Binary files ./libssl.so and /home/ubuntu/anaconda/lib/libssl.so differ
Binary files ./libssl.so.1.0.0 and /home/ubuntu/anaconda/lib/libssl.so.1.0.0 differ
...
diff -r ./libzmq.la /home/ubuntu/anaconda/lib/libzmq.la
8c8
< dlname='libzmq.so.5'
---
> dlname='libzmq.so.4'
11c11
< library_names='libzmq.so.5.0.0 libzmq.so.5 libzmq.so'
---
> library_names='libzmq.so.4.0.0 libzmq.so.4 libzmq.so'

It's interesting that blosc.so exists in py34.
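
The same kind of comparison can be scripted so that only the entries of interest are reported. This is a minimal sketch using the standard-library filecmp module; the function name and the keyword filter are illustrative assumptions, and it compares only the top level of the two directories rather than recursing as diff -r does.

```python
import filecmp


def compare_lib_dirs(left, right,
                     keywords=("hdf5", "mafisc", "blosc", "libsz")):
    """Summarize top-level differences between two lib directories.

    Returns names found on only one side, plus names present on both
    sides whose contents differ, keeping only names that contain one
    of `keywords`.
    """
    def wanted(name):
        return any(k in name for k in keywords)

    dc = filecmp.dircmp(left, right)
    return {
        "only_left": sorted(n for n in dc.left_only if wanted(n)),
        "only_right": sorted(n for n in dc.right_only if wanted(n)),
        "differ": sorted(n for n in dc.diff_files if wanted(n)),
    }
```

For the directories in this thread the call would be compare_lib_dirs("/home/ubuntu/anaconda/envs/py34/lib", "/home/ubuntu/anaconda/lib"), which narrows the long diff output down to the HDF5 and filter libraries.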

hyoklee commented 8 years ago

@jreadey I think

source activate py34

screws up so many things. For example, I cannot run gdb.

(py34)ubuntu@test2:~$ gdb
Failed to import the site module
Traceback (most recent call last):
  File "/usr/lib/python3.4/site.py", line 586, in <module>
    main()
  File "/usr/lib/python3.4/site.py", line 572, in main
    known_paths = addusersitepackages(known_paths)
  File "/usr/lib/python3.4/site.py", line 287, in addusersitepackages
    user_site = getusersitepackages()
  File "/usr/lib/python3.4/site.py", line 263, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/usr/lib/python3.4/site.py", line 253, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/usr/lib/python3.4/sysconfig.py", line 602, in get_config_var
    return get_config_vars().get(name)
  File "/usr/lib/python3.4/sysconfig.py", line 545, in get_config_vars
    _init_posix(_CONFIG_VARS)
  File "/usr/lib/python3.4/sysconfig.py", line 417, in _init_posix
    from _sysconfigdata import build_time_vars
  File "/usr/lib/python3.4/_sysconfigdata.py", line 6, in <module>
    from _sysconfigdata_m import *
ImportError: No module named '_sysconfigdata_m'

If I don't source activate, I can run it fine.

jreadey commented 8 years ago

Why is gdb trying to do Python stuff?

You can always do: $ source deactivate if you need to drop the Python paths.
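
On the question above: gdb is commonly built with an embedded Python interpreter for its scripting support, and the traceback shows that interpreter starting from /usr/lib/python3.4. One possible explanation, not confirmed anywhere in this thread, is that settings exported by the activated environment leak into it. Besides deactivating, a tool can be launched with a scrubbed copy of the environment. This is a minimal sketch; the helper name and the choice of variables to drop are assumptions.

```python
import os
import shutil
import subprocess


def scrubbed_env(env, env_prefix):
    """Return a copy of `env` without Python overrides or PATH entries
    that point into `env_prefix`. The input mapping is not modified."""
    clean = dict(env)
    # Variables that redirect where a Python interpreter looks for modules.
    for var in ("PYTHONHOME", "PYTHONPATH"):
        clean.pop(var, None)
    prefix = env_prefix.rstrip(os.sep)
    kept = [
        p for p in clean.get("PATH", "").split(os.pathsep)
        if p and p != prefix and not p.startswith(prefix + os.sep)
    ]
    clean["PATH"] = os.pathsep.join(kept)
    return clean


if __name__ == "__main__":
    env = scrubbed_env(os.environ, "/home/ubuntu/anaconda/envs/py34")
    gdb = shutil.which("gdb", path=env["PATH"])
    if gdb:
        # "--version" just confirms that gdb starts without the traceback.
        subprocess.call([gdb, "--version"], env=env)
```

If gdb still fails with the scrubbed environment, the cause lies elsewhere and this assumption can be ruled out.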

hyoklee commented 8 years ago

I don't know. I think your conda py34 setup acts weird on OSDC Ubuntu.

hyoklee commented 8 years ago

I'm giving up on making py34 work for both summary.py (zmq error in #19) and the filters. The issue10 instance cannot be used as ipengine.

jreadey commented 8 years ago

Do you have a snapshot with just hdf5 & the plugin filters set up?

hyoklee commented 8 years ago

No.

However, the issue10 snapshot will work with hdf5, the plugins, and h5py (Python 2.7 / 3.4) if you set HDF5_PLUGIN_PATH=/home/ubuntu/anaconda/lib. Unfortunately, summary.py will not work due to the zmq error.

hyoklee commented 8 years ago

@jreadey Is there a particular reason to use Anaconda? I'd like to try python3.4 that's built from scratch and see if it works better.

jreadey commented 8 years ago

@hyoklee - Anaconda makes it much easier to bring in packages across the firewall. I tried building Python earlier, but it was too hard.

At this point you don't need to deal with Python - just get the compression filters working with the standard hdf tools. In Snapshot10 I moved the hdf5 libs/tools to /home/ubuntu/hdf5 and it all seemed to work.

hyoklee commented 8 years ago

@jreadey Do you mean the szip/mafisc/blosc filters are working on Snapshot10?

jreadey commented 8 years ago

szip is, but not mafisc and blosc.

hyoklee commented 8 years ago

I created a snapshot called "issue25". It can be used for ipengine with blosc/mafisc support. This snapshot uses the latest released software (Python 3.5, hdf5 1.8.6, etc.), all built from source on a from-scratch Ubuntu.

jreadey commented 8 years ago

Cool. Does it work with h5dump and h5py?

hyoklee commented 8 years ago

Yes.