HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

change out load_file for HSLoad #94

Closed MRossol closed 4 years ago

MRossol commented 4 years ago

@jreadey Here is the code I put together to read the chunk_info in parallel. It's pretty much a complete refactor of your code into OOP, since that is what my fork looked like. Let me know if you want to walk through it during our next meeting.

jreadey commented 4 years ago

If I do a no option hsload with the tall.h5 test file, I'm getting an error:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/py38/bin/hsload", line 11, in <module>
    load_entry_point('h5pyd==0.8.0', 'console_scripts', 'hsload')()
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/hsload.py", line 380, in main
    HSLoad.run(fin, fout, dataload=dataload, s3path=s3path,
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 1232, in run
    load.load_file(dataload=dataload, s3_path=s3path,
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 1190, in load_file
    object_helper.load_datasets(self._h5, dataload=dataload)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 876, in load_datasets
    self.load_datasets(dobj)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 876, in load_datasets
    self.load_datasets(dobj)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 856, in load_datasets
    dobj = obj[name]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (unable to open external file, external link file name = 'somefile')"

I don't see this in the master branch. Is the PR checking the location of every external link?

MRossol commented 4 years ago

@jreadey I can't find any issues with line 385:

pylint utillib.py
************* Module utillib
utillib.py:1:0: C0302: Too many lines in module (1238/1000) (too-many-lines)
utillib.py:1:0: C0114: Missing module docstring (missing-module-docstring)
utillib.py:860:16: R1703: The if statement can be replaced with 'var = bool(test)' (simplifiable-if-statement)
utillib.py:897:8: R1703: The if statement can be replaced with 'var = bool(test)' (simplifiable-if-statement)
utillib.py:1049:8: R1703: The if statement can be replaced with 'var = bool(test)' (simplifiable-if-statement)

-----------------------------------
Your code has been rated at 9.92/10

As for the error, the issue is on line 856: https://github.com/HDFGroup/h5pyd/blob/e519e979f8a6ba82640f47b96b116ac56f9d7981/h5pyd/_apps/utillib.py#L856
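
For reference, the `KeyError` in the traceback comes from `dobj = obj[name]`: indexing a group by name dereferences the link, so a dangling external link fails to open. Below is a minimal sketch of a manual walk that checks the link type first and skips soft and external links. It is not the code in this PR; `iter_datasets` and the in-memory demo file are illustrative assumptions, and only `Group.get(..., getlink=True)`, `h5py.SoftLink`, and `h5py.ExternalLink` are h5py API.

```python
import h5py


def iter_datasets(group, path=""):
    """Yield (path, dataset) pairs, skipping soft and external links.

    Hypothetical helper for illustration only.
    """
    for name in group:
        # getlink=True returns the link object without opening its target
        link = group.get(name, getlink=True)
        if isinstance(link, (h5py.SoftLink, h5py.ExternalLink)):
            continue  # do not dereference; the target may not exist
        obj = group[name]  # safe: this is a hard link
        child_path = f"{path}/{name}"
        if isinstance(obj, h5py.Dataset):
            yield child_path, obj
        elif isinstance(obj, h5py.Group):
            yield from iter_datasets(obj, child_path)


# In-memory demo file with one dataset and two dangling links
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("g1/dset", data=[1, 2, 3])
    f["g1/extlink"] = h5py.ExternalLink("somefile", "/somepath")
    f["g1/softlink"] = h5py.SoftLink("/missing")
    found = [p for p, _ in iter_datasets(f)]
```

With this check in place the dangling links are simply passed over, so the walk reaches `/g1/dset` without trying to open `somefile`.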

I had to run the parallel chunk reads outside of visititems, but what I don't understand is why visititems ran okay... see here: https://github.com/HDFGroup/h5pyd/blob/e519e979f8a6ba82640f47b96b116ac56f9d7981/h5pyd/_apps/utillib.py#L1186-L1190
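
The two-pass shape described here (collect dataset paths during traversal, then run the chunk reads in parallel afterwards) can be sketched roughly as follows. This is only an illustration under stated assumptions: `get_chunk_info` is a hypothetical stand-in for the per-dataset lookup, and the paths are hard-coded rather than gathered from a real file.

```python
from concurrent.futures import ThreadPoolExecutor


def get_chunk_info(dset_path):
    """Hypothetical stand-in for the per-dataset chunk lookup."""
    return {"path": dset_path, "chunks": []}


def load_chunk_info(dset_paths, max_workers=4):
    """Run the per-dataset lookups in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get_chunk_info, dset_paths))


# Pass 1 (serial): in practice these paths would be gathered during traversal
paths = ["/g1/g1.1/dset1.1.1", "/g2/dset2.1"]

# Pass 2 (parallel): no traversal callback is active while the workers run
results = load_chunk_info(paths)
```

Keeping the traversal serial and handing only plain path strings to the workers means the pool never runs inside a traversal callback.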

There were some other random typos that I just pushed fixes for. Mind trying again?