HDFGroup / h5serv

Reference service implementation of the HDF5 REST API

Not able to navigate links on an updated file #86

Closed ahalota closed 8 years ago

ahalota commented 8 years ago

I've uploaded the file I am experiencing this issue with here (50MB): https://drive.google.com/file/d/0Bz-XJG5KVi2MU1UxMUFhSjdDWDQ/view?usp=sharing

I haven't experienced this issue before. I am navigating this file in my web browser, but when I reach the second level of my dataset, I cannot reach the links and instead get this error:

Traceback (most recent call last):
  File "C:\Users\acartas\AppData\Local\Continuum\Anaconda2\lib\site-packages\tornado\web.py", line 1443, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "app.py", line 215, in get
    items = db.getLinkItems(reqUuid, marker=marker, limit=limit)
  File "../hdf5-json/lib\hdf5db.py", line 2650, in getLinkItems
    item = self.getLinkItemByObj(parent, link_name)
  File "../hdf5-json/lib\hdf5db.py", line 2579, in getLinkItemByObj
    item['href'] = 'datasets/' + item['id']
TypeError: cannot concatenate 'str' and 'NoneType' objects

Can't say I have any idea where this issue came from. I created this h5 file myself using the h5py library. The entries are gzip compressed. Before adding the latest dataset ("emissions") to each group, I was able to navigate the file just fine and read the existing datasets ("c_agri", "c_sava", "c_peat", "c_fore", "ba").

I can reach this URL just fine: http://127.0.0.1:5000/groups/fc125eb0-fce9-11e5-9c08-f01faf2b586e?host=GFED_Annual.hdfgroup.org But the error happens when I then click on "links": http://127.0.0.1:5000/groups/fc125eb0-fce9-11e5-9c08-f01faf2b586e/links?host=GFED_Annual.hdfgroup.org
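
For anyone trying to reproduce this outside a browser, the two requests above can be sketched in Python. This is only a minimal illustration using the standard library; the helper names `group_url` and `fetch_status` are hypothetical and not part of h5serv.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def group_url(endpoint, group_uuid, host, links=False):
    """Build the URL of a group, or of its links collection when links=True."""
    path = "/groups/" + group_uuid + ("/links" if links else "")
    return endpoint + path + "?" + urlencode({"host": host})


def fetch_status(url):
    """Return the HTTP status code for a GET request (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses from the server arrive as HTTPError; report their code.
        return err.code
```

Calling `fetch_status` on both URLs (with `links=False` and `links=True`) against a running server should show which of the two requests fails.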

ahalota commented 8 years ago

Some further testing:

I recreated my entire file, and was able to navigate all links.

The file linked to above was created, then a dataset "emissions" was added to each subgroup. The only difference I can spot here is that perhaps there is some trouble when updating a file.

jreadey commented 8 years ago

@ahalota, by any chance was the file modified (by something other than h5serv) after it was imported? That would cause problems like you've seen.

ahalota commented 8 years ago

Yes, it was! I had no idea that mattered.

jreadey commented 8 years ago

Yes, it does! h5serv keeps track of the groups/datasets in a file by creating a special group: __db__. If this group is already present in the file, h5serv assumes the file has already been catalogued and uses that group to handle requests for an object referenced by UUID. Otherwise it iterates through the objects in the file and creates the db group. When a group or dataset is created or deleted through the REST API, h5serv knows to update the db group. If that update happens outside of h5serv, the db group won't be updated, of course.

More generally, it can be dangerous to update a file in the h5serv data directory if that file is also likely to be updated by POST/PUT/DELETE requests coming through the server. If the file is updated by the server at the same time another process is updating the file, the file could be left in a corrupted state.

Anyway, if the file does get out of sync, you can just delete the db group and it will be initialized the next time a request comes in.
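
A rough sketch of that reset with h5py, assuming the server is not touching the file at the time. The group name `__db__` comes from the explanation above; the helper names are hypothetical. The two core helpers only rely on `in` and `del`, so they work on an open `h5py.File` or any mapping-like object.

```python
DB_GROUP = "__db__"  # bookkeeping group name, as described above


def is_catalogued(h5file):
    """True if the root group already contains the bookkeeping group."""
    return DB_GROUP in h5file


def reset_catalog(h5file):
    """Remove the bookkeeping group if present; return True if it was removed."""
    if DB_GROUP in h5file:
        del h5file[DB_GROUP]  # unlinks the group from the file
        return True
    return False


def reset_catalog_in_file(path):
    """Open an existing HDF5 file read/write and drop its bookkeeping group."""
    import h5py  # imported here so the helpers above stay usable without h5py

    with h5py.File(path, "r+") as f:
        return reset_catalog(f)
```

Since the db group is rebuilt from scratch on the next request, any UUIDs recorded elsewhere should be assumed to no longer match afterwards.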

BTW, this is mentioned in the docs in the Note in this page: http://h5serv.readthedocs.org/en/latest/Installation/ServerSetup.html#data-files.

ahalota commented 8 years ago

Would it be possible to add a method that lets you manually fix the TOC for items added outside of the h5serv interface? I'm trying to add another new dataset to my file, but it would be nice not to have to rewrite my existing code (which uses the h5py library) into REST commands.

I can't just delete the TOC in this case, since I am already referring to the exact ids created and don't want those to change.

jreadey commented 8 years ago

Have you seen the h5pyd library? (https://github.com/HDFGroup/h5pyd) Idea is that it is h5py compatible but translates calls to HDF REST requests. So hopefully you can just change the line: "import h5py" to "import h5pyd as h5py" and everything else should be compatible.
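
As a sketch of what that swap could look like in practice: the helper below picks h5pyd when a server endpoint is given and plain h5py otherwise. The helper names are hypothetical, and the example assumes `h5pyd.File` takes the domain name, a mode, and an `endpoint` keyword.

```python
import importlib


def backend_name(endpoint):
    """Pick h5pyd when a server endpoint is given, plain h5py otherwise."""
    return "h5pyd" if endpoint else "h5py"


def open_file(name, mode="r", endpoint=None):
    """Open a local file (h5py) or a server domain (h5pyd) with the same call."""
    h5py = importlib.import_module(backend_name(endpoint))
    if endpoint:
        # With h5pyd, `name` is the domain on the server rather than a file path.
        return h5py.File(name, mode, endpoint=endpoint)
    return h5py.File(name, mode)


def list_members(group):
    """Names of the links directly under a group, sorted."""
    return sorted(group)
```

For the file in this thread, that might be `open_file("GFED_Annual.hdfgroup.org", "r", endpoint="http://127.0.0.1:5000")`, after which `list_members` and other h5py-style calls should go through the server, so the db group stays in sync.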

ahalota commented 8 years ago

Oo, that's amazing! I had no idea that was available.