HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0
126 stars 52 forks source link

POST_Links does not return link information when following links recursively #308

Closed mattjala closed 4 months ago

mattjala commented 4 months ago

When using POST_Links to retrieve a link by name with follow_links enabled, if the link is retrieved from a sub-group, the returned JSON is empty.

Implementing the follow_links parameter for POST_Links raises some questions - if multiple links are found by the same name, which should be returned? The first in iteration order? Do the domain crawlers guarantee any particular iteration order, and do we want to expose that behavior?

It might make sense to remove the follow_links param from POST_Links. Similarly, CreateOrder, Limit, and Marker seem to be holdovers from GET_Links and should probably be removed as well.

Test to elicit the bug:

    def testPostLinksFollowLinks(self):
        domain = self.base_domain + "/testPostLinksFollowLinks.h5"
        helper.setupDomain(domain)
        print("testPostLinksFollowLinks", domain)
        headers = helper.getRequestHeaders(domain=domain)

        req = helper.getEndpoint() + "/"
        rsp = self.session.get(req, headers=headers)
        self.assertEqual(rsp.status_code, 200)
        rspJson = json.loads(rsp.text)
        root_id = rspJson["root"]

        # create group "g1" in root group
        req = helper.getEndpoint() + "/groups"
        body = {"link": {"id": root_id, "name": "g1"}}
        rsp = self.session.post(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 201)
        rspJson = json.loads(rsp.text)
        group_id = rspJson["id"]

        path = "/dummy_target"

        # create link "link1" in g1
        req = helper.getEndpoint() + "/groups/" + group_id + "/links/link1"
        body = {"h5path": path}
        rsp = self.session.put(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 201)

        # make POST_Links request to root group, expect 404 
        body = {"titles": ["link1"]}
        req = helper.getEndpoint() + "/groups/" + root_id + "/links"
        rsp = self.session.post(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 404)

        # make POST_Links request to root group with follow_links, expect to find link1
        req = helper.getEndpoint() + "/groups/" + root_id + "/links?follow_links=1"
        rsp = self.session.post(req, data=json.dumps(body), headers=headers)
        self.assertEqual(rsp.status_code, 200)
        rspJson = json.loads(rsp.text)
        self.assertTrue("links" in rspJson)
        links = rspJson["links"]
        self.assertTrue(len(links) == 1) # fails due to empty return
        link = links[0]
        self.assertTrue("title" in link)
        self.assertEqual(link["title"] == "link1")
mattjala commented 4 months ago

Spoke to John about this - follow_links should return all links that match the name in the provided groups. Limit and CreateOrder are used in the case where titles is not provided, so they should stay. marker isn't actually checked for by POST_Links, so no change is needed there.

mattjala commented 4 months ago

@jreadey I'm not sure I understand how this endpoint is supposed to work with the pattern parameter. testGetPattern expects group ids to be returned that don't contain any links that fit the pattern. Is it meant to return all searched group ids (empty or not) when using a pattern, and not without a pattern?

Here's the (expected) response in testGetPattern with empty groups:

{'links': 
   {'g-f81913f9-909037d3-7091-9b7118-18bf5b': [], 
   'g-f81913f9-909037d3-0b41-8c15a0-ba4fad': [], 
   'g-f81913f9-909037d3-2304-9163c1-15d503': [
      {'id': 'd-f81913f9-909037d3-cd25-bd6a92-8700e7', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.4165003, 'title': 'dset2.1'}, 
      {'id': 'd-f81913f9-909037d3-be98-2bdfa6-91fa48', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.4451396, 'title': 'dset2.2'}
   ],
    'g-f81913f9-909037d3-f6a8-00d8e6-4046b6': [
      {'id': 'd-f81913f9-909037d3-39d1-9c7ecc-a8b46b', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.270042, 'title': 'dset1.1.1'}, 
      {'id': 'd-f81913f9-909037d3-697f-a976fe-b8c909', 'class': 'H5L_TYPE_HARD', 'created': 1707845214.2981899, 'title': 'dset1.1.2'}
   ], 
   'g-f81913f9-909037d3-6cba-d68216-f3d0b1': [], 
   'g-f81913f9-909037d3-cc63-497daf-2bf556': []
   }
}
jreadey commented 4 months ago

Yes, that is the intent. You could make the case to not return group ids that contain no links, but I like the explicit empty list.

jreadey commented 4 months ago

I tried out titles with follow_links and it doesn't return a sensible result -- links to subgroups won't get followed if the subgroup link is not in titles. Rather than fix this up, I think it makes more sense to return a 400 if titles is used with follow_links -- I've submitted a PR for this change. The PR also raises 400 if CreateOrder is used with titles or Limit is used with titles.

mattjala commented 4 months ago

Resolved in #314