astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
74 stars 50 forks source link

BUG: SIA2 services from the registry are not searchable. #450

Closed bsipocz closed 7 months ago

bsipocz commented 1 year ago

It was confirmed that this misbehaviour shows up in several heasarch SIA2 services so it's not an irsa related problem

import pyvo

from astropy.coordinates import SkyCoord
coords = SkyCoord('150.01d 2.2d', frame='icrs')

image_services = pyvo.regsearch(servicetype='sia2')
irsa_seip = [s for s in image_services if 'irsa' in s.ivoid and 'seip' in s.ivoid][0]

seip_results = irsa_seip.search((coords, 0.01))
---------------------------------------------------------------------------
E10                                       Traceback (most recent call last)
File ~/munka/devel/pyvo/pyvo/registry/regtap.py:747, in RegistryResource.search(self, *args, **keys)
    746 try:
--> 747     return self.service.search(*args, **keys)
    748 except ValueError:
    749     # I blindly assume the ValueError comes out of get_interface.
    750     # But then that's likely enough.

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:715, in RegistryResource.service(self)
    714     return self._service
--> 715 self._service = self.get_service(None, True)
    716 return self._service

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:701, in RegistryResource.get_service(self, service_type, lax)
    657 """
    658 return an appropriate DALService subclass for this resource that
    659 can be used to search the resource using service_type.
   (...)
    698     opening a web browser on the access URL.
    699 """
    700 return self.get_interface(service_type, lax, std_only=True
--> 701                           ).to_service()

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:344, in Interface.to_service(self)
    341     raise ValueError("PyVO has no support for interfaces with"
    342                      f" standard id {self.standard_id}.")
--> 344 return service_class(self.access_url)

File ~/munka/devel/pyvo/pyvo/dal/sia2.py:182, in SIA2Service.__init__(self, baseurl, session)
    181 self.query_ep = None  # service query end point
--> 182 for cap in self.capabilities:
    183     # assumes that the access URL is the same regardless of the
    184     # authentication method except BasicAA which is not supported
    185     # in pyvo. So pick any access url as long as it's not
    186     if cap.standardid.lower() == SIA2_STANDARD_ID.lower():

File ~/munka/devel/astropy/astropy/utils/decorators.py:837, in lazyproperty.__get__(self, obj, owner)
    836 if val is _NotFound:
--> 837     val = self.fget(obj)
    838     obj_dict[self._key] = val

File ~/munka/devel/pyvo/pyvo/dal/vosi.py:96, in CapabilityMixin.capabilities(self)
     94 @lazyproperty
     95 def capabilities(self):
---> 96     return vosi.parse_capabilities(self._capabilities().read)

File ~/munka/devel/pyvo/pyvo/io/vosi/endpoint.py:137, in parse_capabilities(source, pedantic, filename, _debug_python_based_parser)
    132 with iterparser.get_xml_iterator(
    133     source,
    134     _debug_python_based_parser=_debug_python_based_parser
    135 ) as iterator:
    136     return CapabilitiesFile(
--> 137         config=config, pos=(1, 1)).parse(iterator, config)

File ~/munka/devel/pyvo/pyvo/io/vosi/endpoint.py:346, in CapabilitiesFile.parse(self, iterator, config)
    345     else:
--> 346         vo_raise(E10, config=config, pos=pos)
    348 super().parse(iterator, config)

File ~/munka/devel/astropy/astropy/io/votable/exceptions.py:124, in vo_raise(exception_class, args, config, pos)
    123     config = {}
--> 124 raise exception_class(args, config, pos)

E10: None:3:15: E10: File does not appear to be a VOSICapabilities file

During handling of the above exception, another exception occurred:

DALServiceError                           Traceback (most recent call last)
Cell In [1], line 9
      6 image_services = pyvo.regsearch(servicetype='sia2')
      7 irsa_seip = [s for s in image_services if 'irsa' in s.ivoid and 'seip' in s.ivoid][0]
----> 9 seip_results = irsa_seip.search(coords)

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:751, in RegistryResource.search(self, *args, **keys)
    747     return self.service.search(*args, **keys)
    748 except ValueError:
    749     # I blindly assume the ValueError comes out of get_interface.
    750     # But then that's likely enough.
--> 751     raise dalq.DALServiceError(
    752         f"Resource {self.ivoid} is not a searchable service")

DALServiceError: Resource ivo://irsa.ipac/spitzer/images/seip is not a searchable service

If this is of any help

In [4]: irsa_seip
Out[4]: ('ivo://irsa.ipac/spitzer/images/seip', 'vs:catalogservice', 'SEIP', 'Spitzer Enhanced Imaging Products', 'research', 'The Spitzer Science Center and IRSA have released a set of Enhanced Imaging Products (SEIP) from the Spitzer Heritage Archive. These include Super Mosaics (combining data from multiple programs where appropriate) and a Source List of photometry for compact sources. The primary requirement on the Source List is very high reliability -- with areal coverage, completeness, and limiting depth being secondary considerations. The SEIP include data from the four channels of IRAC (3.6, 4.5, 5.8, 8 microns) and the 24 micron channel of MIPS. The full set of products for the Spitzer cryogenic mission includes around 42 million sources.', 'https://irsa.ipac.caltech.edu/data/SPITZER/Enhanced/SEIP', 'SSC and IRSA', 'survey#archive', '', '', nan, 'infrared', ['https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&'], ['ivo://ivoa.net/std/sia2'], ['vs:paramhttp'], ['std'])
msdemlei commented 1 year ago

So... while I give you that I probably should not unconditionally catch ValueError-s n RegistryResource.search (but please note that I'd like to get rid of that method anyway; we should teach people to use get_service explicitly, as even now most records already have multiple capabilities), this is problem is unrelated to registry operations (phewy!).

On Thu, Jun 15, 2023 at 11:11:17PM +0000, Brigitta Sipőcz wrote:

import pyvo

from astropy.coordinates import SkyCoord coords = SkyCoord('150.01d 2.2d', frame='icrs')

image_services = pyvo.regsearch(servicetype='sia2') irsa_seip = [s for s in image_services if 'irsa' in s.ivoid and 'seip' in s.ivoid][0]

seip_results = irsa_seip.search(coords) [...] E10: None:3:15: E10: File does not appear to be a VOSICapabilities file

A shorter reproducer is:

import pyvo

svc = pyvo.dal.SIA2Service( "https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&")

At that point we go down a rabbit hole (does someone reading this feel responsible for the SIA2 code? This would be the moment for the people who discarded by objections to SIA2 to come forward...).

SIA2Service, because of the way we had for a while planned to do authentication, iterates through the service's VOSI capabilities on construction (sia2.py#182 ff). It does not catch exceptions when doing that, although all that only become relevant with auth. When there's something wrong with the capabilities, the whole thing blows up.

For robustness with the overwhelming majority of SIA2 services (that don't require auth) I'd say there should be some exception catching (and probably downgrading to warnings) going on here. SIA2 itself works just perfectly well without ever looking at VOSI capabilities.

Regrettably, the rabbit hole goes deeper. Consider EndpointMixin in pyvo.dal.vosi, which has a _get_endpoint method trying the following URIs when trying to find VOSI endpoints:

 '{baseurl}/{endpoint}'.format(baseurl=self.baseurl,
                               endpoint=endpoint),
 url_sibling(self.baseurl, endpoint)

I'm impressed it's that diligent, I have to say (kudos to Adrian).

Anyway, in this case, where we have access_url-embedded URL parameters, that works out to

'https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&/capabilities', 'https://irsa.ipac.caltech.edu/capabilities?COLLECTION=spitzer_seip&'

The first clearly is absurd. The second URI could, return the capabilities requested, but at least right now it's a 404, and I frankly would object to site-global capabilities somehow influenced by URL parameters (though I don't think we're forbidding that anyhwere).

The capabilities actually are at

https://irsa.ipac.caltech.edu/SIA/capabilities?COLLECTION=spitzer_seip&

which we might add as another guess, but... I have a bad feeling about that. There aren't any clear rules right now for how to find capabilities from an access URL outside of TAP, and

$ python -c "import this" | grep temptation In the face of ambiguity, refuse the temptation to guess.

So... here's what I think we should do:

(a1) downgrade the failure on bad capabilities to a warning, or, IMHO better,

(a2) remove the capabilities thing entriely from the constructor of SIA2Service if CADC (who I think are behind that experiment) agrees that auth negotiation won't happen like that any more.

(b) It would be wonderful if IRSA could stop relying on on built-in parameters in access URLs. They are trouble, although admittedly not outright illegal. And it's really nice if we can guess capability URIs by saying "try the capabilities sibling of the access URL" and it works.

(c) The "right" way to figure out capability URIs for us would be to run a RegTAP query along the lines of

select access_url from rr.capability natural join rr.interface where standard_id='...vosi#availability' and ivoid=

(I mention in passing that because for that particular service, IRSA does not register their VOSI capabilities, it still would have failed, but that then is really IRSA's fault).

We might put that RegTAP query into the VOSI mixin to use when all else fails, but at least it shouldn't be called by default, perhaps only when all other guesses have failed (and given that nothing but TAP really makes good use of VOSI caps, I wouldn't even bother with that).

For one, I'm not so wild about having one RegTAP query per pyVO service constructed worldwide. But perhaps more importantly, it would make an ivoid (and hence a registry entry) effectively mandatory for our service objects. While I'm a big fan of nudging people to register, I'd say that'd be going to far. Our service objects should keep on being constructable with just an access URL, no?

bsipocz commented 1 year ago

(I mention in passing that because for that particular service, IRSA does not register their VOSI capabilities, it still would have failed, but that then is really IRSA's fault).

@zoghbi-a sees the same issue for multiple heasarc services, so if multiple archives faulting on this, this maybe a shortcoming of the requirements. I don't know, haven't dived into the exact details but was surprised that things don't work (==> we need to make sure one of each service type is being tested in pyvo, knowing that something doesn't work is better than surprises I suppose)

zoghbi-a commented 1 year ago

I can confirm that this is present in many services. This an example code:

ivoid = 'ivo://astron.nl/__system__/siap2/sitewide'
srv = pyvo.regsearch(ivoid=ivoid)[0]
srv.get_service('sia2')

These are examples of services producing the same error

'ivo://astron.nl/__system__/siap2/sitewide'
'ivo://au.csiro/casda/sia2'
'ivo://nasa.heasarc/suzamaster'
...

For the irsa case, if I put the capabilities url by hand (https://irsa.ipac.caltech.edu/SIA/capabilities?COLLECTION=spitzer_seip&) instead of the one being constructed from the url (https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&/capabilities), the issue disappears.

bsipocz commented 1 year ago

Do we know whether this is an issue for any other tools out there?

If the capabilities are in fact not required for a service to be registered (a very widespread nature of this issue suggests this, too), then it look to me that (a2) could be a way ahead or within pyvo to insert it into the URL as Adbu suggests above. Raising a user-facing warning at this point I don't think is useful. I'm sure archives would be willing to do to register the capabilities if that would be strictly required. If it's not a requirement, then OTOH, pyvo should just work ™️

msdemlei commented 1 year ago

On Fri, Jun 16, 2023 at 12:05:04PM -0700, Brigitta Sipőcz wrote:

Do we know whether this is an issue for any other tools out there?

Probably not. I think pyvo is afflicted mainly because it was a test bed for CADC's auth experiments.

If the capabilities are in fact not required for a service to be registered (a very widespread nature of this issue suggests this,

Ah, it's one of these silly pieces of standardese where people thought something should be required but then actually didn't need it, which is why in the end it doesn't even work. In this case, there is a formal requirement to have capabilities but no recipe just where it should be.

Given that:

too), then it look to me that (a2) could be a way ahead or within pyvo to insert it into the URL as Adbu suggests above.

Adrian, do we have green light for removing the capabilities experiment from SIA2 again? Or if you still need it, can we perhaps hide it from the general public by some @prototype decorator?

msdemlei commented 1 year ago

On Fri, Jun 16, 2023 at 11:54:47AM -0700, Abdu Zoghbi wrote:

I can confirm that this is present in many services. This an example code: ivoid = 'ivo://astron.nl/system/siap2/sitewide' srv = pyvo.regsearch(ivoid=ivoid)[0] srv.get_service('sia2')

Uh. That's an interesting one. For that, pyvo actually could find the right capability URI if it tried the

url_sibling(self.baseurl, endpoint)

from line 28 of pyvo.dal.vosi first. As things are, it tries

'{baseurl}/{endpoint}'

first, which in this case (and according to the old requirement of ending GET-table URIs with a question mark or an ampersand depending) works out to https://vo.astron.nl/__system__/siap2/sitewide/siap2.xml?/capabilities.

That's a SIA2 query, albeit a somewhat odd one. Since SIA2 I think does not say anything about rejecting unknown parameters, the underlying software (ahem) follows ancient (and bad) VO practice of just ignoring them. So, what comes back is a VOTable with a lot of matches, and parsing that as a capability document fails.

I would seem to me that regardless what we do with SIA2Service we need to re-think the capability URL guessing here (and perhaps talk to GWS and DAL to get a clear statement out of them how they think we should be doing this). Blindly attaching "/capabilities" to an access URL that will very often end in a question mark or even an ampersand certainly is not a good idea. Also, by DALI rules I'd strongly suggest we should try the sibling first.

bsipocz commented 1 year ago

cc @andamian

andamian commented 1 year ago

Sorry for the late reply.

It looks to me that irsa_seip.search() is trying to instantiate the SIA2Service with an access URL instead of a base URL. If that's the case, we could add an additional argument to the constructor to skip parsing the capabilities when that argument is present. Or, if we teach our users to get the access urls directly from the registries then we can ignore the capabilities endpoint completely and deprecate baseulr in the constructor in favour of access_url. This model works fine for SIA2 (at least for now) but could be ambiguous for services with multiple endpoints.

As for the authentication, there was a time when we, at the CADC, had to have a different access points for Basic Authentication and the rest of auth methods. It's not the case anymore since we've stopped supporting that which means that direct access URLs should be more manageable now. As a side note, the entire pyvo.auth module is right now based on the capabilities endpoint as it aims at mapping the supported authentication mechanisms to the appropriate access URLs. It's probably due for an update although there no agreed solution right now.

msdemlei commented 1 year ago

On Mon, Jun 26, 2023 at 11:59:54AM -0700, Adrian wrote:

It looks to me that irsa_seip.search() is trying to instantiate the SIA2Service with an access URL instead of a base URL. If that's the case, we could add an additional argument to the constructor to skip parsing the capabilities when that argument is present. Or, if we teach our users to get the access urls directly from the registries then we can ignore the capabilities endpoint

The access URL does come directly from the registry here -- and clearly everyone registers "the" {query} endpoint's URL as access URL (I frankly didn't even pause to think about it, but that may be professional blindness -- it's simply the say all the other S-services work).

completely and deprecate baseulr in the constructor in favour of access_url. This model works fine for SIA2 (at least for now) but could be ambiguous for services with multiple endpoints.

Ah. I frankly hadn't noticed that SIA2Service wants to be constructed with a base URL, either (see above on professional blindness)...

Again, all the other XService-s are constructed with what's in the Registry as access_url (which in TAP happens to be a, well, base URL). It wodld hence be lovely if I didn't have to special-case SIA2, and so I'd really like a VanillaSIA2Service (say) class that does take an access URL in the constructor for registry use.

Since Sect 2. of SIA2 says:

All {query} resources must be siblings of the VOSI-capabilities resource; this limitation enables a client with just the URL for an SIA {query} resource (e.g. from a Datalink service descriptor) to find the VOSI-capabilities...

VanillaSIA2Service could just strip a segment of the access URL and then up-call to the current SIA2Service, but due to the vagaries of endpoint guessing I'd rather not do that.

Instead, I think the ideal implementation would be to have VanillaSIA2Service to accept the access_url and not touch capabilities, wheres SIA2Service keeps its current interface but becomes basically a factory that parses the capabilities to find the/a access URL and then constructs VanillaSIA2Service.

be more manageable now. As a side note, the entire pyvo.auth module is right now based on the capabilities endpoint as it aims at mapping the supported authentication mechanisms to the appropriate access URLs. It's probably due for an update although there no agreed solution right now.

Yeah -- I think this whole thing is another indication that it's nice if we can do without mandatory parsing of capabilities... Aw, the cursed auth thing...

bsipocz commented 1 year ago

@andamian - do you have time on the fix for this or shall I? I would really like to have a new bugfix version out by the end of the month that fixes all these smallish SIA2 related bugs (basically all the bugs that we found except the 'image' regsearch, though if we have a solution/workaround in for that, too, that would be the most wonderful)

bsipocz commented 11 months ago

OK, so this is the only blocker remaining for 1.4.2, so I added the milestone.

msdemlei commented 11 months ago

On Wed, Aug 02, 2023 at 03:57:50PM -0700, Brigitta Sipőcz wrote:

OK, so this is the only blocker remaining for 1.4.2, so I added the milestone.

Slightly related to this SIA2 business is also https://github.com/astropy/pyvo/pull/449 -- and I think we should really tell people that the shouldn't use image and spectrum as servicetypes before there's more code using that.

Since it won't break anything: Can't it go into 1.4.2, too?

bsipocz commented 11 months ago

This bug is when one does a 'sia2' servicetype registry search. So it's more than annoyance, basically renders the registry unusable for SIA2.

As for the for the 'image' and 'spectrum' deprecation, that should never go in a bugfix release. I see where the suggestion is coming from, but as I see most people think they are needed, users are not expected to use ssa, sia, etc when they search, and until there is a clearly good alternative IMO we shouldn't deprecate. Besides, we have a lot of edu material that use the aliases so first those needs to be updated, too.

msdemlei commented 11 months ago

On Thu, Aug 03, 2023 at 04:54:46PM -0700, Brigitta Sipőcz wrote:

As for the for the 'image' and 'spectrum' deprecation, that should never go in a bugfix release. I see where the suggestion is coming from, but as I see most people think they are needed, users are not expected to use ssa, sia, etc when they search, and until there is a clearly good alternative IMO we shouldn't deprecate. Besides, we have a lot of edu material that use the aliases so first those needs to be updated, too.

The problem is: they don't do what people (can reasonably) expect them to do, and I don't see a (reasonable, let alone backwards-compatible) way to make them do what people can expect them to do. They are just misfeatures introduced before we realised what the introduction of obscore actually means -- and before we had experience with a major version step.

Would you be less disinclined to merge the PR if we didn't raise the Warnings but just deprecated in prose? You see, it'd be great if we at least did something to not accumulate more code out there that uses these misfeatures. And sure, it'd be great if we could update educational material that's out there to not show them any more – can you point me at some so I can start bugging authors?

(and apologies for hijacking this issue; I suppose we should continue this discussion over at #449)

bsipocz commented 10 months ago

I got bogged down in the weeds of trying to fix this, but at this point it's not worth using is as a blocker for the other fixes. So, I'll go ahead and release 1.4.2, and bump this into the next bugfix milestone.

msdemlei commented 8 months ago

I ran into this again, and I feel we should really fix it; it's an embarrassment if SIA2 discovery doesn't work. And since what's registered is access_url-s rather than TAP-like "base URLs", we'll need a constructor accepting these.

@andamian, do you have a plan for that? Or should I come up with a PR myself?

bsipocz commented 8 months ago

I second that this should be fixed sooner rather than later as in fact the IRSA SIA is not at all usable from the registry atm.

I tried to come up with a PR myself, but it would have required to nuke a lot of the current code and I didn't want to do that without input from Adrian.

andamian commented 8 months ago

I'm sorry, I've totally dropped the ball on this. Unfortunately, I'm swamped with other work at the moment and don't have time to do this PR right now - maybe by next week.

@msdemlei solution VanillaSIA2Service should work but my preference is to make the SIA2Service class more resilient to that URL. If the spec requires query and capabilities end points to be siblings, it could be easy to search for capabilities either in the current path or the parent to figure out the context.

msdemlei commented 8 months ago

On Wed, Oct 25, 2023 at 03:34:12PM -0700, Adrian wrote:

that URL. If the spec requires query and capabilities end points to be siblings, it could be easy to search for capabilities either in the current path or the parent to figure out the context.

Mhh, I'm still staunchly in favour of not attempting to parse capabilities unless we really need them; it's just one extra thing that can break. On top, since the capabilities endpoint generally has no function for SIA2, it will break often (Markus' Interoperability Rule 20: "If it's not used by clients, it's broken").

The way our current SIA2Service works, that's not possible because the query endpoint's name is unpredictable. So, I really think we should have a separate service object with all the actual query functionality that, in analogy with the other S-services, is constructed just with the access URL (consistency is always nice). The current SIA2Service would then just be a factory function fiddling out the access url from the capabilities and then constructing the "plain" service object.

The NormalSIA2Service then wouldn't have to bother with capabilities at all, which I think is rather gratifying.

Oh, and: does CADC plan on using capabilities for auth negotiation in the future?

andamian commented 8 months ago

I've had a quick look at the code and I'm trying to determine exactly what the issue is here. The SIA2Service class states that it can take a base url or an access point url. Both SIA2Service('https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/v2query') and SIA2Service('https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia') work for me. In fact, I can put any sibling (such as the SIA query end point) and it still works. It doesn't work with v2query? but that could be fixed if we consider that access URLs ending in ? are valid.

@msdemlei , if I understand correctly, what you are proposing is a class that ignores capabilities. But the capabilities end point is required in SIA2 so to provide code that circumvents our own standards doesn't seem right to me. If you think that capabilities have no use in SIA2 (like it's the case I think for DataLink) then the correct way to address this is to propose a change of the standard. My opinion is that this code should follow the standards.

The capabilities end point is actually used at the CADC. We have one SIA service and the capabilities advertise the access urls for V1 and V2 of the standard. We still advertise the supported auth methods there since there's currently no viable alternative.

This is my rather limited view from the service perspective. I'm aware that I lack any expertise with the registry and don't know how things look from there. I hope to fill in some gaps at the registry sessions at the upcoming Interop.

msdemlei commented 8 months ago

On Thu, Oct 26, 2023 at 10:29:01AM -0700, Adrian wrote:

@msdemlei , if I understand correctly, what you are proposing is a class that ignores capabilities. But the capabilities end point is required in SIA2 so to provide code that circumvents our own standards doesn't seem right to me. If you think that

Well... It's just that in the normal workflow, SIA2 has no need to look at capabilities, and so I'd prefer to leave it alone, if only in the spirit of the golden rule of interoperability ("be strict in what you produce but lenient in what you expect"; we're not a validator, after all, and the SIA2 validators that exist don't look at capabilities either (yet) for all I know, so we have to expect them to be broken in general for SIA2). That we save an HTTP request if we don't use the capabilities is a welcome side effect.

capabilities have no use in SIA2 (like it's the case I think for DataLink) then the correct way to address this is to propose a

Well, that's another can of worms that nobody really wants to open; see http://ivoa.net/documents/caproles/ -- and that's another reason not to rely on capabilities without a strong reason.

change of the standard. My opinion is that this code should follow the standards.

Sure; but "following a standard" doesn't mean we should require more of services than we need to get our job done.

The capabilities end point is actually used at the CADC. We have one SIA service and the capabilities advertise the access urls for V1 and V2 of the standard. We still advertise the supported auth methods there since there's currently no viable alternative.

Well, SIA2Service doesn't need the V1/V2 dispatch; with VO integration, you make that choice at the discovery stage, and by the time SIA2Service is created, that choice is over.

So... what's the workflow on auth integration? Can we perhaps only look at the capabilities if people have passed in some credentials? Or when the inquire whether auth is possible and if so, which?

andamian commented 8 months ago

I could be persuaded to make parsing of capabilities optional in SIA2Service class, i.e. if the provided URL is a sibling of capabilities, just assume is the correct query end point but making a separate class for that adds unnecessary confusion if this is part of the public API. Users will need to know which class to instantiate when the only difference is a small implementation detail.

At the moment, the workflow with auth integration could check the user credentials in the sessions against the ones associated with the query end point in capabilities and potentially provide a better feedback than the simple brute force 403 error. The big problem for me here is that oftentimes the credentials are obtain to works for specific authorities/domains. How can the client find out that and prevent leaks of credentials to other domains? Not such a big problem with certificates but with bearer tokens, which essentially act as short term user/passwords.

msdemlei commented 8 months ago

On Thu, Oct 26, 2023 at 12:47:00PM -0700, Adrian wrote:

I could be persuaded to make parsing of capabilities optional in SIA2Service class, i.e. if the provided URL is a sibling of capabilities, just assume is the correct query end point but

In the scenario relevant here, it's usually what the data providers gave in their registry record; if that's wrong, there is nothing we can fix (and it would stand to reason that what you get from capabilities -- essentially, an excerpt of the registry record -- is then wrong, too).

making a separate class for that adds unnecessary confusion if this is part of the public API. Users will need to know which class to instantiate when the only difference is a small implementation detail.

Yes, I give you two classes are not pretty, but then some magic heuristically probing whether what is passed in is the parent of both capabilities and {query} or whether it's the normal access URL IMHO is worse. And it's not pretty either that SIA2Service works differently from all the other S*Service classes (that are all constructed with access_url).

So, I think there's no way around some sort of uglyness.

What about a classmethod from_access_url on SIA2Service with an explanation that that's what you ought to use when you want to work SIA2Service to work like the other S*Service classes? This could then switch off capabilities parsing (at least until we have a case that needs it or until there's an interoperable auth standard that needs it).

Breaking the current behaviour and making the argument an access_url by default is probably not an option, is it?

to works for specific authorities/domains. How can the client find out that and prevent leaks of credentials to other domains? Not such a big problem with certificates but with bearer tokens, which essentially act as short term user/passwords.

I realise auth is behind the current design, but if we can separate the API question from auth considerations as much as possible, it certainly would help a lot to solve it. Auth has been an unsolved problem for so many years, and I fear if we delay fixing SIA2Service until we've figured out auth will keep SIA2Service in its broken (for my -- registry-related -- purposes) state for a long, long time.