astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
74 stars 50 forks source link

Add datalink original row #559

Closed msdemlei closed 1 week ago

msdemlei commented 3 weeks ago

This is an attempt to fix bug #542. It's not particularly pretty, but I don't think becoming much cleverer here is worth it.

I'm marking this as a draft while I'm still working out whether we should pass around a few extra original_row-s.

msdemlei commented 2 weeks ago

Sorry for another force-push, and one that reverses the history on top. On the other hand, this has now a passing and reasonably fast docs/dal/index.rst. So... if the CI passes, could you have another look, @bsipocz?

msdemlei commented 2 weeks ago

On Thu, Jun 27, 2024 at 06:26:49PM -0700, Brigitta Sipőcz wrote:

This example runs into the DALFormattingError, that is very similar to #361, thus it fails the test.

(Technically it's a warning, so we can deal with it if necessary, but it would be nicer to fix it instead, if possible)

Oh, I forgot to comment on this: I've put an IGNORE_WARNINGS there now, because while Tom has promised to do the astropy fix RSN, it will still be a while until the fix will arrive in our test systems, let alone in Debian, which is where my doctests run...

msdemlei commented 1 week ago

On Mon, Jul 01, 2024 at 04:44:32PM -0700, Brigitta Sipőcz wrote:

@bsipocz commented on this pull request.

Minor comments only, except that original_row=None doesn't feel right in clone_byid as a default, shouldn't that be a copy as well from the object?

No, I'm pretty sure we should not copy original_row despite the "clone" in the name. The reason is that clone_byid is passed an id; in other words, in all likelihood the new DatalinkResults will corresond to a different "thing", and hence, if there was an original row, to a different original row.

Now, I am fairly adamant that it is more important to avoid wrong data than to have some additional possibly correct data (see also the Zen of Python: "In the face of ambiguity, refuse the temptation to guess"). So, for reasons of defensive programming, I think I'll stubbornly insist on None here.

  • def init(self, *args, **kwargs):
  • self.original_row = kwargs.pop("original_row", None)
  • super().init(*args, **kwargs)

I don't think this is necessary as you already do the same in the DatalinkResultsMixin base class.

Ummm... sorry, I don't follow here: Are you referring to the init method? But the Mixin doesn't have a constructor (as I, frankly, would expect from a Mixin)? The upcall goes to DALResults, and that won't touch original_row.

Am I missing something fundamental?

+.. doctest:: +

No need for this, everything with >>> are picked up as doctest

Hmyeah... can we still keep it to make things simpler for my zero-tooling doctest profiler? [cf. my plea in #567]

  • res = super(DatalinkResults, cls).from_result_url(
  • result_url, session=session)

can be written in one line, maybe more readable that way

        res = super().from_result_url(result_url, session=session)

Shiver my timbers. I indeed had not realised that the argument-less super is smart enough to even work in a class method, but it apparently does. Wow. One of these days I'll try and understand how it does that :-)

Anyway: changed in commit 5360cff6.

@@ -597,7 +619,7 @@ def clone_byid(self, id): for x in copy_tb.resources: if x.ID and x.ID not in referenced_serviced: copy_tb.resources.remove(x)

  • return DatalinkResults(copy_tb)
  • return DatalinkResults(copy_tb, original_row=original_row)

shouldn't original row be a copy as well, from whatever is already set in the class, in self.original_row?

First off, it is my conviction that the wider vicinity of that code needs a thorough review anyway and do (a lot) less copying (I have bug #537 to prove it).

Your point, however, is a good observation: Do we want to comment on what's supposed to happen if someone changes original_row (regardless of whether it comes in through clone_byid or through some other means)?

Me... well, I can't think of a case where it would actually be useful if datalink client code modifies the table the datalinks come from. On the other hand, I don't think I want to copy (or materialise) all these rows just in case. So, in commit 5360cff6 I have added a bit of language to the docs to the effect that you're not supposed to rely on any behaviour when you muck around with original_row.

Thanks for the review!

bsipocz commented 1 week ago

my bad about the __init__, I clearly misread the github diff rendering and assumed L436 belonged to the baseclass. The rest are fine, thanks!

bsipocz commented 1 week ago

Thanks @msdemlei!