Simple utypes - Githubissues

AdaNebot commented 4 years ago

Some minimal utypes in agreement with those from PhotDM are proposed in the document.

Cons: Simple utypes only work for the simplest cases.

We will need to revisit this document for more complicated cases.

AdaNebot commented 4 years ago

A proposal: add some useful utypes in agreement with Obscore.

<PARAM ID="123" name="dataproduct_type" ucd="meta.code.class" utype="obscore:ObsDataset.dataProductType" datatype="char" arraysize="*" unit="" value="timeseries" />
<PARAM ID="456" name="dataproduct_subtype" ucd="meta.code.class" utype="obscore:ObsDataset.dataProductSubtype" datatype="char" arraysize="*" unit="" value="lightcurve" />

AdaNebot commented 4 years ago

I like this idea, in agreement with Obscore. However, I'm not sure that we need the ID, since nothing needs to reference it (and if we keep it, it should not start with a number).

AdaNebot commented 4 years ago

Would this be inside GROUP, TABLE, or RESOURCE?

AdaNebot commented 4 years ago

The advantage of inserting these elements in the TABLE element is that we would be able to store, in the same VOTable document, a table for the time series and, for instance, a second table with the response curve of the filter or something else that could be useful for analysing the light curve. The GROUP would not change, but the table would look like:

<TABLE name="mytable">
  <DESCRIPTION>Light curve in filter Gbp</DESCRIPTION>
  <PARAM name="dataproduct_type" ucd="meta.code.class" utype="obscore:ObsDataset.dataProductType" datatype="char" arraysize="*" unit="" value="timeseries" />
  <PARAM name="dataproduct_subtype" ucd="meta.code.class" utype="obscore:ObsDataset.dataProductSubtype" datatype="char" arraysize="*" unit="" value="lightcurve" />
  ...
</TABLE>
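
A minimal sketch of how a client might read these two PARAMs per TABLE, so that a time series table and a companion table can live in the same document. It uses only Python's standard library. The function name `table_product_types`, the second table `filter_response` and the sample document are illustrative assumptions, and the XML namespace that real VOTables declare is left out to keep the lookup short.

```python
import xml.etree.ElementTree as ET

# Illustrative document (an assumption, not a real service response):
# one typed time series TABLE and one untyped companion TABLE.
SAMPLE = """<VOTABLE version="1.3">
  <RESOURCE>
    <TABLE name="mytable">
      <DESCRIPTION>Light curve in filter Gbp</DESCRIPTION>
      <PARAM name="dataproduct_type" ucd="meta.code.class"
             utype="obscore:ObsDataset.dataProductType"
             datatype="char" arraysize="*" value="timeseries"/>
      <PARAM name="dataproduct_subtype" ucd="meta.code.class"
             utype="obscore:ObsDataset.dataProductSubtype"
             datatype="char" arraysize="*" value="lightcurve"/>
    </TABLE>
    <TABLE name="filter_response">
      <DESCRIPTION>Response curve of the filter</DESCRIPTION>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def table_product_types(votable_xml):
    """Map each TABLE name to its (dataproduct_type, dataproduct_subtype)."""
    root = ET.fromstring(votable_xml)
    result = {}
    for table in root.iter("TABLE"):
        # Only PARAMs that are direct children of the TABLE are considered.
        params = {p.get("name"): p.get("value") for p in table.findall("PARAM")}
        result[table.get("name")] = (
            params.get("dataproduct_type"),
            params.get("dataproduct_subtype"),
        )
    return result


types = table_product_types(SAMPLE)
```

With this layout, a table without the typing PARAMs simply maps to `(None, None)`, which is how a client would tell the companion table apart from the time series.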

AdaNebot commented 4 years ago

Solved by #22

AdaNebot commented 4 years ago

Apparently, if these PARAMs were in a RESOURCE, it would be easier and clearer for applications.

What's your opinion ? @Zarquan @loumir

Zarquan commented 4 years ago

First question - has the IVOA ever done this before ? If yes, then we should follow the established practice.

If no, then no objection, but some details would need to be solved.

TL;DR; if dataproduct_type applies to the RESOURCE, then we need to add more details to the specification to describe which TABLE within the RESOURCE contains the timeseries data.

I liked this comment from @AdaNebot :

The advantage of inserting these elements in the TABLE element is that we would be able to store, in the same VOTable document, a table for the time series and, for instance, a second table with the response curve of the filter or something else that could be useful for analysing the light curve.

So do we lose that ability, or can we add something to enable this to still happen ?

How does an application know which TABLE within the RESOURCE contains the time series data ?
Does this mean a set of standard names for timeseries TABLEs in the RESOURCE ?
Are we allowed other TABLEs inside the same RESOURCE as the timeseries TABLE ?
Are we allowed more than one timeseries TABLE inside the same RESOURCE, if so, how does an application find them ?

loumir commented 4 years ago

I agree that if we want to factor the dataproduct_type out to the RESOURCE, we need to add the rule: a RESOURCE may contain several time series tables, but all of the same dataproduct_type. A VOTable with one time series table and one filter curve table will then require 2 RESOURCE elements, one for each TABLE, but that is valid in VOTable too.
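
A minimal sketch of what this rule would mean for a reader, assuming the typing PARAM is a direct child of RESOURCE and applies to every TABLE in that RESOURCE. The function name `tables_by_resource_type`, the resource and table names, and the namespace-free sample are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative document: one RESOURCE for the time series tables and a
# second, untyped RESOURCE for the filter curve.
SAMPLE = """<VOTABLE version="1.3">
  <RESOURCE name="lightcurves">
    <PARAM name="dataproduct_type" datatype="char" arraysize="*" value="timeseries"/>
    <TABLE name="gbp"/>
    <TABLE name="grp"/>
  </RESOURCE>
  <RESOURCE name="extras">
    <TABLE name="filter_response"/>
  </RESOURCE>
</VOTABLE>"""


def tables_by_resource_type(votable_xml):
    """List (table name, dataproduct_type of the enclosing RESOURCE)."""
    root = ET.fromstring(votable_xml)
    found = []
    for resource in root.findall("RESOURCE"):
        product_type = None
        for param in resource.findall("PARAM"):
            if param.get("name") == "dataproduct_type":
                product_type = param.get("value")
        # Under the proposed rule, every TABLE inherits the RESOURCE's type.
        for table in resource.findall("TABLE"):
            found.append((table.get("name"), product_type))
    return found


pairs = tables_by_resource_type(SAMPLE)
```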

Bonnarel commented 4 years ago

Hi Dave

First question - has the IVOA ever done this before ? If yes, then we should follow the established practice.

If no, then no objection, but some details would need to be solved.

VOTable section 3.4 on RESOURCE explicitly allows PARAMs in a RESOURCE. There is an example of this in section 5.2. I am looking for a live example.

TL;DR; if dataproduct_type applies to the RESOURCE, then we need to add more details to the specification to describe which TABLE within the RESOURCE contains the timeseries data.

This RESOURCE will be the TimeSeries. Tables or material that are not directly involved in the TimeSeries should go in an independent RESOURCE.

I liked this comment from @AdaNebot :

The advantage of inserting these elements in the TABLE element is that we would be able to store, in the same VOTable document, a table for the time series and, for instance, a second table with the response curve of the filter or something else that could be useful for analysing the light curve.

So do we lose that ability, or can we add something to enable this to still happen ?

How does an application know which TABLE within the RESOURCE contains the time series data ?

Every TABLE in this RESOURCE is part of the TimeSeries.

Does this mean a set of standard names for timeseries TABLEs in the RESOURCE ?

Are we allowed other TABLEs inside the same RESOURCE as the timeseries TABLE ?

Several time series TABLEs, but no TABLE of a different kind.

Are we allowed more than one timeseries TABLE inside the same RESOURCE, if so, how does an application find them ?

Yes. All the TABLEs have to be read in that case.

AdaNebot commented 4 years ago

In some cases it might be better to add those PARAMs at the RESOURCE level and in others directly in the TABLE.

We could put an example to show when it is better to choose one over the other.

AdaNebot commented 4 years ago

Do you have any particular thought on this one @msdemlei ?

msdemlei commented 4 years ago

On Wed, Apr 29, 2020 at 05:23:19AM -0700, AdaNebot wrote:

Do you have any particular thought on this one @msdemlei ?

On the ID question: the PARAMs certainly do not need an ID parameter, so yes, that should go.

Whether to reference them through utype or through name does not make much of a difference from an abstract point of view, as the two are linked by obscore.

While name feels a bit less formal, I can't see much wrong with reserving parameter names in a protocol (we've long done this with INFO). Since, further, the names are less unwieldy than the utypes and there's less potential to get things wrong (e.g., is ObsCore:ObsDataset.dataProductSubtype the same as obscore:obsdataset.dataproductsubtype? By SSAP's utype definitions, it is, but will people write their XPath expressions like this?), I'd say let's just require names and no utypes here.
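
A small sketch of the pitfall described above, assuming utypes are to be compared case-insensitively while names are matched exactly. The helper `utype_matches` and the sample PARAM are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative PARAM whose utype uses a different capitalisation than the
# string a client might have hard-coded.
PARAM_XML = (
    '<PARAM name="dataproduct_subtype" '
    'utype="ObsCore:ObsDataset.dataProductSubtype" '
    'datatype="char" arraysize="*" value="lightcurve"/>'
)
WANTED_UTYPE = "obscore:obsdataset.dataproductsubtype"


def utype_matches(param, wanted):
    """Compare utypes ignoring case; a missing utype never matches."""
    return (param.get("utype") or "").lower() == wanted.lower()


param = ET.fromstring(PARAM_XML)

# A naive exact comparison misses the PARAM ...
exact = param.get("utype") == WANTED_UTYPE
# ... while a case-folded comparison finds it.
folded = utype_matches(param, WANTED_UTYPE)
# Matching on the reserved name needs no such care.
by_name = param.get("name") == "dataproduct_subtype"
```

The name lookup is a plain string comparison, which is the argument for requiring names only.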

If, for some reason, we want utypes, then we should skip any requirements on names, because when you define two ways to express the same thing, in the end both won't work reliably.

On the other hand, name is a mandatory attribute on PARAM in VOTable, so we'd be asking people to make up such names... hm. As I said: I'd go for names only.

As to the location: If we take this from obscore, we ought to copy obscore semantics. Obscore talks about datasets, and I'm rather convinced we should identify the obscore dataset with a complete VOTable here; I'd postpone fragment references in obscore as long as we possibly can.

Hence, it's the entire VOTable we describe. And hence, I'd say the params should be children of VOTABLE. That's also nice because then clients can parse until they hit the first <RESOURCE, and if they've not found the dataproduct_type until then they can be certain the file doesn't contain one.

Why would clients want to do that? Well, splat, for instance, deals with time series differently from spectra. Being able to set these differences as early as possible simplifies its code. Or so I claim, never having touched Splat's code myself.
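
The early-exit idea above (read until the first RESOURCE, then stop looking) could be sketched as follows, assuming the PARAM is placed before the first RESOURCE as proposed here. The function name `sniff_dataproduct_type` and the sample document are illustrative assumptions; only Python's standard library is used.

```python
import io
import xml.etree.ElementTree as ET


def sniff_dataproduct_type(stream):
    """Return the dataproduct_type declared before the first RESOURCE, or None."""
    for _event, elem in ET.iterparse(stream, events=("start",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if present
        if tag == "RESOURCE":
            # No global declaration was seen; stop looking.
            return None
        if tag == "PARAM" and elem.get("name") == "dataproduct_type":
            return elem.get("value")
    return None


# Illustrative document with the PARAM as a child of VOTABLE.
GLOBAL = b"""<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <PARAM name="dataproduct_type" datatype="char" arraysize="*" value="timeseries"/>
  <RESOURCE><TABLE name="lc"/></RESOURCE>
</VOTABLE>"""

product_type = sniff_dataproduct_type(io.BytesIO(GLOBAL))
```

A client could call this once on opening the file and choose its time series or spectrum handling before touching any table data.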

The whole dataproduct_type thing collides a bit with our plans for a standardId INFO (which lets clients tell apart, for instance, SSAP responses from SIAP responses), but I think only just a bit: standardId lets clients tell apart different sorts of result lists, not data product types.

You know: The more I think about it the more I like it. I'd say once we have the note out, we should propose it for DALI (where the PARAM might turn into an INFO, but that wouldn't hurt).

Having looked there I remember that they put their QUERY_STATUS, and even their standardID, into the RESOURCE with "results". That would be an argument to go for RESOURCE after all, but I'd vote against it. As said above, having it in RESOURCE seems hard to reconcile with what Obscore thinks these things are. And worse: think of a poor client. If it has to redecide, for each resource or table, what sort of thing it is dealing with, it's getting hard for the author, and building a nice user interface around that becomes even harder.

So: We should discourage mixing, say, time series and spectra and tell people who think of that to use datalink. And have the dataproduct type be a global property of the VOTable.

Bonnarel commented 4 years ago

In some cases it might be better to add those PARAMs at the RESOURCE level and in others directly in the TABLE.

We could put an example to show when it is better to choose one over the other.

These are examples built on April 23rd. This is what I wrote to a small group of authors that day, with 6 examples:

Dear all,

Dave, I am adding you to a discussion that started a couple of days ago in Strasbourg, because you were involved on GitHub.

The question was where we need to put the "dataproduct_type" PARAMs.

It depends on what the spec is for:

a ) TimeSeries embedded in some other document ?

b ) Documents containing mainly TimeSeries (+ additional info) ?

In case b ), do we allow only one or several TimeSeries in the spec ?

I think the answer is b ) / several.

If we have several TABLEs:

c ) Is a TimeSeries a single TABLE ?

d ) Can we consider that a single TimeSeries includes several TABLEs ?

The problem is that the PARAMs product_type and product_subtype are for the whole TimeSeries; they cannot be for a part of a TimeSeries. This is not what ObsCore says.

Well, these are two architectures, depending on the answer to the question: c or d.

If the answer is d ), I call this the 3-level architecture.

The spec defines a VOTable document that contains one or several TimeSeries in the RESOURCE type="results". Other RESOURCEs with additional data/metadata (type="meta") may be added if necessary.

The main RESOURCE (type="results") MUST start with an INFO tag with a reference to the spec (see the DALI recommendation as applied in DataLink, SIA2, etc.).

The main RESOURCE contains one or several "TimeSeries" RESOURCEs. In the case where the TimeSeries is unique, the "results" RESOURCE and the TimeSeries RESOURCE are merged (to avoid an unnecessary RESOURCE level).

A TimeSeries RESOURCE contains:

- 2 PARAMs for product type and subtype. The product type PARAM is always "timeseries"; the subtype is simple (e.g. lightcurve, a single table) or complex (e.g. lightvelocitycurve, several tables)

- Metadata: TIMESYS, COOSYS, PHOTCAL and VELFRAME (a GROUP with PARAM giving the REF position or whatever ?)

- one or several TABLEs containing what we have in the spec

If the answer is c ), I call this the 2-level architecture.

The spec defines a VOTable document that contains one or several TimeSeries in the RESOURCE type="results". Other RESOURCEs with additional data/metadata (type="meta") may be added if necessary.

The main RESOURCE (type="results") MUST start with an INFO tag with a reference to the spec (see the DALI recommendation as applied in DataLink, SIA2, etc.).

The main RESOURCE contains the metadata: TIMESYS, COOSYS, PHOTCAL and VELFRAME (a GROUP with PARAM giving the REF position or whatever ?)

Then there may be one or several TABLEs, which ARE the TimeSeries.

Each TABLE contains 2 PARAMs for product type and subtype. The product type PARAM is always "timeseries"; the subtype is always simple (lightcurve, multiband light curve, velocitycurve).
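
To make the difference between the two architectures concrete, here is a rough sketch of how a reader could enumerate the TimeSeries in either layout. It is not taken from the attached examples: the function names, the table names and the two sample documents are illustrative assumptions, INFO and metadata elements are left out, and the VOTable namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

TYPING = ('<PARAM name="dataproduct_type" datatype="char" arraysize="*" '
          'value="timeseries"/>')
SUBTYPE = ('<PARAM name="dataproduct_subtype" datatype="char" arraysize="*" '
           'value="{}"/>')

# 3 levels: typing PARAMs sit in a TimeSeries RESOURCE, outside the TABLEs.
THREE_LEVEL = f"""<VOTABLE><RESOURCE type="results">
  <RESOURCE>{TYPING}{SUBTYPE.format("lightcurve")}<TABLE name="multiband"/></RESOURCE>
  <RESOURCE>{TYPING}{SUBTYPE.format("velocitycurve")}<TABLE name="velocity"/></RESOURCE>
</RESOURCE></VOTABLE>"""

# 2 levels: typing PARAMs sit inside each TABLE.
TWO_LEVEL = f"""<VOTABLE><RESOURCE type="results">
  <TABLE name="multiband">{TYPING}{SUBTYPE.format("lightcurve")}</TABLE>
  <TABLE name="velocity">{TYPING}{SUBTYPE.format("velocitycurve")}</TABLE>
</RESOURCE></VOTABLE>"""


def typing_params(elem):
    """Return (dataproduct_type, dataproduct_subtype) from direct child PARAMs."""
    params = {p.get("name"): p.get("value") for p in elem.findall("PARAM")}
    return params.get("dataproduct_type"), params.get("dataproduct_subtype")


def find_timeseries(votable_xml):
    """Return (carrier, subtype, [table names]) for each TimeSeries found."""
    root = ET.fromstring(votable_xml)
    series = []
    for resource in root.iter("RESOURCE"):
        tables = resource.findall("TABLE")
        product_type, subtype = typing_params(resource)
        if product_type == "timeseries":
            # 3 levels: the RESOURCE is the TimeSeries and owns all its TABLEs.
            series.append(("resource", subtype, [t.get("name") for t in tables]))
            continue
        for table in tables:
            product_type, subtype = typing_params(table)
            if product_type == "timeseries":
                # 2 levels: each typed TABLE is a TimeSeries on its own.
                series.append(("table", subtype, [table.get("name")]))
    return series


three = find_timeseries(THREE_LEVEL)
two = find_timeseries(TWO_LEVEL)
```

In the 3-level case the list of table names can hold more than one entry per TimeSeries; in the 2-level case it always holds exactly one.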

I attach 6 examples to illustrate that:

TimeSeries3Levels.xml : in the 3-level architecture, has one single multiband TimeSeries. The 2 typing PARAMs are outside the TABLE.

TimeSeries2Levels.xml : in the 2-level architecture, has one single multiband TimeSeries. The 2 typing PARAMs are inside the TABLE.

TimeSeries3Levels2TimeSeries : in the 3-level architecture, has 2 TimeSeries: the multiband one and a velocitycurve. This is managed as two RESOURCEs included in the main "results" RESOURCE, with the typing PARAMs outside the TABLEs. In addition there is a type="meta" RESOURCE for additional information.

TimeSeries2Levels2TimeSeries : in the 2-level architecture, has the same 2 TimeSeries: the multiband one and a velocitycurve. This is managed as two TABLEs included in the main "results" RESOURCE, with the typing PARAMs inside them. In addition there is a type="meta" RESOURCE for additional information.

TimeSeries3Levels2TimeSeries-multitable.xml : in the 3-level architecture, has 2 TimeSeries. The first one has the same multiband TABLE, but the second one is complex and has two TABLEs: a velocity curve and a simple lightcurve. The subtype given in the appropriate PARAM is complex. It doesn't have the additional type="meta" RESOURCE (but could have).

TimeSeries2Levels3TimeSeries : in the 2-level architecture, actually has three TABLEs considered as three TimeSeries. The simple subtypes are different in the three TABLEs. It doesn't have the additional type="meta" RESOURCE (but could have).

Personally, I prefer the three-level one because it offers more flexibility (it groups together several tables belonging to the same source) and includes the metadata in the TimeSeries.

That's all folks

Cheers

François

Examples : TimeSeriesLevels.zip

AdaNebot / TimeSeries

Simple utypes #8