alercebroker / web-services

API to query internal and external data.
https://api.alerce.online/ztf/v1/
Apache License 2.0

[Feature] Selector for latest version when many provided #279

Open AlxEnashi opened 5 months ago

AlxEnashi commented 5 months ago

Is your feature request related to a problem? Please describe. The database may have many values for a feature, differentiated by the version of the feature extractor (this may happen with classifications, magstats, and possibly corrections). The current behaviour is not centralized and not completely clear.

Describe the solution you'd like: A consistent way to select which version is used to display data must be chosen.

Describe alternatives you've considered: Keep a list of the different versions, ordered by time or relevance. If a query returns many objects, return the one whose version has the greatest index in that list. A version that is not in the list always gets index -1. Note: the special case of the light curve classifier and its features must be considered (exclude features version 23..).
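The ordered-list alternative could look roughly like the following Python sketch. The function names and the example version list are hypothetical, and the list order (oldest first) is an assumption for illustration. An explicit list is needed because version strings such as "1.0.1" and "stamp_classifier_1.0.4" (both mentioned later in this thread) do not follow a single comparable scheme.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical ordering, oldest first. This list would have to be curated
# by hand (or stored in a table), since the version strings cannot be
# compared reliably as text or as semantic versions.
STAMP_CLASSIFIER_VERSIONS = ["1.0.1", "stamp_classifier_1.0.4"]


def version_rank(version: str, ordered_versions: Sequence[str]) -> int:
    """Index of `version` in the ordered list, or -1 if it is not listed."""
    try:
        return list(ordered_versions).index(version)
    except ValueError:
        return -1


def select_latest(rows: List[Dict], ordered_versions: Sequence[str],
                  key: str = "version") -> Optional[Dict]:
    """Return the row whose version has the greatest index in `ordered_versions`.

    If several rows share the highest rank (e.g. all are unlisted, rank -1),
    max() keeps the first one encountered.
    """
    if not rows:
        return None
    return max(rows, key=lambda row: version_rank(row[key], ordered_versions))
```

For example, given the two stamp classifier rows for ZTF22abyhaut described below, `select_latest` would return the row produced by "stamp_classifier_1.0.4" because it sits later in the assumed list.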

Additional context: This happens, for example, when selecting the period for a light curve. The object ZTF20aawwxkg has many computed periods, and the light curve uses 0.9980563820532297. Why?

ale-munozarancibia commented 5 months ago

This problem also affects classifiers.

When a new classifier and/or feature computation goes into production and an object already has a previous classification and/or feature, the Explorer shows more than one value for it, when it should show only the latest one.

Example: ZTF22abyhaut

If I search for ZTF22abyhaut in the Explorer using the "Object ID" search filter and the classifier "Stamp Classifier", the output shows 2 rows with different highest-probability classes (SN and bogus) and highest probabilities (0.537 and 0.415). A query to the database shows that these correspond to classifier versions "1.0.1" and "stamp_classifier_1.0.4" respectively. Although 2 results were shown in the search output, clicking either of them leads to the same page, https://alerce.online/object/ZTF22abyhaut, where bogus is the highest-probability class (even when I clicked the "SN" row).

For this object, the same happens for the classifier "Lc Classifier" search: it gives 2 rows as a result, corresponding to versions "hierarchical_rf_1.1.0" and "lc_classifier_1.1.13".

This object has a period computed as a feature and displayed when selecting "Folded" in the light curve panel. It shows one value, but a query to the database shows that there are 2 "Multiband_period" computations, with values 0.330039 and 1.0, corresponding to versions "lc_classifier_1.2.1-P-transitional" and 23.12.25 respectively. It is not clear how the Explorer selects which one it will show.

The Explorer should show results only for the latest version. This includes the stamp classifier, the light curve classifier (and its branches), and features (currently only multiband period available in the Explorer).
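A minimal sketch of that rule applied to search output: keep one row per object and classifier, preferring the latest known version and giving unlisted versions rank -1, as proposed in the issue. The classifier names, column names, and the orderings in `VERSION_ORDER` are assumptions for illustration; in particular, this thread does not state which of the two lc_classifier versions is newer.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-classifier version orderings (oldest first). The real
# names and order would need to be confirmed by the team.
VERSION_ORDER: Dict[str, Sequence[str]] = {
    "stamp_classifier": ["1.0.1", "stamp_classifier_1.0.4"],
    "lc_classifier": ["hierarchical_rf_1.1.0", "lc_classifier_1.1.13"],
}


def latest_per_object(rows: List[dict]) -> List[dict]:
    """Keep one row per (oid, classifier_name): the one with the latest known version.

    Versions missing from VERSION_ORDER get rank -1, so any listed version wins.
    """
    best: Dict[Tuple[str, str], Tuple[int, dict]] = {}
    for row in rows:
        order = list(VERSION_ORDER.get(row["classifier_name"], []))
        version = row["classifier_version"]
        rank = order.index(version) if version in order else -1
        key = (row["oid"], row["classifier_name"])
        # Replace the stored row only if this one has a strictly later version.
        if key not in best or rank > best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]
```

Filtering in application code like this is simple, but it runs after the query, so pagination and counts coming from the database would still include the older rows.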

AlxEnashi commented 3 months ago

Many solutions were considered.

In my opinion, to keep most features of the search and API system, the refactor should happen at the database query level. With this in mind, I propose using the taxonomy table to filter the latest version of each classifier when searching, and a similar table (or even the same one) to filter the latest version of features. For features this is less critical, because there are no search queries over that table, so a backend solution could be implemented; but for consistency and support, I suggest using the same solution for both probabilities and features.
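A minimal sketch of the database-level approach, using an in-memory SQLite database from Python's standard library. The schema is a simplified, hypothetical stand-in for the real tables: it assumes the taxonomy table stores exactly one current version per classifier, and that probability rows carry `classifier_name`, `classifier_version`, and a `ranking` column (1 = highest probability).

```python
import sqlite3
from typing import List, Tuple

# Simplified, hypothetical schema; the real tables have more columns
# and may use different names.
SCHEMA = """
CREATE TABLE taxonomy (
    classifier_name    TEXT PRIMARY KEY,
    classifier_version TEXT NOT NULL  -- version currently treated as latest
);
CREATE TABLE probability (
    oid                TEXT,
    classifier_name    TEXT,
    classifier_version TEXT,
    class_name         TEXT,
    probability        REAL,
    ranking            INTEGER
);
"""

# Joining on both name and version drops rows from older classifier versions.
LATEST_TOP_CLASS_SQL = """
SELECT p.oid, p.classifier_name, p.classifier_version, p.class_name, p.probability
FROM probability AS p
JOIN taxonomy AS t
  ON t.classifier_name = p.classifier_name
 AND t.classifier_version = p.classifier_version
WHERE p.oid = ? AND p.classifier_name = ? AND p.ranking = 1
"""


def latest_top_class(conn: sqlite3.Connection, oid: str,
                     classifier_name: str) -> List[Tuple]:
    """Top-ranked class for `oid`, restricted to the version registered in taxonomy."""
    return conn.execute(LATEST_TOP_CLASS_SQL, (oid, classifier_name)).fetchall()


def build_demo_connection() -> sqlite3.Connection:
    """In-memory database holding the two stamp classifier rows for ZTF22abyhaut."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO taxonomy VALUES (?, ?)",
                 ("stamp_classifier", "stamp_classifier_1.0.4"))
    conn.executemany(
        "INSERT INTO probability VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("ZTF22abyhaut", "stamp_classifier", "1.0.1", "SN", 0.537, 1),
            ("ZTF22abyhaut", "stamp_classifier", "stamp_classifier_1.0.4", "bogus", 0.415, 1),
        ],
    )
    return conn
```

With the demo data, `latest_top_class(build_demo_connection(), "ZTF22abyhaut", "stamp_classifier")` returns only the bogus row from "stamp_classifier_1.0.4", so the search would list one result instead of two. Under this design, promoting a new version becomes a single update to the taxonomy row, and the same join pattern could be reused for a features version table.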

ale-munozarancibia commented 3 months ago

New proposal, based on discussions with @AlxEnashi: