django-daiquiri / daiquiri

A framework for the publication of scientific databases
https://escience.aip.de/daiquiri
Apache License 2.0

FEATURE: Improve scalability of datalink #108

Open agy-why opened 1 year ago

agy-why commented 1 year ago

The implementation of datalink is relatively transparent: a simple SQL query against the datalink table provides all the information necessary for the datalink service.
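To make the table approach concrete, here is a minimal sketch of such a lookup. It uses an in-memory SQLite database rather than daiquiri's actual database layer, and the table layout (a subset of the standard DataLink columns), the sample rows, and the `get_links` helper are all assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the datalink table; not daiquiri's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datalink ("
    "ID TEXT, access_url TEXT, semantics TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO datalink VALUES (?, ?, ?, ?)",
    [
        # Illustrative rows only; URLs and identifiers are made up.
        ("obs-1", "https://example.org/files/obs-1.fits", "#this", "Observation file"),
        ("obs-1", "https://example.org/previews/obs-1.png", "#preview", "Preview image"),
        ("plate-7", "https://example.org/docs/plate-7.pdf", "#documentation", "Plate notes"),
    ],
)


def get_links(connection, identifier):
    """Return all stored links for one identifier (hypothetical helper)."""
    cursor = connection.execute(
        "SELECT ID, access_url, semantics, description "
        "FROM datalink WHERE ID = ?",
        (identifier,),
    )
    return cursor.fetchall()


for row in get_links(conn, "obs-1"):
    print(row)
```

Every link has to exist as a row before it can be served, which is what becomes a problem at scale.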

However, since datalink is supposed to list all links between objects, the number of links may become very large. The current implementation is well suited for linking files related to objects such as observations, plates, documentation, previews... It also covers the DOI case. In all these cases the number of links is relatively low, since the number of objects is low.

However, we may want (in the context of blind discovery) to provide further links between a large number of objects, typically sources. In gaia or applause we have several billion objects; adding several links for each of them quickly shows how large the datalink table may become.
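A rough back-of-the-envelope estimate illustrates the growth. The numbers below (two billion sources, three links each, 200 bytes per row) are purely illustrative assumptions, not measurements from gaia or applause, and `estimate_rows` and `estimate_bytes` are hypothetical helpers.

```python
def estimate_rows(n_objects: int, links_per_object: int) -> int:
    """Number of datalink rows if every object gets the same number of links."""
    return n_objects * links_per_object


def estimate_bytes(n_rows: int, bytes_per_row: int) -> int:
    """Raw table size, ignoring indexes and storage overhead."""
    return n_rows * bytes_per_row


# Assumed values for illustration only.
n_objects = 2_000_000_000
links_per_object = 3
bytes_per_row = 200

rows = estimate_rows(n_objects, links_per_object)
size = estimate_bytes(rows, bytes_per_row)
print(f"{rows:,} rows, about {size / 1e12:.1f} TB before indexes")
```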

The typical use case is to find further information about a source. In gaia, for instance, gaia_source has flag columns that indicate whether the source is present in a specific table or tables. The issue with the flag method is that it requires a priori knowledge of the data structure (flag column names...). Publishing this information via datalink would provide a way to discover the further information without that prior knowledge.

Therefore, in this case the table approach is not scalable and an alternative needs to be found.

A solution could be to generate part of the datalink entries "on the fly". This could be done via an adapter. The current implementation of datalink in daiquiri does not allow "on the fly" generation of datalink entries. Currently, Datalink entries are Django objects and are stored in the tap_schema.datalink table declared in the Datalink model.

It is unclear how this can be achieved. Is it possible to generate temporary Datalink entries that are used to build the VOTable response of the datalink service but are never ingested into the tap_schema.datalink table?
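One possible shape for this, as a plain-Python sketch rather than a proposal for daiquiri's actual adapter API: stored links and dynamically generated links are merged at request time, and the dynamic ones are never written anywhere. Everything here is hypothetical (the `Link` class, the `resolve` function, the flag column names, the example URLs); only the column names `ID`, `access_url`, `semantics`, `description` and the semantics terms `#this` and `#auxiliary` come from the DataLink standard.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Link:
    """One datalink row, using a subset of the standard DataLink columns."""
    ID: str
    access_url: str
    semantics: str
    description: str = ""


# Stand-in for rows persisted in tap_schema.datalink (files, previews, DOIs...).
STORED_LINKS: List[Link] = [
    Link("obs-1", "https://example.org/files/obs-1.fits", "#this", "Observation file"),
]

# Stand-in for flag columns of a source table; the column names are made up.
SOURCE_FLAGS: Dict[str, Dict[str, bool]] = {
    "src-42": {"has_epoch_photometry": True, "has_rvs_spectrum": False},
}

# Hypothetical mapping from a flag column to the table holding the extra data.
FLAG_TO_TABLE: Dict[str, str] = {
    "has_epoch_photometry": "epoch_photometry",
    "has_rvs_spectrum": "rvs_spectrum",
}


def stored_links(identifier: str) -> Iterator[Link]:
    """Links that exist as rows in the (simulated) datalink table."""
    return (link for link in STORED_LINKS if link.ID == identifier)


def dynamic_links(identifier: str) -> Iterator[Link]:
    """Links derived from the flags at request time; nothing is persisted."""
    flags = SOURCE_FLAGS.get(identifier, {})
    for flag, table in FLAG_TO_TABLE.items():
        if flags.get(flag):
            yield Link(
                ID=identifier,
                access_url=f"https://example.org/tables/{table}?source_id={identifier}",
                semantics="#auxiliary",
                description=f"Rows for this source in {table}",
            )


def resolve(identifier: str) -> List[Link]:
    """All links for one identifier, ready to be serialized into a VOTable."""
    return list(chain(stored_links(identifier), dynamic_links(identifier)))


for link in resolve("src-42"):
    print(link)
```

In Django terms the dynamic part would presumably correspond to model instances (or plain dictionaries) that are built but never saved; whether the existing Datalink model and VOTable rendering can consume such objects is exactly the open question.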