aiondemand / AIOD-rest-api

A containerized application using FastAPI and SQLAlchemy connected to a MySQL database.
MIT License

Make connector synchronisation stateless #247

Open · Taniya-Das opened this issue 6 months ago

Taniya-Das commented 6 months ago

In synchronisation.py (line 146), we use a dictionary, state = {}, to store the from_id, offset and last_id that the connectors use to fetch data. This is error-prone. A better approach would be to make the process stateless, by looking up the last_id of the respective connector's data in the database.
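A minimal sketch of the stateless idea, for illustration only: instead of keeping last_id in an in-memory dict, the resume point is derived from what is already stored. It uses the standard-library sqlite3 module and a hypothetical, heavily simplified `dataset` table with only `platform` and `platform_resource_identifier` columns; the real AIoD schema and SQLAlchemy models differ. It also assumes the platform's identifiers are numeric (as OpenML's are) and deliberately ignores the complication that not every stored resource was inserted by the connector.

```python
import sqlite3
from typing import Optional


def last_synced_id(conn: sqlite3.Connection, platform: str) -> Optional[int]:
    """Derive the resume point from the database instead of an in-memory dict.

    Returns the highest numeric platform_resource_identifier stored for
    `platform`, or None if nothing from that platform has been stored yet.
    """
    row = conn.execute(
        # CAST so that "10" sorts above "2" (a plain text MAX would not).
        "SELECT MAX(CAST(platform_resource_identifier AS INTEGER)) "
        "FROM dataset WHERE platform = ?",
        (platform,),
    ).fetchone()
    return row[0]  # MAX over no rows yields NULL, i.e. None


def demo() -> sqlite3.Connection:
    # Hypothetical, simplified table; not the project's actual schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (platform TEXT, platform_resource_identifier TEXT)"
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?)",
        [("openml", "2"), ("openml", "10"), ("huggingface", "some/dataset")],
    )
    return conn


if __name__ == "__main__":
    conn = demo()
    print(last_synced_id(conn, "openml"))  # 10
    print(last_synced_id(conn, "zenodo"))  # None
```

A connector restarted at any point would call a query like this before fetching, so there is no state to lose or get out of sync.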

josvandervelde commented 6 months ago

A complication here is that we need to filter out any uploaded resource that was not inserted by our connector, because people might manually insert data from a platform into the database. For instance, if you want to upload a dataset to HuggingFace, you can do so using the AIoD HF uploader. The same is not yet possible for OpenML, but you can insert metadata into the catalogue with platform=Openml and platform_resource_identifier=999999999.

We probably want to record on each metadata entry who was responsible for it, and this should go on the AIoDEntry of each resource. Should we set .aiod_entry.editor = [identifier-of-openml-connector]? Note that editor currently points to Person. Or should there be a new field for this?
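To illustrate why recording the responsible party matters, here is a rough sketch, again using sqlite3 and a hypothetical simplified `dataset` table. The `created_by` column and the values `openml-connector` and `user:alice` are made up for this example; they stand in for whichever option is chosen (`aiod_entry.editor` or a new field). Filtering on it keeps a manually inserted entry with platform_resource_identifier=999999999 from moving the connector's resume point.

```python
import sqlite3
from typing import Optional

# Hypothetical identifier the connector would write on every entry it inserts.
CONNECTOR_ID = "openml-connector"


def last_synced_id_for_connector(
    conn: sqlite3.Connection, platform: str, created_by: str
) -> Optional[int]:
    """Highest numeric identifier stored for `platform` by `created_by`, or None."""
    row = conn.execute(
        "SELECT MAX(CAST(platform_resource_identifier AS INTEGER)) "
        "FROM dataset WHERE platform = ? AND created_by = ?",
        (platform, created_by),
    ).fetchone()
    return row[0]


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset ("
        " platform TEXT,"
        " platform_resource_identifier TEXT,"
        " created_by TEXT)"  # hypothetical provenance column
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?, ?)",
        [
            ("openml", "2", CONNECTOR_ID),
            ("openml", "10", CONNECTOR_ID),
            # Manually inserted entry; must not affect the connector's resume point.
            ("openml", "999999999", "user:alice"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo()
    print(last_synced_id_for_connector(conn, "openml", CONNECTOR_ID))  # 10
```

Without the `created_by` filter, the same query would return 999999999 and the connector would skip every OpenML dataset below that identifier.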