Open Taniya-Das opened 6 months ago
A complication here is that we need to filter out any uploaded resource that was not inserted by our connector.
People might manually insert data from a platform into the database. For instance, if you want to upload a dataset to HuggingFace, you can do so using the AIoD HF uploader. A similar thing is not yet possible for OpenML, but you can still insert metadata in the catalogue with platform=Openml and platform_resource_identifier=999999999.
We probably want to attach a bit of info to each piece of metadata that records who was responsible for it. It should be on the AIoDEntry of each resource. Should we set .aiod_entry.editor = [identifier-of-openml-connector]? Note that editor currently points to Person. Or should there be a new field for this?
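To make the question concrete, here is a minimal sketch of the "new field" option. Everything in it is an assumption for illustration: the inserted_by field, the OPENML_CONNECTOR_ID value and the simplified dataclasses are hypothetical and do not reflect the actual AIoD schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifier for the OpenML connector (not an existing AIoD constant).
OPENML_CONNECTOR_ID = "openml-connector"


@dataclass
class AIoDEntry:
    # Hypothetical new field: which connector inserted the metadata.
    # None means the entry was inserted manually (e.g. through the REST API).
    inserted_by: Optional[str] = None


@dataclass
class Resource:
    platform: str
    platform_resource_identifier: str
    aiod_entry: AIoDEntry


def inserted_by_connector(resources: List[Resource], connector_id: str) -> List[Resource]:
    """Keep only the resources whose metadata was inserted by the given connector."""
    return [r for r in resources if r.aiod_entry.inserted_by == connector_id]
```

With a dedicated field like this, editor could keep pointing to Person, and the manually inserted entry with platform_resource_identifier=999999999 would be filtered out because its inserted_by is empty.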
In synchronisation.py (line 146), we use a dictionary, state = {}, to save the from_id, offset and last_id that the connectors use to fetch data. Tracking this state in a dictionary is error-prone. A better approach would be to make the process stateless, which can be done by looking up the last_id of the respective connector's data in the database.
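A rough sketch of what the stateless lookup could look like, using sqlite3 only to keep the example self-contained. The table name dataset, the inserted_by column and the assumption that platform_resource_identifier is numeric are all hypothetical; the real query would go through the project's own database layer.

```python
import sqlite3
from typing import Optional


def last_synced_id(conn: sqlite3.Connection, platform: str, connector_id: str) -> Optional[int]:
    """Return the highest identifier this connector has inserted, or None if there is none.

    Assumes a hypothetical `dataset` table with `platform`,
    `platform_resource_identifier` and `inserted_by` columns.
    """
    row = conn.execute(
        "SELECT MAX(CAST(platform_resource_identifier AS INTEGER)) "
        "FROM dataset WHERE platform = ? AND inserted_by = ?",
        (platform, connector_id),
    ).fetchone()
    # MAX over zero rows yields NULL, which sqlite3 returns as None.
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (platform TEXT, platform_resource_identifier TEXT, inserted_by TEXT)"
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?, ?)",
        [
            ("Openml", "1", "openml-connector"),
            ("Openml", "42", "openml-connector"),
            ("Openml", "999999999", None),  # manually inserted, ignored by the lookup
        ],
    )
    print(last_synced_id(conn, "Openml", "openml-connector"))
```

Note that this only works if we can tell connector-inserted rows apart from manual ones. Otherwise a manually inserted identifier such as 999999999 would be picked up as the last_id and the connector would skip everything below it.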