BioImage-Archive / bia-integrator

Apache License 2.0
2 stars 3 forks source link

Make ingest use persistence strategy (API/disk) #214

Closed kbab closed 1 month ago

kbab commented 1 month ago

Address clickup ticket https://app.clickup.com/t/8695e2y7p . The ingest command can now either persist using the API or read/write artefacts to disk. The readme has been updated.

sherwoodf commented 1 month ago

Should the default be on disk? I'm a bit worried about the default making changing to production data...

I ran it prior to reading the readme, so didn't notice the change, but thankfully it failed & errored.


                                                                                                                                                                                        │
│ /Users/fsherwood/workspace/bia-integrator/bia-ingest/bia_ingest/persistence_strategy.py:111 in fetch_by_uuid                                                                           │
│                                                                                                                                                                                        │
│   108 │   │   object_list = []                                                                                                                                                         │
│   109 │   │   for uuid in uuids:                                                                                                                                                       │
│   110 │   │   │   api_get_method = f"get_{to_snake(model_class.__name__)}"                                                                                                             │
│ ❱ 111 │   │   │   api_obj = getattr(self.api_client, api_get_method)(str(uuid))                                                                                                        │
│   112 │   │   │   obj = model_class.model_validate_json(api_obj.model_dump_json())                                                                                                     │
│   113 │   │   │   object_list.append(obj)                                                                                                                                              │
│   114 │   │   return object_list                                                                                                                                                       │
│                                                                                                                                                                                        │
│ ╭─────────────────────────────────────── locals ────────────────────────────────────────╮                                                                                              │
│ │ api_get_method = 'get_study'                                                          │                                                                                              │
│ │    object_list = []                                                                   │                                                                                              │
│ │           self = <bia_ingest.persistence_strategy.ApiPersister object at 0x1053b6740> │                                                                                              │
│ │           uuid = '72f86dbf-ebf1-41c2-b807-27e3e4e0f449'                               │                                                                                              │
│ │          uuids = ['72f86dbf-ebf1-41c2-b807-27e3e4e0f449']                             │                                                                                              │
│ ╰───────────────────────────────────────────────────────────────────────────────────────╯                                                                                              │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'get_study'
sherwoodf commented 1 month ago

Minor comment, but i think it would be better if the 'written' log lines were debug level - these can always be viewed with a -v, and would reduce the quantity of output when dealing with long file lists. We might want to revisit some of the debug logs as well. image

kbab commented 1 month ago

Should the default be on disk? I'm a bit worried about the default making changing to production data...

I am happy to change this. My impression was that the most common use would be for the API, but I guess users may want to explore things locally before running against the API. @liviuba do you have a preference for the default?

liviuba commented 1 month ago

Should the default be on disk? I'm a bit worried about the default making changing to production data...

I am happy to change this. My impression was that the most common use would be for the API, but I guess users may want to explore things locally before running against the API. @liviuba do you have a preference for the default?

No strong preference aside from really wanting 'Undo' functionality in the api :) @sherwoodf what's the scenario you had in mind where data would be accidentally written to prod?

sherwoodf commented 1 month ago

This morning i ran it assuming it would write locally (because that how it currently works) and would have submitted data somewhere (i don't know where, i wasn't paying attention to this) were it not for the fact that i got some error (see above).

I get that in most cases we do want to send data to the api, but without a magic undo button it feels there should be an intentional flag to say 'please actually do this' when we may also want to run the (otherwise) exact same command for other reasons.

liviuba commented 1 month ago

This morning i ran it assuming it would write locally (because that how it currently works) and would have submitted data somewhere (i don't know where, i wasn't paying attention to this) were it not for the fact that i got some error (see above).

I get that in most cases we do want to send data to the api, but without a magic undo button it feels there should be an intentional flag to say 'please actually do this' when we may also want to run the (otherwise) exact same command for other reasons.

To me, this looks like a bug for a missing validation where the tools fails in an unexpected way if it's missing config items for what it's executed for

LE: Happy for the default to be anything, but it would fail in the same way if it had a --save-to-api flag but no config set

kbab commented 1 month ago

Should the default be on disk? I'm a bit worried about the default making changing to production data...

I am happy to change this. My impression was that the most common use would be for the API, but I guess users may want to explore things locally before running against the API. @liviuba do you have a preference for the default?

No strong preference aside from really wanting 'Undo' functionality in the api :) @sherwoodf what's the scenario you had in mind where data would be accidentally written to prod?

I have changed default to disk.

kbab commented 1 month ago

Minor comment, but i think it would be better if the 'written' log lines were debug level - these can always be viewed with a -v, and would reduce the quantity of output when dealing with long file lists. We might want to revisit some of the debug logs as well.

Changed the INFO to DEBUG.

kbab commented 1 month ago

This morning i ran it assuming it would write locally (because that how it currently works) and would have submitted data somewhere (i don't know where, i wasn't paying attention to this) were it not for the fact that i got some error (see above). I get that in most cases we do want to send data to the api, but without a magic undo button it feels there should be an intentional flag to say 'please actually do this' when we may also want to run the (otherwise) exact same command for other reasons.

To me, this looks like a bug for a missing validation where the tools fails in an unexpected way if it's missing config items for what it's executed for

LE: Happy for the default to be anything, but it would fail in the same way if it had a --save-to-api flag but no config set

Added assertions in init methods to check valid parameters are supplied.