DHARPA-Project / kiara-website

Creative Commons Zero v1.0 Universal

How do I load a single file into kiara using the Python API #9

Closed: caro401 closed this 6 months ago

caro401 commented 7 months ago

Yes, I know some of this is in the operation docs, but I really need concrete examples of how to load a file from GitHub and from the user's filesystem.

makkus commented 7 months ago

The only 'stable' file import operation at the moment is `import.local.file`. It's one of the few modules included in the kiara package itself (not even in `kiara_plugin.core_types`), so in theory you don't need any plugin installed at all. I haven't tested running without `core_types` installed in a while, though, so there might be breakage; feel free to report an issue if that's the case.

Its single input is `path` (see `kiara operation explain import.local.file`), which takes either an absolute path or a path relative to the current directory. It does not support URLs or any other non-local source.
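
Because `import.local.file` only accepts a local path (absolute, or relative to the current directory) and no URLs, it can help to check the input before running the operation. The helper below is a hypothetical sketch using only the standard library; `resolve_local_path` is a made-up name and is not part of kiara.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_local_path(path: str) -> str:
    """Hypothetical pre-check for the 'path' input of import.local.file.

    Rejects URL-like strings (the operation only handles local files) and
    turns a relative path into an absolute one, resolved against the
    current working directory.
    """
    scheme = urlparse(path).scheme
    # A one-letter "scheme" is most likely a Windows drive letter (e.g. "C:\\data.csv").
    if scheme and len(scheme) > 1:
        raise ValueError(f"Not a local path: {path!r}")
    resolved = Path(path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No such file: {resolved}")
    return str(resolved)
```

The returned string can be used directly as the `path` value in the inputs dictionary passed to `import.local.file`.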

What optional things can I give to that operation? Is this a thing where aliases come into play?

No, nothing.

Is there anything else I need to know or think about?

No. Except that I'm thinking about a more generic module (in the onboarding plugin) that can take all kinds of strings and is smart enough to figure out how to retrieve the file from wherever it is. The main difficulty is finding a good module interface that works for as many cases as possible, and it might still turn out to be a bad idea in the first place. Either way, that module is not ready to be used, as it is expected to change quite a bit. Happy to consider any ideas in that regard.
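
To make the idea of a generic import module a bit more concrete, here is a rough sketch of the first step such a module might need: deciding what kind of source an arbitrary input string points to. It is purely illustrative; the `SourceKind` categories, the `classify_source` function and the `github:` shorthand are all hypothetical and do not correspond to anything in kiara or the onboarding plugin.

```python
from enum import Enum
from urllib.parse import urlparse


class SourceKind(Enum):
    LOCAL_FILE = "local_file"
    GITHUB = "github"
    URL = "url"


def classify_source(source: str) -> SourceKind:
    """Guess where the file referenced by 'source' lives (hypothetical)."""
    # Hypothetical shorthand: "github:<owner>/<repo>/<ref>/<path>"
    if source.startswith("github:"):
        return SourceKind.GITHUB
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc in ("github.com", "raw.githubusercontent.com"):
            return SourceKind.GITHUB
        return SourceKind.URL
    # Anything else is treated as a local path (absolute or relative).
    return SourceKind.LOCAL_FILE
```

Even this small step hints at the interface problem described above: each branch would likely need different extra inputs (a branch or tag, credentials, a download location), which makes a single module schema hard to design.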

If you want to load files from GitHub, there is only the very alpha module in the onboarding plugin at the moment, and I wouldn't recommend using it yet. This is an area I intend to focus on in the near-to-medium term: creating a set of modules that complement each other well and can retrieve and import datasets from any of the potential sources we've identified.
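
One possible workaround until such modules exist is to download the file from GitHub to the local filesystem yourself and then pass the local path to `import.local.file`. The sketch below uses only the standard library; `github_raw_url` and `download_github_file` are hypothetical helpers (not part of kiara), and it assumes a public repository whose files are served from `raw.githubusercontent.com`.

```python
import tempfile
import urllib.request
from pathlib import Path


def github_raw_url(owner: str, repo: str, ref: str, file_path: str) -> str:
    """Build the raw-content URL for a file in a public GitHub repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{file_path.lstrip('/')}"


def download_github_file(owner: str, repo: str, ref: str, file_path: str) -> str:
    """Download one file into a new temporary directory and return its local path."""
    url = github_raw_url(owner, repo, ref, file_path)
    # Note: the temporary directory is not deleted automatically.
    target = Path(tempfile.mkdtemp()) / Path(file_path).name
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return str(target)
```

The returned local path can then be used as the `path` input of `import.local.file`, exactly like any other file on disk.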

For functionality you need that isn't ready yet, it might be a good idea to create your own plugin project and add very basic modules that do what you need. The advantage is that we don't need to think too much about their interface (the inputs/outputs schema) yet, and you can replace them fairly easily once there is an 'official' one. They can also serve as input for designing the 'official' module in the first place. Another advantage is that you have full control over the module, so you don't have to worry about its interface changing under your feet while you're developing.

makkus commented 7 months ago

Is this a thing where aliases come into play?

You can assign an alias to the imported file. In the CLI you'd do something like:

```shell
kiara run import.local.file path=pyproject.toml --save file=my_alias
```

In Python, you'd do something like:

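```python
from kiara.interfaces.python_api import KiaraAPI
from kiara.models.values.value import Value

api = KiaraAPI.instance()

# 'path' is the only input: an absolute path, or one relative to the current directory
inputs = {
    "path": "/home/markus/projects/kiara/kiara/pyproject.toml"
}

# run_job blocks until the operation has finished
results = api.run_job("import.local.file", inputs=inputs)

# the imported file is available under the 'file' output field
file_result: Value = results["file"]

# persist the value and give it a human-readable alias
api.store_value(file_result, "alias_from_python")
```
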
caro401 commented 7 months ago

Can I have an example using the Python API (`kiara.api.KiaraAPI.instance()`) rather than the CLI, please?

makkus commented 7 months ago

Yeah, I'm about to write it up, one sec.

makkus commented 7 months ago

Done, I've finished my comment above. Happy to change the docstring for that function if you have any suggestions. I guess one area to write up would be the whole concept of storage, but that would probably be too much for this particular comment and needs its own section in the future docs.

makkus commented 7 months ago

By the way, you don't need to store a value if you don't want to persist it and don't need a human-readable alias for it in the UI; for temporary data that is often unnecessary. An unstored value remains available in the runtime until you restart the Python process.

caro401 commented 7 months ago

So the relevant things I learned here were not to use the operations in the onboarding plugin, and that files have to be available locally. I've got a bunch of rewriting to do in my app prototype. I'll write this up in a how-to doc and send a PR shortly.

In future, could you please avoid giving CLI examples? I find them really confusing and hard to follow, because CLI usage is similar to, but not quite the same as, usage via the Python API. I think the consensus from Mariella's research was that no end user wants to use the CLI, and I don't want to spend extra time documenting it.

makkus commented 7 months ago

Ah, and you might want to use `queue_job` instead of `run_job` if you don't want the operation to block:

```python
import time

from kiara.interfaces.python_api import KiaraAPI

api: KiaraAPI = KiaraAPI.instance()
inputs = {
    "path": "/home/markus/projects/kiara/kiara/pyproject.toml"
}

# queue_job returns a job id immediately instead of blocking
job_id = api.queue_job("import.local.file", inputs=inputs)

job = api.get_job(job_id)

# Poll until the job is done; ideally there would be an event system, but there is none yet.
# 'finished' only holds a date once the job has finished, which might be unintuitive;
# an 'is_finished()' method is planned for the next version.
while job.finished is None:
    time.sleep(1)
    job = api.get_job(job_id)

results = api.get_job_result(job_id)

file_result = results["file"]
api.store_value(file_result, "alias_from_python_2")
```

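The polling loop above waits forever if a job never finishes. A small generic helper with a timeout could make that safer. This is a standalone sketch: `wait_for` is a hypothetical name, it is not part of kiara, and the test uses a dummy condition rather than a real job.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> None:
    """Call 'condition' every 'interval' seconds until it returns True.

    Raises TimeoutError if it still returns False after 'timeout' seconds.
    """
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(interval)
```

With kiara, the condition could be `lambda: api.get_job(job_id).finished is not None`, followed by `api.get_job_result(job_id)` as before.
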
caro401 commented 6 months ago

closed via #11