DHARPA-Project / kiara-website

Creative Commons Zero v1.0 Universal

How to use the module config options #27

Open MariellaCC opened 4 months ago

MariellaCC commented 4 months ago

Use case: I need to onboard only the text files from a folder, and I am using the import.local.file_bundle module.

When I use the module without config options, an unwanted .DS_Store file also gets onboarded, but I am only interested in .txt files.

I ran kiara operation explain import.local.file_bundle -m in the terminal to see whether config options are available for this module, and found two that look relevant, "include_file_types" and "exclude_file_types":

╭─ Operation: import.local.file_bundle ───────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                         │
│   Documentation     Import a folder (file_bundle) from the local filesystem.                                                            │
│                                                                                                                                         │
│   Inputs                                                                                                                                │
│                       field name   type     description                                                 Required   Default              │
│                      ───────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                       path         string   The local path of the folder to import.                     yes        -- no default --     │
│                                                                                                                                         │
│                                                                                                                                         │
│   Outputs                                                                                                                               │
│                       field name    type          description                                                                           │
│                      ───────────────────────────────────────────────────────────────────────────────────────────────────────────────    │
│                       file_bundle   file_bundle   The imported file bundle.                                                             │
│                                                                                                                                         │
│                                                                                                                                         │
│   Module type       import.local.file_bundle                                                                                            │
│                                                                                                                                         │
│   Module config     {                                                                                                                   │
│                       "constants": {},                                                                                                  │
│                       "defaults": {},                                                                                                   │
│                       "include_file_types": null,                                                                                       │
│                       "exclude_file_types": null                                                                                        │
│                     }                                                                                                                   │
│                                                                                                                                         │
│   Module metadata   Import a folder (file_bundle) from the local filesystem.                                                            │
│                                                                                                                                         │
│                      Author(s)                                                                                                          │
│                                     Markus Binsteiner   markus@frkl.io                                                                  │
│                                                                                                                                         │
│                      Context                                                                                                            │
│                                     Labels       package: kiara                                                                         │
│                                     References   source_repo: https://github.com/DHARPA-Project/kiara                                   │
│                                                  documentation: https://dharpa.org/kiara_documentation/                                 │
│                                                                                                                                         │
│                      Python class                                                                                                       │
│                                     python_class_name    ImportLocalFileBundleModule                                                    │
│                                     python_module_name   kiara.modules.included_core_modules.filesystem                                 │
│                                     full_name            kiara.modules.included_core_modules.filesystem.ImportLocalFileBundleModu…      │
│                                                                                                                                         │
│                                                                                                                                         │
│                                                                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

To learn more about what kind of values are expected for these options, I ran kiara module explain import.local.file_bundle, which gave the following info:


╭─ Module type: import.local.file_bundle ─────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                         │
│  Documentation                                                                                                                          │
│                         Import a folder (file_bundle) from the local filesystem.                                                        │
│                                                                                                                                         │
│  Author(s)                                                                                                                              │
│                         Markus Binsteiner   markus@frkl.io                                                                              │
│                                                                                                                                         │
│  Context                                                                                                                                │
│                         Labels       package: kiara                                                                                     │
│                         References   source_repo: https://github.com/DHARPA-Project/kiara                                               │
│                                      documentation: https://dharpa.org/kiara_documentation/                                             │
│                                                                                                                                         │
│  Module config schema                                                                                                                   │
│                         Field                Type                 Description                        Required   Default                 │
│                        ─────────────────────────────────────────────────────────────────────────────────────────────────                │
│                         constants            object               Value constants for this module.   no                                 │
│                                                                                                                                         │
│                         defaults             object               Value defaults for this module.    no                                 │
│                                                                                                                                         │
│                         exclude_file_types   -- check source --   File types to include.             no                                 │
│                                                                                                                                         │
│                         include_file_types   -- check source --   File types to include.             no                                 │
│                                                                                                                                         │
│  Python class                                                                                                                           │
│                         python_class_name    ImportLocalFileBundleModule                                                                │
│                         python_module_name   kiara.modules.included_core_modules.filesystem                                             │
│                         full_name            kiara.modules.included_core_modules.filesystem.ImportLocalFileBundleModule                 │
│                                                                                                                                         │
│                                                                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                                                                                                    

The type column only says "check source". I take that to mean that, to learn how to format the input and which values are accepted, I should look at the source repo indicated: source_repo: https://github.com/DHARPA-Project/kiara.

@makkus my question is the following: how can I easily find the import.local.file_bundle module within https://github.com/DHARPA-Project/kiara, so that I can check what input I can use?

makkus commented 4 months ago

So, in this case you'd find the file that contains the class 'kiara.modules.included_core_modules.filesystem.ImportLocalFileBundleModule', which is displayed in the Python class value. You can do that either via the GitHub repo or using your IDE's discovery mechanism (this depends on your IDE; I use PyCharm and it's fairly easy to get to that class there). Let's assume GitHub:

https://github.com/DHARPA-Project/kiara/blob/6cca099ff5fe1846fc822c129e2ff27c3f66009b/src/kiara/modules/included_core_modules/filesystem.py#L121
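If the package is installed locally, the same lookup can also be done from Python instead of browsing GitHub. The following is a minimal sketch, not part of kiara: locate_class is a hypothetical helper, and a standard-library class is used as a stand-in so the snippet runs without kiara installed. With kiara available, the full_name from the output above could be passed in instead.

```python
import importlib
import inspect


def locate_class(full_name: str):
    """Return (class, source file, first line number) for a dotted class path.

    Hypothetical helper for illustration; it is not part of kiara.
    """
    # Split 'package.module.ClassName' into module path and class name.
    module_name, _, class_name = full_name.rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    source_file = inspect.getsourcefile(cls)
    _, first_line = inspect.getsourcelines(cls)
    return cls, source_file, first_line


# Stand-in from the standard library; replace with the 'full_name' value
# shown by 'kiara module explain' if kiara is installed.
cls, source_file, first_line = locate_class("json.decoder.JSONDecoder")
print(f"{cls.__name__} is defined in {source_file}, starting at line {first_line}")
```

The dotted path maps directly onto the repository layout, which is why the module part of the name is enough to find the right file on GitHub as well.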

Once you have the Module class, you check the class's _config_cls class attribute, which is 'ImportFileBundleConfig' in our case:

https://github.com/DHARPA-Project/kiara/blob/6cca099ff5fe1846fc822c129e2ff27c3f66009b/src/kiara/modules/included_core_modules/filesystem.py#L111

Here you can check the type-hints for the config options you are interested in. (Sidenote: just realized there is a typo there and one 'include' should read 'exclude' -- will fix that now)
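Type hints can also be read programmatically rather than by eye. The sketch below uses a hypothetical stand-in class, ExampleBundleConfig, written as a plain dataclass so that it runs without kiara or pydantic. The field types are an assumption based on this thread (the options default to null and take lists of strings); the class is not a copy of kiara's source.

```python
from dataclasses import dataclass
from typing import List, Optional, get_type_hints


@dataclass
class ExampleBundleConfig:
    """Hypothetical stand-in for a module config class (not kiara's code)."""

    # Assumed types: an optional list of strings, defaulting to None.
    include_file_types: Optional[List[str]] = None
    exclude_file_types: Optional[List[str]] = None


# Collect the annotations of the class as a {field name: type} mapping.
hints = get_type_hints(ExampleBundleConfig)
for field_name, hint in hints.items():
    print(field_name, "->", hint)
```

Reading the annotations this way gives the same information as looking at the class definition in the source file.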

Further reading would be pydantic ( https://docs.pydantic.dev/latest/ ), because all Module config classes inherit from the pydantic BaseModel class. But for your purpose that shouldn't be necessary.

It's not trivial to surface that info through to a user interface, and so far I did not want to spend the time on it, because I imagine this to be an advanced use case that either wouldn't come up too often, or would come up for people who are comfortable navigating source code this way. I can try to implement it though, if there is demand and everybody thinks it's important enough.

MariellaCC commented 4 months ago

Thanks, @makkus, this is very helpful.

I would now like to check whether the string needs to include the dot of the extension (e.g. ".txt" or "txt"). I guess my next step would be to look at how a "KiaraFileBundle" works? How could I find where the "KiaraFileBundle" is located? Is it a kiara data type or something else?

makkus commented 4 months ago

Right, so, this is something that I probably should have put into the description of those options, which I'll do now. Come to think of it, I can probably also include a note that those options are lists of strings...

Anyway, still a good idea to know how to figure that one out, so...

You can find the 'KiaraFileBundle' class by looking at where it is imported:

https://github.com/DHARPA-Project/kiara/blob/6cca099ff5fe1846fc822c129e2ff27c3f66009b/src/kiara/modules/included_core_modules/filesystem.py#L16

Which would lead you to:

https://github.com/DHARPA-Project/kiara/blob/6cca099ff5fe1846fc822c129e2ff27c3f66009b/src/kiara/models/filesystem.py#L234

and the 'import_folder' method:

https://github.com/DHARPA-Project/kiara/blob/6cca099ff5fe1846fc822c129e2ff27c3f66009b/src/kiara/models/filesystem.py#L298

After a bit of code reading, that should show you that an 'ends_with' test is used for both of those options, so as long as your filename ends with one of the included/excluded strings, there's a match. Whether you include or leave out the dot doesn't matter.
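To make the matching rule concrete, here is a minimal sketch of suffix-based filtering as described in this comment. It is not kiara's actual implementation: select_files and the file names are hypothetical, and only the idea of testing the end of the filename comes from the thread.

```python
from typing import Iterable, List, Optional


def select_files(
    filenames: Iterable[str],
    include_file_types: Optional[List[str]] = None,
    exclude_file_types: Optional[List[str]] = None,
) -> List[str]:
    """Keep filenames by suffix match (illustrative helper, not kiara's code)."""
    selected = []
    for name in filenames:
        # If an include list is given, the name must end with one of its entries.
        if include_file_types and not any(
            name.endswith(suffix) for suffix in include_file_types
        ):
            continue
        # If an exclude list is given, the name must not end with any entry.
        if exclude_file_types and any(
            name.endswith(suffix) for suffix in exclude_file_types
        ):
            continue
        selected.append(name)
    return selected


files = ["notes.txt", ".DS_Store", "data.csv"]
with_dot = select_files(files, include_file_types=[".txt"])
without_dot = select_files(files, include_file_types=["txt"])
no_ds_store = select_files(files, exclude_file_types=[".DS_Store"])
print(with_dot, without_dot, no_ds_store)
```

In this sketch both ".txt" and "txt" select notes.txt. One consequence of a pure suffix test is that "txt" without the dot would also match a name such as "mytxt", so including the dot is the more precise choice.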

Jumping around in the source code like this is essential to efficient programming, in my view, so I'd recommend figuring out how your IDE makes that easy for you. In PyCharm, which I use, I can just press Shift-Shift and type the name of the class, and it leads me there; right-clicking on the class name and choosing 'goto declaration' does the same thing. I'd be surprised if there wasn't something equivalent in VSCode, so if you haven't already, try to find that option (maybe it's called 'goto definition' instead, something along those lines...).

MariellaCC commented 4 months ago

Thanks. Just a detail: I could find out quite quickly that a list of strings is needed, but I wanted to figure out how the strings need to be written (with or without the dot, i.e. "txt" or ".txt").

makkus commented 4 months ago

Yeah, fair enough. Like I said, I should have documented in the description of the options that the test is against the end of the file path, and that as long as that matches you're good, with or without the dot. It should not be necessary to jump through the source code to find that...

Still, good to know how to efficiently jump through dependency source code :)