DHARPA-Project / kiara-website

Creative Commons Zero v1.0 Universal

What documentation is required to support end-users who use kiara via jupyter? #21

Status: Open. caro401 opened this issue 6 months ago.

caro401 commented 6 months ago

We will definitely have people who use kiara via its Python API in Jupyter notebooks, regardless of whether we think this is technically a good idea. We need to document:

(docs that I think we were already planning on making)

@MariellaCC does this cover everything you think an end-user would need to know to use kiara via jupyter successfully?

MariellaCC commented 6 months ago

I think this covers pretty much everything that seems necessary.

Other things may come up while experimenting. For example, when using kiara in Jupyter for testing during plugin/module development, I find the error messages displayed in Jupyter hard to work with. Print statements from module code don't seem to get printed, so there may be questions about that.
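If a framework re-raises a low-level failure as a more generic error, a notebook may show only the outer message. A minimal sketch using only the standard library, assuming (not verified here) that kiara wraps errors from module code this way: walk the exception's __cause__/__context__ chain so the original error becomes visible. The helper names exception_chain and run_and_report are hypothetical.

```python
import sys
import traceback


def exception_chain(exc: BaseException) -> list:
    """Return 'Type: message' for exc and every exception it wraps, outermost first.

    Follows __cause__ (explicit 'raise ... from ...') and falls back to
    __context__ (an exception raised while handling another one), so a root
    cause hidden behind a generic wrapper error is still reported.
    """
    chain = []
    seen = set()  # guard against cycles in the exception graph
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(f"{type(current).__name__}: {current}")
        current = current.__cause__ or current.__context__
    return chain


def run_and_report(fn, *args, **kwargs):
    """Call fn; on failure, print the exception chain and full traceback to stderr, then re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        for line in exception_chain(exc):
            print(line, file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        raise
```

In a notebook cell, wrapping a call as run_and_report(some_callable, ...) would print the innermost error even when the visible one is generic. Whether this helps with kiara jobs depends on how kiara propagates errors, which this sketch does not assume to know; it also does nothing for swallowed print output.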

If that is OK with you, I suggest leaving this issue open for now, as you, I, or other team members may have additional questions to add here while experimenting with the API.

makkus commented 6 months ago

For the record: in my opinion, researcher-type end-users using kiara via Python (within Jupyter or not) have quite different needs than a frontend developer who tries to create a graphical UI for kiara. The access patterns (and the expected proficiency in Python and in software development generally) would differ in meaningful ways, which I think any documentation we write should address. At the very least I'd have separate sections for the two target audiences.

Personally, if I were responsible for this, I'd probably create a limited API with only a few select endpoints (well documented, including very exhaustive context), with targeted testing of all those endpoints in as many scenarios as we can imagine. This would limit the 'documentation surface' in that section of the docs, and it would make it easier to change/evolve the endpoints that frontend devs need (which are quite a few more, and not yet well specified, since we haven't really had any specific requirements/specs from frontend devs).

Speaking of that: testing, and installing/creating Python environments, is another area that I think has quite different requirements and assumptions in the two cases. If we develop a GUI, we can manage and assume an exact environment with exact versions of all the dependencies we expect. If we have to support users who 'bring their own' environment, this becomes a lot harder (as we have already experienced in some cases), and initially I did not put that very high on my priority list (since I assumed we'd have full control over end-user environments). So this is another thing we'll need to spend meaningful time designing and testing.
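One small part of the 'bring your own environment' problem is at least detecting when a user's environment differs from the one that was tested. A minimal sketch using the standard library's importlib.metadata; the function name is hypothetical, and exact-string pins are an assumption about how strict support would be (compatible version ranges are not handled).

```python
from importlib.metadata import PackageNotFoundError, version


def find_version_mismatches(expected: dict) -> dict:
    """Compare installed distribution versions against exact expected pins.

    Returns {name: (expected_version, installed_version_or_None)} for every
    distribution that is missing or installed at a different version string.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

A support checklist could call find_version_mismatches with the pins of the tested release and show the result to the user; an empty dict means the environment matches exactly.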

Another thing I'm not sure about: what is the strategy when a Jupyter user wants to do something, but there is no module for it? Jupyter users will get there much quicker than users of a potential UI (IMHO), because the latter would be a lot more 'guided', and limited by its nature of not having the total flexibility that writing code has. Do we ask them to write their own modules? Or to create a ticket for us to create it? Modules need to live in the context of a plugin, so if they create one themselves, they also need to publish it on PyPI (and ideally conda), which is a whole other level of commitment.

For module development I'd personally discourage the use of Jupyter and recommend using the CLI, with maybe a few 'pure' Python convenience scripts if necessary (for the modules I developed, I always found the CLI easier and more efficient).

"Best practices" would also differ between researcher end-users and frontend devs. The latter have a lot more leeway in terms of using Python 'outside' of modules. For the former, it'll be very tempting to use kiara values 'raw', and in doing so they'll break the lineage of the data they are working with. That will require explicit documentation, so that everyone understands why lineage is important and how preserving it limits what they can do in a notebook.
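For documentation purposes, the lineage point can be illustrated with a toy model in plain Python. This is not kiara's data model or API; TrackedValue and run_step are hypothetical names. Results produced through a tracked step record where they came from, while computing directly on the raw data yields a bare Python object with no history.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class TrackedValue:
    """A value plus a record of the steps that produced it (toy model)."""
    data: Any
    lineage: Tuple[str, ...] = ()


def run_step(name: str, fn: Callable[..., Any], *inputs: TrackedValue) -> TrackedValue:
    """Apply fn to the inputs' raw data and record the step in the result's lineage."""
    result = fn(*(value.data for value in inputs))
    # Inherit the history of every input, then append this step.
    history = tuple(step for value in inputs for step in value.lineage)
    return TrackedValue(data=result, lineage=history + (name,))


source = TrackedValue(data=["a", "bb", "ccc"], lineage=("import_file",))

# Tracked: the result records import_file -> lengths -> total.
lengths = run_step("lengths", lambda xs: [len(x) for x in xs], source)
total = run_step("total", sum, lengths)

# Untracked: the same number, but a bare int with no record of its origin.
raw_total = sum(len(x) for x in source.data)
```

Both paths compute 6, but only total can answer "where did this come from?"; that difference is what notebook users would need to understand before reaching for raw values.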

Anyway, those are just my opinions, and I might be wrong: things might work out much more smoothly than I anticipate, and this might actually turn out to be useful to users. My concern is that we won't see many of the problems I'm afraid of at the beginning, when people don't do anything out of the ordinary and basically just follow a tutorial, but that they'll manifest in the mid/long term, and by then it'll be harder or impossible to rectify.

MariellaCC commented 5 months ago

> For the record: in my opinion, researcher-type end-users using kiara via Python (within Jupyter or not) have quite different needs than a frontend developer who tries to create a graphical UI for kiara. The access patterns (and the expected proficiency in Python and in software development generally) would differ in meaningful ways, which I think any documentation we write should address. At the very least I'd have separate sections for the two target audiences.

I agree with this

> Personally, if I were responsible for this, I'd probably create a limited API with only a few select endpoints (well documented, including very exhaustive context), with targeted testing of all those endpoints in as many scenarios as we can imagine. This would limit the 'documentation surface' in that section of the docs, and it would make it easier to change/evolve the endpoints that frontend devs need (which are quite a few more, and not yet well specified, since we haven't really had any specific requirements/specs from frontend devs).

Could we not have the same API but separate the documentation, and provide end-users who use kiara via Python with documentation that covers only the items relevant to them?

> Speaking of that: testing, and installing/creating Python environments, is another area that I think has quite different requirements and assumptions in the two cases.

Doesn't the plugin template version freezing capacity mitigate these challenges?

> Do we ask them to write their own modules? Or to create a ticket for us to create it?

The documentation would provide ways for those who want to create "community plugins", and Jupyter users who do not want to create modules could have a space to submit feature requests?

> Modules need to live in the context of a plugin, so if they create one themselves, they also need to publish it on PyPI (and ideally conda), which is a whole other level of commitment.

Can't they use pip install with a GitHub link, as we did in the past if I remember correctly? Wouldn't that be enough for the community plugins?

> For module development I'd personally discourage the use of Jupyter and recommend using the CLI, with maybe a few 'pure' Python convenience scripts if necessary (for the modules I developed, I always found the CLI easier and more efficient).

I find the CLI OK when dealing with one module, but when there are more it is not easy, and several users said that experimenting on the CLI discouraged them from adopting kiara. I don't think we should ignore these comments. That said, for module development, the problem is that errors do not always display in Jupyter. Personally, I write the code in a .py file first, and once it works I reproduce it in Jupyter. Is such an approach problematic in your opinion?
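The ".py file first" workflow can be made less tedious by re-importing the file from the notebook after each edit instead of copying code into cells. A minimal sketch with the standard library's importlib; load_or_reload is a hypothetical helper. We have not checked whether kiara picks up a reloaded module class that it has already registered, so this is best suited to plain helper functions.

```python
import importlib
import sys
from types import ModuleType


def load_or_reload(module_name: str, directory: str) -> ModuleType:
    """Import module_name from directory, re-executing it if it was imported before.

    This lets a notebook pick up edits to a .py file without restarting the kernel.
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Drop cached directory listings so newly created files are found.
    importlib.invalidate_caches()
    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)
```

In a notebook, calling load_or_reload("my_helpers", "/path/to/project") (hypothetical names) at the top of a cell and re-running that cell after saving the file would pick up the latest code; errors raised while importing the file show up as ordinary tracebacks in the cell.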

makkus commented 5 months ago

> Could we not have the same API but separate the documentation, and provide end-users who use kiara via Python with documentation that covers only the items relevant to them?

I guess. My point was more that if I were designing an API for this type of user, I'd have designed it differently to account for the different access patterns, etc. Also, the API as it is now is sort of "this is what I think makes sense and roughly what a frontend dev might need"; I expected to be able to iterate on it once 'real' requirements came in (fed by real use cases). I tried to make it as realistic as possible by building my own frontend prototypes and the Streamlit stuff, to figure out what kinds of access patterns would appear. But I always assumed there would be changes, and that I'd work closely with a frontend dev (once we finally had one) and iterate on this API. That in turn means the API would change according to those requirements, which would be OK since the frontend dev would be involved in the process and could react to breakage.

Introducing a new target audience that uses the same API screws with this plan a bit, since I now can't really make breaking changes to it anymore. Having a separate API (even if it's just a subset of the same methods) would have alleviated this problem, since that is easier to keep stable.
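A separate API that is "just a subset of the same methods" could be as thin as a facade that forwards a handful of documented calls to the larger API and exposes nothing else. A minimal sketch; FullAPI and EndUserAPI are hypothetical stand-ins, not kiara classes, and the method names are illustrative only.

```python
from typing import Dict, List


class FullAPI:
    """Stand-in for the large API that evolves together with frontend developers."""

    def list_operations(self) -> List[str]:
        return ["example.load", "example.count"]

    def run_operation(self, operation: str, inputs: Dict[str, object]) -> Dict[str, object]:
        return {"operation": operation, "inputs": dict(inputs)}

    def experimental_batch_run(self, jobs: List[Dict[str, object]]) -> List[Dict[str, object]]:
        # May change or disappear as frontend requirements come in.
        return [self.run_operation(job["operation"], job["inputs"]) for job in jobs]


class EndUserAPI:
    """Curated, documented subset for researcher end-users.

    Each method forwards to FullAPI, so FullAPI can change freely as long as
    these few signatures keep working (which targeted tests would pin down).
    """

    def __init__(self, backend: FullAPI) -> None:
        self._backend = backend

    def list_operations(self) -> List[str]:
        """Names of all operations available to run."""
        return self._backend.list_operations()

    def run_operation(self, operation: str, **inputs: object) -> Dict[str, object]:
        """Run one operation with keyword inputs."""
        return self._backend.run_operation(operation, inputs)
```

Explicit forwarding methods (rather than __getattr__ delegation) keep the stable surface visible in one place, which is also the surface the end-user docs would cover.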

> Doesn't the plugin template version freezing capacity mitigate these challenges?

Not sure what you mean. Jupyter users wouldn't really use the template, would they? What do you mean by 'freezing capacity'?

> Can't they use pip install with a GitHub link, as we did in the past if I remember correctly? Wouldn't that be enough for the community plugins?

I haven't really thought it through, but it sounds kind of messy, and someone would have to make sure it works in all circumstances. I know, for example, that it can be dicey to pip install from a git URL if git is not installed (or can't be found by pip because it's not on the PATH, which on Windows is not unreasonable to expect). It might work, but it would definitely need someone to take responsibility and test it thoroughly on all target platforms and in all scenarios. Personally, I would only use GitHub pip installs for development; anything 'production' should install from officially released sources.

Still, even without the publishing issue, creating a plugin is definitely a step up from just writing some code in a Jupyter notebook, and there are comparatively many things 'normal' research users need to do and be aware of, things they would not have had experience with. If we assume that all (or most) of our Jupyter users will want to write their own modules, they also need to understand to some degree what Python classes are, how to create one from the KiaraModule base class, and how to debug it if it doesn't work. Just saying, this is quite a different effort from writing a few lines of Python code that call kiara.run_job a few times...
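The git-URL route looks roughly like this. pip's "git+https://...@ref" requirement syntax is real; the helper names and any repository URL used with them are hypothetical. The sketch checks for a git executable first, since that is the failure mode described above.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def git_install_command(repo_url: str, ref: Optional[str] = None) -> List[str]:
    """Build a pip command that installs a package straight from a git repository.

    ref can be a tag, branch, or commit; pinning one keeps installs reproducible.
    """
    spec = f"git+{repo_url}"
    if ref:
        spec += f"@{ref}"
    # Use the running interpreter's pip so the package lands in the active environment.
    return [sys.executable, "-m", "pip", "install", spec]


def install_from_git(repo_url: str, ref: Optional[str] = None) -> None:
    """Install from git, failing early with a clear message if git is not on PATH."""
    if shutil.which("git") is None:
        raise RuntimeError(
            "pip needs a 'git' executable on PATH to install from a git URL; "
            "install git or use a released package instead."
        )
    subprocess.run(git_install_command(repo_url, ref), check=True)
```

This only addresses the missing-git case; it does not replace the cross-platform testing that someone would still need to own.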

> I find the CLI OK when dealing with one module, but when there are more it is not easy, and several users said that experimenting on the CLI discouraged them from adopting kiara. I don't think we should ignore these comments. That said, for module development, the problem is that errors do not always display in Jupyter. Personally, I write the code in a .py file first, and once it works I reproduce it in Jupyter. Is such an approach problematic in your opinion?

I never intended the CLI to be targeted at end-users. I created it so that the team would have something to start working with, to experiment with modules and especially pipelines. But even more so as a development environment through which Python devs (who in my experience are comfortable with, and prefer, a command-line interface) could contribute to and work on kiara stuff in the future (modules, pipelines, etc.). And lastly, to make my own development experience more efficient; I think it was worth it for that alone.

I can't force anybody to use the CLI, of course, but in my opinion it's the best way (and the one I recommend) that we currently have to work on this stuff, especially for creating modules and pipelines. Like all of these things, it has a bit of a learning curve. I don't really know any Python developers who voluntarily do their app development in Jupyter, but if you've found a development process that works for you, sure, go for it. I'd go crazy if I had to use it :)

If you think it's a good idea for others to do the same, then it might make sense to document that instead of using the 'getting started' docs I wrote. Again, this is a target audience I didn't anticipate or develop for, so if what I have created is not good enough for them, that is a work item someone has to pick up: gather requirements and implement a solution that works for them.