DHARPA-Project / kiara-website

Creative Commons Zero v1.0 Universal
Define some coding standards to use in the docs #10

Closed caro401 closed 7 months ago

caro401 commented 7 months ago

Can we define some coding standards to use in the docs for how we import the API client, what we call it, what we name the return value from jobs etc? I mostly have no opinions about what these should actually be, just that they should be consistent.

There's probably a bunch of different conventions used in different bits of documentation, notebooks, sample code. Settling on something to help make examples consistent, correct and accessible to people with not very deep Python knowledge would be really helpful, before we start writing a bunch of new examples in all different styles.

I'm not too keen on throwing around the term API in this setting. I think it's confusing to end users coming from humanities, so if we could avoid using that term in the variable names I think I'd prefer that? - I'd appreciate input from @MariellaCC or Caitlin on the use of this term and how it feels to them, how well they understand what it actually means?

things we need to clarify

the snippet we put at the start of everything - @makkus which is the right API to be using?!?

# markus' docs example uses this
from kiara.interfaces.python_api import KiaraAPI

# some jupyter examples use this
from kiara.api import KiaraAPI

can/should we communicate something about applicable package versions at this point, for example a comment after the import saying what version was used for the example?
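One way to find the version number to put in such a comment is to ask Python which version is installed. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration, and it assumes the package was installed under the distribution name "kiara".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version as a string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Assumption: "kiara" is the name the package was installed under.
kiara_version = installed_version("kiara")
print(f"kiara version used for this example: {kiara_version}")
```

If the package is not installed in the current environment, the sketch prints `None` instead of failing.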

when we've imported that thing, there's a line that looks something like this. What do we call the api client thing?

# potentially bad because the package is called kiara?
kiara = KiaraAPI.instance()

# potentially bad because jargon
api = KiaraAPI.instance()

# would client be a useful word here? 
kiara_client = KiaraAPI.instance()

Do we define arguments in a separate variable, or straight in the function call?

inputs = {"some":"thing"}
do_operation('name', inputs=inputs)

# or 
do_operation('name', inputs={"some":"thing"})

Do we tell users to use run_job or queue_job? I'm strongly in favour of run_job because it reduces a bunch of complexity, as long as we note somewhere that queue_job exists. See for example how much extra code there is in this queue_job example. In examples particularly, people never don't want to just wait for the thing to finish.

What do we call the return value from a run_job? We usually bind this to a variable, it would be useful to pick a consistent name/name scheme. result? results (is there more than one result?)? <operation_name>_result? in this pattern:

# ...
results = api.run_job("import.local.file", inputs=inputs)

Do we use type hints in code examples? probably not, unless they really make things clearer? Again, what do @MariellaCC and Caitlin think about this - are type hints familiar/useful to you?

makkus commented 7 months ago

from kiara.api import KiaraAPI

That one is better to use in docs (both point to the same Python class).

api = KiaraAPI.instance()

That's my personal preference, it's jargon, but it's also correct (plus the Python class is called that). And if someone doesn't know what an api is but is using Python, then they probably also don't have a very firm concept of 'client'.

Do we tell users to use run_job or queue_job? I'm strongly in favour of run_job because it reduces a bunch of complexity, as long as we note somewhere that queue_job exists. See for example how much extra code there is in https://github.com/DHARPA-Project/kiara-website/issues/9#issuecomment-1814232313. In examples particularly, people never don't want to just wait for the thing to finish.

I think we always used run_job. I only mentioned queue_job because it's relevant for an interactive client. Unless that is the case, I would always use run.
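The difference between "run and wait" and "queue and collect later" can be sketched with the standard library alone. This is an analogy built on `concurrent.futures`, not kiara's own API: `slow_square` is a made-up stand-in for a job, and nothing here calls run_job or queue_job.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(number):
    # Made-up stand-in for a long-running job.
    return number * number


# "run" style: one call, and the result is there when the line finishes.
run_results = slow_square(4)

# "queue" style: submit the job, keep a handle to it, ask for the result later.
with ThreadPoolExecutor(max_workers=1) as executor:
    queued_job = executor.submit(slow_square, 4)
    # An interactive client could do other work here while the job runs.
    queue_results = queued_job.result()  # waits until the job has finished

print(run_results, queue_results)
```

Both styles end with the same value; the queued version only pays off when something useful happens between submitting and collecting.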

result? results (is there more than one result?)? <operation_name>_result?

Depends on the operation, most have one, some have multiple output fields. The Python object is a map-like thing though, so my feeling is that 'results' is more correct in more cases (since you always have to slice away the single field you are interested in, even if only one result field is contained).
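To illustrate the "map-like" point without needing kiara installed, the sketch below uses plain dictionaries as stand-ins for what run_job returns. The real return value is a kiara object rather than a dict, and the field names in the second example are invented.

```python
# Stand-in for a job with a single output field (a plain dict, not a kiara object).
import_file_results = {"file": "<the imported file value>"}

# Even with only one field, it still has to be picked out by name.
imported_file = import_file_results["file"]

# Stand-in for a job with several output fields (field names are made up).
hypothetical_results = {"table": "<a table value>", "row_count": 3}
for field_name in hypothetical_results:
    print(field_name, "->", hypothetical_results[field_name])
```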

caro401 commented 7 months ago

Best guess at best-practice example code; I'll edit this as we discuss/decide more things. It uses the import file process as an example use case, and has been updated to reflect Mariella's comments.

from kiara.api import KiaraAPI  # version 0.5.0 and newer
# call the thing kiara
kiara = KiaraAPI.instance()

# keep inputs separate, give them a meaningful name "<something_meaningful>_inputs"
import_csv_file_inputs = {
    "path": "./myfile.csv",
    "quite": "often",
    "there's": "lots",
    "of_things": "here",
}

# always use run_job
# call the return value "<something_meaningful>_results" because there's usually lots of values returned
import_file_results = kiara.run_job('import.local.file', inputs=import_csv_file_inputs)

# no type hint here
imported_file = import_file_results['file']

caro401 commented 7 months ago

Jupyter users @MariellaCC, @CBurge95 are you happy with that? does it match with how you tend to write your notebooks, is anything unclear?

CBurge95 commented 7 months ago

Just a weigh in on terms here (from the humanities/history front) but I think it depends where you are planning on using this language: I'm generalising but I think those using CLI will likely already be comfortable with APIs, what this means and more importantly, how it works, but those in a UI or basic python environment (aka Jupyter) are much less likely to follow this beyond a surface level (i.e. an API helps you get information from one system into another system). Quick whizz round those currently in the office suggests that using this term is going to be less than helpful, and I (personally at least!!) think that naming it api versus kiara is technical accuracy at the expense of wider understanding. This is obviously not to say that we don't explain and/or document the fact that kiara (in this instance) is the name for the api.

And if someone doesn't know what an api is but is using Python, then they probably also don't have a very firm concept of 'client'.

At least for my part, this is correct. I think some assumptions might be being made about the average user/average coding ability of someone in digital humanities, and how this is then impacting language decisions in both docs and kiara itself. Obviously this is not new, but I think the less complex/more informative we can make the whole process, the more welcoming and useable it is likely to be.

caro401 commented 7 months ago

@makkus is there a technical reason why kiara is a bad name for binding KiaraAPI.instance() to? I'm not familiar enough with the details of python's module system to know whether I need to care about a naming collision here?

If there's no technical reason not to, I'd prefer calling the thing kiara if that makes things clearer for end-users. I care more about being inclusive at this stage, and people with strong technical knowledge and understanding will cope, and can go to the type hints or code if they need to.

makkus commented 7 months ago

I don't really have an opinion here, I just used api myself because it's short. People using pandas probably also use 'df' before they understand what a DataFrame is, so personally I wouldn't worry too much, but kiara is fine as far as I can see.
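On the naming-collision question: a `from package.module import Name` statement only binds `Name` in your code, so the package's own name stays free to use as a variable. The sketch below shows this with the standard library's `email` package standing in for kiara; the same reasoning should apply to `from kiara.api import KiaraAPI`, though that is not tested here.

```python
import sys
from email.message import EmailMessage  # binds only the name EmailMessage

# The package "email" has been loaded, but the name email is not bound yet,
# so using it for an instance does not overwrite anything.
email = EmailMessage()
email["Subject"] = "hello"

print(type(email).__name__)    # the variable holds an EmailMessage instance
print("email" in sys.modules)  # the package itself is still loaded
```

The one thing to watch for is a later plain `import email` (or, by analogy, `import kiara`) in the same notebook, which would rebind the name to the module and replace the instance.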

MariellaCC commented 7 months ago

@caro401 I also prefer calling it kiara. Alternatively, kiara_client or kiara_api seem intuitive enough to me if there is a problem with the fact that kiara is the package name. I find api too vague here.

MariellaCC commented 7 months ago

@caro401 Concerning the inputs and results variable names, I tend to use different names for each operation to enable potential re-usability of these variables (especially for the results one, I don't think that I have encountered use cases of using the inputs for two distinct operations so far). It already happened to me in the past that the value of a results variable was used as an input for two distinct operations and that these operations were not immediately following one another.
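A small sketch of why distinct names help when one set of results feeds two later operations. `fake_run_job` is a hypothetical stand-in so the example runs without kiara, and the two later operation names are invented for illustration.

```python
def fake_run_job(operation_name, inputs):
    # Hypothetical stand-in for run_job: it only records which operation
    # was requested and which input fields it received.
    return {"operation": operation_name, "received": sorted(inputs)}


import_file_inputs = {"path": "./myfile.csv"}
import_file_results = fake_run_job("import.local.file", inputs=import_file_inputs)

# First operation that consumes the imported file (operation name is made up).
first_step_inputs = {"file": import_file_results}
first_step_results = fake_run_job("hypothetical.first.step", inputs=first_step_inputs)

# Later on, import_file_results is still available under its own name,
# because first_step_results did not overwrite it.
second_step_inputs = {"file": import_file_results}
second_step_results = fake_run_job("hypothetical.second.step", inputs=second_step_inputs)

print(first_step_results["operation"], second_step_results["operation"])
```

Had every return value been bound to a single `results` variable, the second step would have received the first step's output instead of the imported file.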

MariellaCC commented 7 months ago

@caro401 Concerning type hints, I do not use them often enough, but I know that I should. I find that they add clarity, but I am not sure that users would use them, I don't know.

Concerning operations: I find this syntax easier to read: inputs = {"some": "thing"} than adding the inputs inside the operation. But then I'm not sure if it should be called inputs or have a less general variable name. I am also not sure if it should be the operation name, as there could potentially be a re-use of the same module/operation several times in the same workflow.

caro401 commented 7 months ago

Thanks @MariellaCC, that's really helpful! I've updated the code snippet above to match what I think you've said, let me know if I've misunderstood anything.

In terms of type hints, after a bit more thinking, I think I'd prefer to leave them out of code, because I think the way code will fail if you've not imported a type will be really un-intuitive for less experienced Python users. This would be a particular problem if you are copy/pasting little bits of code around, which I expect might be how these tutorials will be used in practice. It's also possibly a bit of Python syntax that people haven't seen before, and I don't want to make writing Python harder than it needs to be.
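The failure mode described above can be reproduced in a few lines. The sketch runs a one-line snippet that uses `Value` as a type hint without importing it, and reports what happens. On Python versions up to 3.13 this should raise a NameError that points at the hint rather than at the data; newer versions may defer evaluating annotations, in which case no error appears at this point.

```python
# A copied line that uses a type hint whose import was left behind.
snippet = 'file_result: Value = "pretend this came from run_job"'

try:
    # Run the snippet as if it were pasted into an empty notebook cell.
    exec(snippet, {})
    outcome = "no error"
except NameError as error:
    outcome = f"NameError: {error}"

print(outcome)
```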

Do you think that's reasonable? Are people more comfortable with type hints than I imagine?

CBurge95 commented 7 months ago

I think we can use the module/operation metadata more effectively to nod at type hints. I find type hints helpful as a reminder but only when I already know what they're there for, and I think you're right to say that if people are just copying & pasting, the hints might end up either superfluous or just completely wrong. I think if they are outlined clearly/narrative-ly enough in the metadata/operation instructions this might be able to solve both problems at once? This relies on people reading through this bit properly as a first step, but hopefully that will be done anyway!

caro401 commented 7 months ago

Hmm yeah, you're probably right. @CBurge95 would you be able to edit https://github.com/DHARPA-Project/kiara-website/issues/10#issuecomment-1814317252 to put type hints where they'd be useful for you? And if you have time, suggest or make changes to my attempt at sample code https://github.com/DHARPA-Project/kiara-website/pull/11/files, to suggest how it might be helpful to introduce type hints?

or were you thinking more like a separate docs page that introduces our conventions for code in the docs, including explaining a bit about type hints?

CBurge95 commented 7 months ago

Ahh sorry; I was thinking in terms of this

import_csv_file_inputs = {
    "path": "./myfile.csv",
    "quite": "often",
    "there's": "lots",
    "of_things": "here",
}

that we can also have this nice and clear in the metadata - sorry I completely misunderstood, entirely my fault for not reading things properly!!

I think normal type hints (i.e. the hashtags bit) are actually really helpful - ideally if we're writing in markdown they'd be fleshed out in the paragraphs(?) surrounding them, but I write type hints quite a lot and appreciate when they're there, it helps me keep track of my own code, and I struggle to read others without it.

caro401 commented 7 months ago

Sorry, I completely failed to explain what I meant by type hints. In Python (3.6+ IIRC), you can optionally add annotations to variables and functions to tell the code user, your text editor/IDE and optionally a python type checker (you might see MyPy or Pyre for this), what Python type that thing is. For example

# no type hint on this variable, the user has to guess what type it is, it might change later on in the code
thing = 'stuff'

# this one has a type hint (the : str bit), which tells the user this thing is a string
# so when you refer to the variable later, you know what kind of data is in it
other_thing: str = 'example'

This is more useful when you have a variable with a more complex type than string of course! We might use it to denote for example that run_job returns something, then you extract a Value out of that response. This might look like

# ... some other stuff
results = api.run_job("import.local.file", inputs=inputs)

# this file_result thing is of type Value
file_result: Value = results["file"]

This helps the person reading the code have an idea what's stored in a variable, and therefore what you can do with it later on, and can also help your text editor (VSCode etc) autocomplete functions on that value later. Here's a quick summary about type hints in case you want to learn more

@CBurge95 I think you might be referring to comments? We should absolutely put comments everywhere!

# this is a comment in python, it starts with a #

MariellaCC commented 7 months ago

@caro401

About the code snippet: This is exactly what I meant, thank you! And I agree with the conventions you defined in this snippet.

Type hints: I think you're right in leaving them out of the code for Jupyter notebooks.

CBurge95 commented 7 months ago

I think you might be referring to comments? We should absolutely put comments everywhere!

Yes I definitely was - thank you for bearing with me, and for explaining things so clearly to me!! Sorry everyone for clogging this issue with my incompetence. And given my utter (self-inflicted) confusion, I would agree that type hints are good left out of Jupyter notebooks (and coders of that level) - great for developer documentation, less so for newbie users like me.

makkus commented 7 months ago

Quick heads up: I'm working on some docs for the network_data type, and for that it looks quite useful to add the type-hints (along with their import statements) to some of my code examples. This is geared toward more low-level usage (like creating a module) as opposed to using the kiara API in Jupyter, so I think it'd be maybe a good thing to be more flexible in this regard for the lower level docs.