Set Up MVP Package For Label-Image Creation

AFg6K7h4fhy2 commented 1 month ago

This PR constitutes a first-pass at adding:

An abstract Label class.
A Label for collected specimens called CollectionsLabel.
A Label for systematics called SystematicsLabel.
Utility functions for working with and generating multiple labels.
A notebook with label creation examples, including re-creating labels found on the Fossil Forum and other places.

AFg6K7h4fhy2 commented 2 weeks ago

The following comment details an issue being experienced.

Presently, Label is an abstract base class for CollectionsLabel. Label has many attributes and CollectionsLabel of course has some other, more specific attributes. When a user creates a CollectionsLabel, there should not be much needed if they want to make use of the default values for the CollectionsLabel specific parameters. Also, they should be able to change the order of the parameters to suit the order in which they want the group titles (e.g. "Collection" to appear).

As an example,

label1 = paleo_utils.CollectionsLabel(
    save_directory="../assets/saved_images/",
    id_number="2131",
    collection="AMNH",
    collector="Dr. Montague",
    location="Europe",
    coordinates=(40.7128, -74.0060),
    date_found="2023-01-01",
)

Should output the following:

ID: 2131
Collection: AMNH
Collector: Dr.Montague
Location: Europe
Coordinates: (40.7128, -74.0060)
Date Found: 2023-01-01

Changing the order in the paleo_utils.CollectionsLabel() call should change the output order. The problem I am having is in trying to determine whether the parameter name e.g. location in location= should be turned in the group title. There are certain parameters that aren't groups, such as coordinates_separate: bool = False which guides where the coordinates occur.

An idea is to have a class for a default label be separate from the class CollectionsLabel. The class CollectionsLabel then receives a blank label instance and can use that as the place to add the label.

Further thoughts will come soon.

AFg6K7h4fhy2 commented 2 weeks ago

The solution to the aforementioned issue is to probably work with calls like the following:

label2 = paleo_utils.CollectionsLabel(
    collection="AMNH",
    id_number="12345",
    collector="Dr. Larson",
    custom_titles={
        "collection": "Repository: ",
        "id_number": "Catalog: ",
        "collector": "Finder: "
    }
)

AFg6K7h4fhy2 commented 2 weeks ago

I asked a question on Code Review.

AFg6K7h4fhy2 commented 2 weeks ago

Re: https://github.com/AFg6K7h4fhy2/Paleo-Utils/pull/5#issuecomment-2453302370

So that the answer to the CR question doesn't become lost, I've reproduced it below:

Code Review Answer

use pathlib =========== ``` label = paleo_utils.CollectionsLabel( save_directory="../assets/saved_images/", date_found="2024-01-01", ``` Rather than `str`, consider storing that folder name as a Path. Then it will be significantly more convenient for clients to make use of. Kudos on using ISO8601, so lexical order matches chronological order. optional values =============== You consistently admit of `None` for each attribute. This resembles a relational database table lacking even a single NOT NULL clause in its [DDL](https://en.wikipedia.org/wiki/Data_definition_language). I encourage you to go back and reconsider whether a CollectionsLabel truly could have e.g. no `collection` and no `id_number`. data type ========= Several date fields curiously use `str` rather than a `datetime`. I imagine `chrono_age: str` gives you the flexibility to store human-friendly descriptions like "1.2 Bya" or "66 Mya", which would lead to unfortunate lexical ordering. And I'm concerned that someone might try to jam "late Devonian" in there. Consider using one or two numeric fields to store age, and worry about human-friendly formatting when outputting a retrieved record. Giving `size` a type of `str` seems mind boggling to me, but I imagine you have your reasons. Perhaps it is a {small, medium, large} enum? Or some sizes are in square meters while others are in cubic meters? idiom ===== Certainly this works, ``` ... {key: kwargs[key] for key in kwargs} ``` but it is more usual to iterate over `kwargs.items()`. And in this particularly simple case you should use `kwargs.copy()` to produce the desired shallow copy. But it's not clear why you'd want to make a copy at all, given that each function call makes a shallow copy already. The OP contains no unit test which would turn Red if we stop copying. If you are trying to conform to some [documented](https://www.attrs.org/en/stable) attrs advice, then please add a `#` comment which cites the docs. If there is another concern at work here, please comment on it, and write a unit test which highlights the effect of interest. I claim that any valid calling code one might write wouldn't care about whether a copy had happened, that is, one could not demonstrate a Green test which turns Red when we remove that copying operation. If inheritance plays into this, show that in your unit test. As written the code is obscure -- I cannot think of a good reason for making a copy of the args. (In an inheritance situation is common enough for a V2 child class to destructively `.pop()` off V2 args that it knows about, so they won't bother the V1 parent class. But that doesn't appear to be relevant here.) zero items ========== You have declared that `title_overrides` cannot be None, which is a good choice. ``` def __attrs_post_init__(self): ... if self.title_overrides: ``` No need for the guard, there. Just iterate over zero items in an empty dict, no harm done. Please phrase the `for` as `for key, value in self.title_overrides.items():`, and use similar single-line formatting down in `label()`. Wouldn't a simple [.update()](https://docs.python.org/3/library/stdtypes.html#dict.update) suffice here? meaningful identifier ===================== ``` label_attrs = { attr.name ... ``` That looks more like you're creating a new `names` variable, to me. Also, rather than merge that code fragment into the `main` branch, just delete it, along with the half dozen related lines of commented code that follow it. isinstance() ============ ``` if ( value is not None and not isinstance(value, dict) ): ``` Please simplify to ``` if not isinstance(value, (dict, NoneType)): ``` (Your IDE will help you add `from types import NoneType`.) embedded newlines ================= ``` return "\n".join(parts) ``` I note in passing that you apparently prohibit certain characters in all those fields, such as NEWLINE, but that wasn't enforced anywhere. If one crept in, it would corrupt this returned value, turning a single record into apparently multiple records. * * * * * Overall this seems a reasonable design. > expect to be able to use argument ordering ... You didn't mention your target environment, but I'm willing to assume it does not include e.g. Jython, and will be restricted to just modern cPython. The python *language* distinguishes between unordered `dict` and the `OrderedDict` class which makes stronger guarantees. The cPython *interpreter* now imposes ordering on `dict`. The old promise was "an arbitrary order which is non-random", but that changed several years ago. Starting with interpreter [3.7](https://docs.python.org/3.7/library/stdtypes.html#dict-views), `dict` iteration promises to preserve insertion order. Also, PEP 468 ensures that `kwargs` order shall match the arg order seen in the source code. So, yes, your expectation is reasonable.

AFg6K7h4fhy2 commented 2 weeks ago

Since it would not be nice to provide all the parameters covered in Label each time, having some templates available would be useful.

AFg6K7h4fhy2 commented 2 weeks ago

The workflow of Label creation needs to be better established. Does one make a Label, save the class, then reuse it each time they want to make a collections label or systematics label? This would imply that CollectionsLabel and SystematicsLabel might not be best of as their own classes.

AFg6K7h4fhy2 / Paleo-Utils

Set Up MVP Package For Label-Image Creation #5