30350n / inventree_part_import

CLI to import parts from suppliers like DigiKey, LCSC, Mouser, etc. to InvenTree
MIT License
24 stars 8 forks source link

Issue with Handling Categories Containing Path-like Structures in Names #21

Closed sle118 closed 3 months ago

sle118 commented 3 months ago

Problem Description

I've encountered an issue with the current implementation of category handling in the parts import tool, specifically when dealing with categories that contain path-like structures in their names. An example of such a category is Crystals, Oscillators, Resonators/Pin Configurable/Selectable Oscillators, where Configurable/Selectable Oscillators is intended to be a distinct category rather than a path division.

Note that my instance of inventree contains the Digikey schema as published here: https://github.com/silverXnoise/digikey-part-category-schema

The current approach uses string splitting on the "/" character to construct category keys. This method incorrectly interprets categories with path-like names as multiple nested categories, leading to inaccuracies in the representation of our inventory's structure.

Suggested Solution

To resolve this, I suggest revising the code to utilize the getParentCategory method for each PartCategory. This approach will accurately reflect the hierarchy by building category keys based on actual parent-child relationships rather than relying on string manipulation, which can be prone to errors with our current naming conventions.

A proposed solution has been drafted as follows:

def build_category_key(category):
    """Recursively build the category tuple using getParentCategory."""
    parents = []
    while category is not None:
        parents.insert(0, category.name)
        category = category.getParentCategory()
    return tuple(parents)

part_categories = {}

for part_category in PartCategory.list(inventree_api):
    part_categories[key] = build_category_key(part_category)

This adjustment should ensure that categories are accurately represented and organized within our system, regardless of how their names are formatted.

Thank you for considering this issue and proposed solution!

30350n commented 3 months ago

Hey, thanks for checking out the tool! I've actually thought about this before and decided against this supposed solution, because the tiny problem with it is that that results in an API call per category. Which could result in like a minute of initialization time which is not bearable.

(One API call per category with a clean implementation actually, with the naive approach it's even more).

The real problem here is that the InvenTree API returns the path as a string in the first place imo, so that's probably what should be resolved.

sle118 commented 3 months ago

I was aiming for the quick fix, I have to admit, and I didn't realize that the categories were refreshed every single time the tool was launched... and with thousands of categories it is obvious that the solution doesn't work well.

30350n commented 3 months ago

Yep, no worries. I'll see if I can find some time to make a PR to get things going on the InvenTree side the next few days.

sle118 commented 3 months ago

I didn't get a chance to explore the API, but calling GET /api/part/category/ should return a list that already contains all the categories and from there, assembling the path locally wouldn't require several round trip for each sub category.

sle118 commented 3 months ago

In the meantime, and because it was extremely inefficient, I've rewritten my code to make better use of the fact that we have a full list of category objects available. Short of having a "getParentKey" method that returns the primary key of the parent, I am accessing the _data member directly here. It's dirty, but it will do the trick until a final solution makes it to the code base.

So I created this method.

def build_category_hierarchy(categories):
    """
    Build a dictionary mapping each category's pk to a tuple of category names from root to the category itself.

    Args:
        categories (list of dicts): A list of PartCategory objects represented as dicts.

    Returns:
        dict: A dictionary with category pks as keys and tuples of category names as values.
    """
    category_map = {category['pk']: category for category in categories}

    def get_category_names(pk):
        if pk is None:
            return ()
        category = category_map.get(pk)
        if not category or 'parent' not in category or category['parent'] is None:
            return (category['name'],)
        parent_names = get_category_names(category['parent'])
        return parent_names + (category['name'],)

    result = {}
    for category in categories:
        category_names = get_category_names(category['pk'])
        result[category['pk']] = category_names

    return result

And I'm using it as follow

    part_categories = {}
    inventree_categories = PartCategory.list(inventree_api)
    inventree_hierarchy = build_category_hierarchy(inventree_categories)
    for part_category in inventree_categories:
        key = inventree_hierarchy[part_category['pk']]
        part_categories[key] = part_category

I'm thinking that each inventree API could implement an option to cache results in the first place (disabled unless requested since existing consumers likely assume fresh data on each call, and also possibly having some expiry built-in), but this system is very new to me and the suggestion possibly doesn't even align with the current architecture design.

Anyhow, thank you for this tool; it did pave the way for me to quickly build an inventory of parts for which I had lost control already!

30350n commented 3 months ago

I didn't get a chance to explore the API, but calling GET /api/part/category/ should return a list that already contains all the categories and from there, assembling the path locally wouldn't require several round trip for each sub category.

Oh yeah, actually you're right. That's what the PartCategory.list(...) call already returns and all the data required is already there (aka there's no need to call part_category.getParentCategory()).

30350n commented 3 months ago

Yeah, handling this is pretty easy actually. I ended up with something similar to your first proposed solution, thanks!

sle118 commented 3 months ago

Sweet! Thank you!