c0c0n3 / kitt4sme.live

On a mission to bring AI to the shop floor: https://kitt4sme.eu/
MIT License

Platform Configurator takes a while to respond #296

Closed vcutrona closed 8 months ago

vcutrona commented 1 year ago

Describe the bug

When issuing a request to the Platform Configurator, it takes too long to provide a response.

To Reproduce

Run the following command:

$ curl -kl "https://kitt4sme.collab-cloud.eu/platform-configurator/kits?sid=a3e1a2f9-e796-4cae-a847-c055c29bea20" > pc-res.json

Expected behavior

The Platform Configurator should provide suggestions in seconds.

Additional context

The server is notably slow today, so it might be running low on resources. However, the complexity of the Platform Configurator may also be an issue when many datasheets are available, since it tries to compute the globally optimal solution by searching the full solution space. We should clean up the datasheets database, given that it contains many "test" entries. We're going to optimize the Platform Configurator algorithm anyway.

RyanKelvinFord commented 1 year ago

Any updates here?

RyanKelvinFord commented 1 year ago

Possibly not an issue anymore?

vcutrona commented 1 year ago

The problem is still relevant, since performance will degrade sharply as the number of datasheets grows. For the demo, we implemented a straightforward but naive algorithm that computes the optimal kit to recommend with exponential complexity (it computes the powerset of the n available tools, i.e., 2^n candidate kits). The problem at hand is very similar to other NP-hard problems (depending on its formalization, it may be reduced to the Knapsack Problem or the Subset Sum Problem), so it is very hard to reduce the complexity without adding constraints to the problem definition.
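To make the 2^n growth concrete, here is a minimal sketch of an exhaustive powerset search. It is not the Platform Configurator code: the `TOOLS` data, the capability names, the prices, and the "cheapest kit that covers all required capabilities" objective are all assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical datasheets: tool name -> (capabilities covered, price).
TOOLS = {
    "tool_a": ({"monitoring"}, 10),
    "tool_b": ({"scheduling", "monitoring"}, 25),
    "tool_c": ({"quality"}, 15),
    "tool_d": ({"scheduling"}, 8),
}


def best_kit_exhaustive(tools, required):
    """Return (kit, price, subsets_examined) for the cheapest covering kit.

    Every subset of the tools is examined, so the work grows as 2^n.
    """
    names = sorted(tools)
    best, best_price, examined = None, float("inf"), 0
    for size in range(len(names) + 1):
        for kit in combinations(names, size):
            examined += 1
            # Union of the capabilities of all tools in this candidate kit.
            covered = set().union(*(tools[t][0] for t in kit))
            price = sum(tools[t][1] for t in kit)
            if required <= covered and price < best_price:
                best, best_price = kit, price
    return best, best_price, examined


kit, price, examined = best_kit_exhaustive(
    TOOLS, {"monitoring", "scheduling", "quality"}
)
print(kit, price, examined)
```

With 4 tools the loop examines 16 subsets; every additional datasheet doubles that count.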

To date, without constraints (e.g., max number of tools to be included in a kit, max price, etc.), we can address the problem with heuristics/optimization techniques (e.g., greedy algorithms, backtracking with pruning, approximation algorithms, genetic programming, etc.), or we can compute the kits offline to speed up the online search (but then we have to keep the kit repository updated every time a new datasheet is submitted to the platform). We tried to implement the optimizer using a backtracking-with-pruning algorithm; with a sample of only 50 datasheets, it takes ~200 seconds to compute ~2.68 million kits. That is fine considering that the search space has ~1,125,899 billion possible kits (2^50, about 1.13 x 10^15), but it is not suitable for a REST service that needs to provide a response in a few seconds.
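For readers unfamiliar with the technique, backtracking with pruning can be sketched as follows. This is an illustrative toy, not the optimizer described above: the `TOOLS` data, the cost-based pruning rule, and the objective (cheapest kit covering all required capabilities, with non-negative prices) are assumptions.

```python
# Hypothetical datasheets: tool name -> (capabilities covered, price).
TOOLS = {
    "tool_a": ({"monitoring"}, 10),
    "tool_b": ({"scheduling", "monitoring"}, 25),
    "tool_c": ({"quality"}, 15),
    "tool_d": ({"scheduling"}, 8),
}


def best_kit_backtracking(tools, required):
    """Return (kit, price, nodes_visited) for the cheapest covering kit.

    Assumes non-negative prices, which makes both pruning rules safe.
    """
    names = sorted(tools)
    best = {"kit": None, "price": float("inf"), "nodes": 0}

    def search(i, kit, covered, price):
        best["nodes"] += 1
        if price >= best["price"]:
            return  # prune: this branch cannot beat the best kit found so far
        if required <= covered:
            # Complete kit; adding more tools could only raise the price.
            best["kit"], best["price"] = tuple(kit), price
            return
        if i == len(names):
            return  # no tools left and still not covered
        caps, cost = tools[names[i]]
        new_caps = (caps & required) - covered
        if new_caps:  # prune: skip tools that add no required capability
            kit.append(names[i])
            search(i + 1, kit, covered | new_caps, price + cost)
            kit.pop()
        search(i + 1, kit, covered, price)  # branch that excludes this tool

    search(0, [], set(), 0)
    return best["kit"], best["price"], best["nodes"]


kit, price, nodes = best_kit_backtracking(
    TOOLS, {"monitoring", "scheduling", "quality"}
)
print(kit, price, nodes)
```

Pruning reduces the number of visited nodes, but the worst case is still exponential, which is consistent with the timings reported above.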

We're still working on it to find a good trade-off between performance and returned kits. We may also think of relying on ML/AI solutions.

vcutrona commented 8 months ago

This issue has been mitigated by implementing a new matching strategy: we mapped our problem to the set cover problem and implemented a greedy solution.
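A greedy approach to set cover typically looks like the sketch below. The actual matching strategy is not shown in this thread, so the `TOOLS` data, the unweighted formulation (prices ignored), and the alphabetical tie-breaking rule are assumptions.

```python
# Hypothetical datasheets: tool name -> set of capabilities it covers.
TOOLS = {
    "tool_a": {"monitoring"},
    "tool_b": {"scheduling", "monitoring"},
    "tool_c": {"quality"},
    "tool_d": {"scheduling"},
}


def greedy_kit(tools, required):
    """Greedy set cover: repeatedly pick the tool that covers the most
    still-uncovered required capabilities."""
    uncovered = set(required)
    kit = []
    while uncovered:
        # sorted() makes ties deterministic: max() keeps the first maximum.
        name = max(sorted(tools), key=lambda t: len(tools[t] & uncovered))
        gained = tools[name] & uncovered
        if not gained:
            raise ValueError(f"no tool covers: {sorted(uncovered)}")
        kit.append(name)
        uncovered -= gained
    return kit


kit = greedy_kit(TOOLS, {"monitoring", "scheduling", "quality"})
print(kit)  # ['tool_b', 'tool_c']
```

The greedy choice runs in polynomial time and is not guaranteed to be optimal; for unweighted set cover its result is known to be within a factor of roughly ln(n) + 1 of the smallest cover, where n is the number of required capabilities.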