Have you compared this to a greedy and/or combinatorial approach?

Tomalwo commented 3 years ago

Just curious, have you compared this to a greedy and/or combinatorial approach? In other words, why use RL for this problem?

diego-apellaniz commented 3 years ago

That’s a very interesting question. I should’ve explained in the Readme file, that the idea of this problem is that the position of the bamboo nodes is critical to determine the aptitude of a particular bamboo pole for a structural member, because bolt connections have to be located close to these nodes. And the node spacings are extremely irregular. So you basically can’t cut a piece of bamboo pole wherever you want.

Then, I decided to used RL for several reasons:

I understand this problem as an NP-hard problem, so brute force would become inefficient when the set of bamboo culms and structural members grow https://cs.stackexchange.com/questions/1859/rule-of-thumb-to-know-if-a-problem-could-be-np-complete
A greedy algorithm might not find the best long-term solution in some cases. It may happen, that in the first iteration a bamboo culm might suit perfectly for the first member, but then that the next member does not any suitable culm and there would be another possible culm for the member from the first iteration.
To my understanding, you can not easily specify mathematical constraints for this problem in order to use a linear programming solver. This kind of black-box optimization problem is therefore in my opinion quite suitable for a Reinforcement Learning approach.

But the question is quite interesting. I will upload a comparison against an approach using a Greedy algorithm. Also I think a recurrent neural network could solve this problem better than this approach with Deep Q-Learning…

Thank you very much for your comment!

zuardinakbar commented 3 years ago

Thanks for the explanation! I'm also curious about the generalizability of your RL agent. Did you also manage to use the same trained model for different design/material stock?

Tomalwo commented 3 years ago

Hi @diego-apellaniz,

Many thanks for your answer!

I think that the problem is very similar to Knapsack, which is NP-hard. Knapsack has a polynomial time approximation scheme, though.

It's true that Greedy wouldn't always find the best solution, but my hunch is that it should work reasonably well. (That's the case in this paper, for example.) Anyway, I'm looking forward to your comparison!

I'd also be interested to try and solve this problem with black-box optimization algorithms (e.g., evolutionary ones).

Best, Thomas

PS: Full disclosure, @zuardinakbar is my PhD student and also interested in this problem (using a given material stock efficiently)

diego-apellaniz commented 3 years ago

Hello @Tomalwo,

Thank you very much for the article and for your feedback! I’m glad that Reinforcement Learning is taking of in the AEC industry. I’m using mainly this paper from Google as reference for the RL algorithm, in which they focus on the Travelling Salesman Problem, but they say that they could also apply their algorithm to the Knapsack problem.

@zuardinakbar that’s a good point. The current neural network works just for a specific design+stock combination, but I’m working to generalize it for different stocks. I’m also looking forward to seeing the results of your research. And I will try to adapt my bamboo environment, so you can also test it with your own algorithms if you want. Of course, different approaches from RL would be possible as well.

Best regards, Diego

Tomalwo commented 3 years ago

Hi Diego,

Thanks for the link, quite interesting!

If I understand the paper correctly, the comparison with the optimization methods isn't entirely fair, because the RL approach requires hundreds of thousands of training steps. But the achieved performance is nevertheless impressive! (In other words, the provided runtimes in Table 3 are not very helpful when it’s unclear how many steps the optimization algorithms took).

Best, Thomas

diego-apellaniz commented 3 years ago

Yes I agree that they should’ve specified the training time or trining steps as well. Nevertheless I find remarkable the low execution time that it takes to run the trained neural network.

In combinatorial problems in which the problem space is not going to change, evolutionary solvers or similar solvers might be much more efficient in terms of training time than RL. But if you want to apply your solver to different variations of the several problem space, it can become interesting to use RL since you only would need to train the network only once for all cases.

This could potentially open up new design strategies since you can instanctly get the optimal solution for a certain problem variation...

Best, Diego

diego-apellaniz / Bamboo-ML

Have you compared this to a greedy and/or combinatorial approach? #1