giogenna16 / ComputationalIntelligence_2022_2023_s304684


lab1 peer review - Christian Dal Pozzolo #4

Open shadow036 opened 1 year ago

shadow036 commented 1 year ago

1. Greedy algorithm

The first algorithm works well and seems quite efficient. If I really had to look for some (very) small improvements, I would try to replace the "num" variable with a boolean acting as a "switch": the purpose of the variable would be clearer, and you would also avoid the useless extra increments once "num" is already greater than 0. This has a negligible impact when N is low, but with a very high N (500, 1000) the cost may start to be more noticeable. Another small improvement would be to store in "selectedLists" only the indices of the chosen lists from "sortedListOfLists", which reduces the memory needed for "selectedLists". In that case the sum of the weights can be computed with a running total, initialized to 0 outside the first "for" loop and increased by the length of each list in the iteration in which that list is actually added (inside the "if num == 0" or "if flag == True" branch). If you want to push efficiency to the extreme, you could even avoid storing the selected lists at all, since this value does not seem to be returned at the end of the function; whether that makes sense depends on the context, because in some cases it is useful to know the feasible sequence. A sketch of the index-plus-running-weight idea follows below. Overall I would say the greedy algorithm is almost perfect, and the suggestions above shouldn't have much impact on performance anyway.
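To make the suggestion concrete, here is a minimal sketch of the index-plus-running-weight idea. All names (`greedy_cover`, `sorted_lists`, `covered`, `selected_indices`, `total_weight`) and the covering check are mine and only stand in for your `sortedListOfLists`/`selectedLists`/`num`/`flag` logic; it is not meant to reproduce your code exactly.

```python
def greedy_cover(lists, N):
    """Hedged sketch: keep only indices of chosen lists and a running weight."""
    goal = set(range(N))
    sorted_lists = sorted(lists, key=len)   # shortest lists first, as in a sorted list of lists
    covered = set()
    selected_indices = []                   # indices into sorted_lists instead of full copies
    total_weight = 0                        # running sum instead of re-counting at the end
    for i, lst in enumerate(sorted_lists):
        if set(lst) - covered:              # the list brings at least one new number
            selected_indices.append(i)
            total_weight += len(lst)
            covered |= set(lst)
            if covered == goal:
                break
    return selected_indices, total_weight
```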

2. Custom DFS algorithm

As for the second algorithm, I really appreciate the precise initial comment explaining the reasoning behind the custom DFS. In addition to one consideration already explained in the previous section (namely storing indices instead of lists), I would have moved the whole block starting with "if not discard" in place of the line "discard = False" and kept the break afterwards; in this way one additional variable can be avoided. Another point which could help quite a lot is to use a sort of "self-adapting bound" approach instead of keeping all solutions until the end. This can be implemented with three modifications (see the sketch after this list):

a. Do not store the whole solution, only its weight (optional).
b. Do not store all solutions/weights, only the best one found so far.
c. While building a new solution, as soon as a partial concatenation already has a weight larger than the current best one, discard it immediately (every time a new node is visited, compute the weight of the current concatenation plus the weight of the new node and compare it with the best one).

In this way only one solution is kept at a time instead of amassing all of them until the end, and some time is saved by discarding concatenations as soon as they cross the current optimal threshold. Additionally, I would rather use a bound on the tree depth instead of stopping after 50000 visited nodes, because that approach leaves some branches (from root to leaves) totally unexplored. Finally, I think it is possible to reuse the approach of the previous algorithm for checking whether a solution contains all numbers, instead of creating a specific function: a recursive approach with an array of flags (each one indicating the presence of number #i in the current concatenation) passed down the stack could be useful here.
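Here is a hedged sketch of the "self-adapting bound" idea (essentially branch and bound). It is not your DFS; the names (`dfs_bound`, `recurse`, `best_weight`, `chosen`) are mine, and it only illustrates keeping a single best weight and pruning partial concatenations that already exceed it.

```python
def dfs_bound(lists, N):
    """Hedged sketch: depth-first search with a self-adapting weight bound."""
    goal = frozenset(range(N))
    lists = sorted(lists, key=len)          # shorter lists first tends to tighten the bound early
    best_weight = float("inf")              # only the best solution so far is kept
    best_solution = None

    def recurse(start, covered, weight, chosen):
        nonlocal best_weight, best_solution
        if weight >= best_weight:           # prune: partial weight already crosses the bound
            return
        if covered == goal:                 # feasible and cheaper than the current best
            best_weight, best_solution = weight, list(chosen)
            return
        for i in range(start, len(lists)):
            new_elems = frozenset(lists[i]) - covered
            if not new_elems:               # skip lists that add no new number
                continue
            chosen.append(i)
            recurse(i + 1, covered | new_elems, weight + len(lists[i]), chosen)
            chosen.pop()

    recurse(0, frozenset(), 0, [])
    return best_solution, best_weight
```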

3. A-star algorithm

The third approach is heavily inspired by the template, so it is semantically correct, but I think this problem is not a good fit for an A-star algorithm because there is no clear way to choose a realistic heuristic part of the A-star function; in fact, in this case the heuristic part is the same as the known part (len(state)). Additionally, the priority function penalizes good solutions which need only n (or close to n) numbers to reach the goal by giving them a priority cost close to 0, while solutions with a lot of repeated numbers tend to have a high negative cost. This last part still seems to work for the current problem and seed (maybe a good priority function was found by trial and error), but I am not so sure it can be effectively generalized to high N or to other problems with similar characteristics. A small generic sketch of the f = g + h decomposition I am referring to is included below.
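Just to make the terminology concrete, here is a generic, simplified A-star-style skeleton. It is my own sketch, not your code: g is the known cost of the partial state, h the heuristic estimate of the remaining cost, and the defaults `g=len, h=len` mirror the observation that here both parts reduce to len(state), so the priority carries no extra information about how far the state still is from covering all N numbers.

```python
import heapq

def a_star_cover(lists, N, g=len, h=len):
    """Generic sketch of best-first search with priority f = g(state) + h(state).
    Simplified: no closed set, so duplicate orderings of the same lists are re-expanded."""
    goal = set(range(N))
    frontier = [(0, [])]                    # (priority, indices of chosen lists)
    while frontier:
        _, chosen = heapq.heappop(frontier)
        state = [x for i in chosen for x in lists[i]]
        if set(state) >= goal:
            return chosen
        for i in range(len(lists)):
            if i not in chosen:
                new_state = state + list(lists[i])
                priority = g(new_state) + h(new_state)   # with g = h = len, this is 2 * len(state)
                heapq.heappush(frontier, (priority, chosen + [i]))
    return None
```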


If you have any questions or there is something I haven't explained very well, feel free to ask :)

giogenna16 commented 1 year ago

Thank you very much for the review!