Open bannanc opened 6 years ago
I think the general answer to this is YES, but I will not implement it now.
For example, we at least know that extra layers are never necessary for the first pattern in a list, since it could end up being the most generic. At minimum, I suggest switching to only adding layers to the patterns that need them. That is, if you had 5 clusters, you would start with all clusters getting a SMIRKS with `layers=0`. If that doesn't separate the clusters, you would use `layers=0` for the first cluster and `layers=1` for clusters 2-5. Then on the next round you would use `layers=0` for the first cluster, `layers=1` for the second, and `layers=2` for clusters 3-5.
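The round-by-round schedule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not existing ChemPer code: `layer_schedule`, `n_clusters`, and `max_layers` are hypothetical names, and the sketch assumes the clusters are already ordered from most generic to most specific.

```python
def layer_schedule(n_clusters, max_layers):
    """Yield, for each round, the number of layers to use for every cluster.

    Hypothetical helper: in round r, the cluster at 0-based position i
    gets min(i, r) layers, so earlier (more generic) clusters never
    receive more layers than their position in the list allows.
    """
    for round_idx in range(max_layers + 1):
        yield [min(i, round_idx) for i in range(n_clusters)]


# Example with 5 clusters and up to 2 extra layers
for layers in layer_schedule(5, 2):
    print(layers)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 1]
# [0, 1, 2, 2, 2]
```

With this scheme the schedule stops changing once the round index reaches `n_clusters - 1`, which gives a natural upper bound on the number of attempts.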
The ordered list of layers is the minimal solution; however, it's possible that only a few SMIRKS really need an extra layer. I still can't figure out a way to add these without doing a brute-force check of all patterns. An alternative solution is to explore removing layers in the `Reducer`: you could have a `remove_last_layer` option that just gets rid of the outermost layer in that pattern.
Right now (well, in PR #48), `SMIRKSifier` automatically chooses the number of layers, but it just increases the number of layers for every pattern with each try. This doesn't completely make sense, since the first SMIRKS in the list shouldn't need extra layers, and so on. However, there might be places where the clusters that are hard to distinguish aren't right next to each other in the list. This brute-force check slows things down significantly. Maybe there is something we could do to measure which clusters are most similar based on the `ClusterGraphs`, and then order the patterns and determine the number of layers based on that.