JPeer264 / node-rcs-core

Rename css selectors across all files
MIT License
37 stars 7 forks source link

Short class should not be replaced #85

Closed X-Ryl669 closed 5 years ago

X-Ryl669 commented 5 years ago

Let say you have <i class="br"></i>

Unless you have a toy code, this might be mapped to a 2 letter class (like ag) or worse, to a 3 letter class (like tra). This is problematic because:

  1. You are not saving anything (you are increasing the code size instead of reducing it)
  2. The mapping is not easy to follow (br now is about .whatever)
  3. It would be very useful to be able to have some "keys" that are never modified so that they can be dealed with in the code untouched (like this: if (corner == "bottom") class='b'; else class='t'; if (side == "right") class += 'r' else class += 'l';). I know about the exclude rules, but if you have to list all possibilities in your code, it'll be a mess and hard to maintain.

Typically, let's say the code does this, first it extract all CSS class/id. If a class or id is already found in the "already" mapped class/id then relocate the former mapping to a new one so this class or id can be used directly (either it's removed from the mapping table or it maps to itself). That way, you know that small class or id in your code are left untouched and can be generated dynamically (they are kind of reserved for you).

Unfortunately, I don't know enough of this code to give a fix for this issue.

JPeer264 commented 5 years ago
  1. Yes you are totally right about this. But to know if there is some savings, there needs to be a pre procedure to know exactly the limit of the classes. With that I mean. If there are just 23 selectors, the longest class would be just with a length of 1, so I need to transform .br. Or without the logic I can implement a threshold of the minimum length of the selectors, which would be -1 / null or any other falsy value per default.
  2. I am sorry, I don't know what you mean with not easy to follow. Do you mean to debug those selectors after the name change?
  3. I totally agree with you, but this could be a bit time consuming to implement. My thoughts about this is to parse the JavaScript first, collect such connections and automatically put them into the exclude regex. But actually your solution should work too, but still I don't know yet a good approach to find those kinda connections
X-Ryl669 commented 5 years ago

If, when parsing CSS and extracting rules/selectors then creating a mapping, you find that the mapping is larger than the initial selector (this is only possible if all the previous mapping are used), then you: 1.1 If it's the same as itself, simply ignore. 1.1 Relocate the previous mapping that's exactly like the found selector 1.2 Map the found selector to itself.

For example, let's say you have used in your map all 23 selectors (from '_' to 'z').

  1. Then you find a class like '.a' in CSS.
  2. If the mapping contains a 'a':'a', you've already processed the case, you can stop, else
  3. You'll look in the map to find out 'previousClass': 'a' (that is 'previousClass' is mapped to 'a', in reality will be faster with maintaining a reverse map)
  4. Modify the 'previousClass' to 'a' (so that 'a' is mapped to 'a') and restart with 'previousClass' in 1 (instead of 'a')
  5. Since you don't find an entry in the map for 'previousClass' anymore, allocate a new (optimal) mapping, for example 'previousClass':'ab'

In the end, it's not possible to have longer mapping that the input. This is called Cuckoo hashing and it's very efficient and simple to implement and optimal.

JPeer264 commented 5 years ago

Ya, you are totally right. If I am not wrong, I have a reverse map implemented compressedValues in BaseLibrary. But ya the Cuckoo hashing sounds pretty useful in this case, if I understood it correct. As this seems more like an edge case for me, I will implement this later, as my time is very short - but still up for PRs :)

X-Ryl669 commented 5 years ago

Right, sorry for the delay, I was out of internet access for last 2 weeks. I'm currently working on this, and it's not as easy as it seemed (at least, I've found how to do it) to be optimal. Typically, let's say we have only 3 possible symbols (a, b, c) and you'll need to map hello, world, john, a, b, c, in that order. With a simple cuckoo hashing, you'll end up with: hello:aa, world:ab, john:ac, a:a, b:b, c:c. This is not optimal, since an optimal minimizer would compute the number of occurrences of each class and map the minimal symbol's length to the shortest symbol, then the second most used class is mapped to the second shortest symbol's length and so on.

You already have statistics in the code, and I wonder it's possible to do that in 2 passes (one where you map using simple cuckoo, then another one remapping with optimal hash table). Yet, this is also the role of gzip (that is find the most probable sequence and assign the minimum coding to it), so the space saving gain, compared to the effort to implement this, will be minimal I think (that's gut's feeling, I haven't measured this)

So in the end, I decided to implement the simple cuckoo method first and if I'm bored, will give a shot to the optimal method later on.

JPeer264 commented 5 years ago

Right, sorry for the delay, I was out of internet access for last 2 weeks. all good, welcome back to the internet ;)

I think also a simple chuckoo method will bring some benefits - so I'm really looking forward to your implementation 👍

JPeer264 commented 5 years ago

To ensure to have no breaking changes for this version I would suggest a new option param like hashingMethod: 'chuckoo' | null and set the default to null, so the current behavior. In the next major release we could set the default then to 'chuckoo'