bazingagin / npc_gzip

Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
MIT License
1.77k stars, 156 forks

Syntax error in compressors.py: unterminated triple quote #19

Closed jabowery closed 1 year ago

jabowery commented 1 year ago
    def get_bits_per_char(self, original_fn: str) -> float:
        """
        Returns the compressed size of the original function
        in bits.

"""Test Compressors"""
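
The failure mode reported above can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical source string that mimics the truncated method (it is not the actual contents of compressors.py); the class name `Demo` and the helper `has_syntax_error` are made up for illustration. It uses the built-in `compile` to show that a docstring opened with `"""` and never closed makes the whole module fail to parse.

```python
# Hypothetical reproduction: this string only mimics the truncated method
# quoted in the report; it is not the real contents of compressors.py.
broken_source = (
    "class Demo:\n"
    "    def get_bits_per_char(self, original_fn: str) -> float:\n"
    '        """\n'
    "        Returns the compressed size of the original function\n"
    "        in bits.\n"
)


def has_syntax_error(source: str) -> bool:
    """Return True if `source` fails to compile as a Python module."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return True
    return False


# Closing the docstring and adding a placeholder body makes it parse again.
repaired_source = broken_source + '        """\n        return 0.0\n'
```

Because the error is raised at parse time, importing the module fails before any test in the file can run, which is presumably why the problem surfaces immediately.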

EliahKagan commented 1 year ago

This bug appears to have been introduced in https://github.com/bazingagin/npc_gzip/commit/404253f2256e898eb3e65815b5e9c4c611190331. The problem is not limited to the missing closing """: the implementation of get_bits_per_char was removed as well, as shown by this fragment of the diff:

-        """
-        with open(original_fn) as fo:
-            data = fo.read()
-            compressed_str = self.compressor.compress(data.encode("utf-8"))
-            return len(compressed_str) * 8 / len(data)

I've proposed a fix in #21 (for this issue and another one of apparently similar origin, #20).
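
For illustration, here is a rough sketch of what the method might look like with the removed lines from the diff fragment restored. It is not the actual change in #21. The class name `SketchCompressor`, the default of `gzip` as the compressor, the explicit `encoding="utf-8"` on `open`, and the empty-file check are all assumptions added here; only the body of `get_bits_per_char` follows the removed lines. The docstring is also reworded, since the value returned is compressed bits per character rather than a total size.

```python
import gzip


class SketchCompressor:
    """Hypothetical stand-in for the repository's compressor class."""

    def __init__(self, compressor=gzip):
        # Assumption: any object exposing compress(bytes) -> bytes works here.
        self.compressor = compressor

    def get_bits_per_char(self, original_fn: str) -> float:
        """
        Returns the number of compressed bits per character
        of the text file at `original_fn`.
        """
        # encoding="utf-8" is an addition; the removed code used the default.
        with open(original_fn, encoding="utf-8") as fo:
            data = fo.read()
        # Guard added for this sketch: the removed code would raise
        # ZeroDivisionError on an empty file.
        if not data:
            raise ValueError("cannot compute bits per character of an empty file")
        compressed_str = self.compressor.compress(data.encode("utf-8"))
        return len(compressed_str) * 8 / len(data)
```

Calling `SketchCompressor().get_bits_per_char(path)` on a highly repetitive text file should give a value well below 8, since gzip needs far fewer bits than one byte per character for such input.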