compiler-explorer / compiler-explorer

Run compilers interactively from your web browser and interact with the assembly
https://godbolt.org/
BSD 2-Clause "Simplified" License

[REQUEST]: Full URLs are much longer than gzip or zstd | base64, can this be better? #5458

Open pcordes opened 1 year ago

pcordes commented 1 year ago

Is your feature request related to a problem? Please describe

According to https://github.com/compiler-explorer/compiler-explorer/issues/597, we do use LZ-string to compress long URLs. But it seems to do a much worse job than DEFLATE (gzip) or ZSTD.

A 10.1K C file with two versions of a function (with large chunks of similar text) produces an 8KiB base64 URL (8775 bytes). Or 8125 bytes if I close one of the compiler panes.

But locally doing gzip | base64 makes only a 3.87K base64 string (3972 bytes) from a 2.87K .gz file. (Or somewhat larger from a zstd-compressed file, 4.1KiB base64 from 3.03K binary.) Even using gzip -1 | base64 instead of the default -6 only makes a 4.4K base64.
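For anyone who wants to repeat this comparison without a shell, here is a minimal Python sketch of the `gzip | base64` measurement. The helper name `gzip_base64_len` and the sample input are made up for illustration; the sizes it prints will not match the numbers above unless it is fed the same file.

```python
import base64
import gzip


def gzip_base64_len(data: bytes, level: int = 6) -> tuple:
    """Return (gzip size, base64 size) in bytes, roughly `gzip -LEVEL | base64`."""
    # mtime=0 keeps the gzip header deterministic between runs.
    compressed = gzip.compress(data, compresslevel=level, mtime=0)
    # b64encode emits no line breaks, unlike the line-wrapped output of the base64 CLI.
    encoded = base64.b64encode(compressed)
    return len(compressed), len(encoded)


if __name__ == "__main__":
    # Hypothetical stand-in for a source file; substitute the real file's bytes.
    sample = b"static inline float chi2(const float *a, const float *b);\n" * 200
    for level in (1, 6, 9):
        gz, b64 = gzip_base64_len(sample, level)
        print(f"level {level}: {len(sample)} raw -> {gz} gz -> {b64} base64")
```

Base64 always costs 4 output characters per 3 input bytes, so whatever the compressor saves is multiplied by 4/3 in the final string.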

So why isn't compiler-explorer making shorter URLs that shrink redundant source? I know it has to include some configuration text, not just the source, but I assume that doesn't account for an extra 4KiB after compression.

Describe the solution you'd like

A good compression algorithm should find the redundancy between large chunks of similar code, so URL size is correlated more with the information content of the linked code, not its byte-count in ASCII. ZSTD is fast and good.
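To illustrate the point about redundancy, here is a rough Python sketch using the standard `zlib` module (DEFLATE). The generated text is only a stand-in for source code, and `make_function_text` is a hypothetical helper, not anything from compiler-explorer. The second "version" is a near-copy of the first, similar to the two-versions-of-a-function case described above.

```python
import random
import zlib


def make_function_text(seed: int, lines: int = 120) -> bytes:
    """Build stand-in 'source code' out of pseudo-random identifiers (not real C)."""
    rng = random.Random(seed)
    rows = []
    for i in range(lines):
        name = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(12))
        rows.append(f"    v{i} = _mm_add_ps({name}, {rng.randrange(10**6)});")
    return "\n".join(rows).encode()


one_version = make_function_text(seed=1)
# Second "version": the same text with a few small edits, like a tweaked copy.
two_versions = one_version + b"\n" + one_version.replace(b"_mm_add_ps", b"_mm_sub_ps", 5)

single = len(zlib.compress(one_version, 6))
double = len(zlib.compress(two_versions, 6))
print(f"raw:     {len(one_version)} vs {len(two_versions)} bytes")
print(f"deflate: {single} vs {double} bytes")
```

The raw size doubles, but the compressed size should grow only slightly, because the second copy falls inside DEFLATE's 32 KiB window and is encoded mostly as back-references to the first.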

DEFLATE (LZ77 plus Huffman coding, the format gzip uses) is very widely available and compresses slightly better in the one case I just tested, although it's slower. Not a huge deal for code that's in the tens of KiB size range.

Describe alternatives you've considered

Maybe there's some LZ-string tuning or compression-level option where the current setting is bad?

IDK why LZ-string isn't already compressing like gzip locally; I assumed from the name that's what it would do. Perhaps this should be treated as a bug report instead of a feature request? But if it needs a re-arrangement of the order of serialization / compression / base64, it might need a new URL format version number, which sounds like a feature to me.

I tried base64 -d to decode the part of the URL after the :, but it says invalid input after writing 583 bytes of binary data.

Additional context

The source I was editing was for https://stackoverflow.com/questions/77037432/suggestions-on-further-optimising-this-chi-square-function-using-sse2-intrinsics (where I had to trim my answer down to keep it less than 30K characters; it was initially over 34K.)

The actual link is

https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYgkuAGZSAAdUBUJbBmc3Dz04hJsBX38gllDwriiLTCsshiECJmICFPdPLhKypMrqghzAkLCI6IUqmrq0xr62jryCnoBKC1RXYmR2DgBSACZIv2Q3LABqJcjHPvxBADoEPewljQBBS6v0WeD6beBMAmQEPAUAR1dqzHRjExXEQAG6YZAQPwEbYJABemFI21cUIAHMZoQAqdBMKqTW5LADsACFbtsydsAPQU7ZMBRKGoQOGYXaRbDbDR4yIk66k8lU7YAFQA8gARIUgbZ4KgwvDwlkXSIi7YANgArKrIsrEa4lEioZEVujtmhXIIFJKGH1MEx0LyycjBFxlUbUMF6WDbaqiSslqqRb7varlb6lXtQ8TCUSNIT/QT/Vy7ZTqY4xMhXAYCMyqMRMD9GMgAJ7bVDS63vbbBAuZ7YgsSuTCJqloS3QqGSzPEHFJc1hmVy1bK7YAWkkeyJfOpTMl5qYMRif0EtCLYgA7kwCzPtnEV2Fi9KfZEAGKJ5t9C2tjtdgQ9xV9hsJ67k7b8YjbSGCSUspXRrmfvaOdswivS0x0mXYI0fJ9yVdd1/gDaM/QDbEqgDPBVhJFYSUQxDMPQscn35U1iBcWhizBV8CBXVAaWITsN22IgK1eDttg%2BHY%2BhITAhxfNdiCOYBtgzfMC0TJ8YLCD0Ay4EMkJxJhULw3DsP9XDcN/CdthXBBGAY7SczADhNxzYAkk0wgEBpQTUGhEtdOZBRWGZZD5K9NDEMgqDizdCS4K9BD/S9ZyFNUrCApUjCSQfK5PLJcTiEkr1pOwokgtcxTQpkkKx0TGNG2pd5wQAazfKcBxK2UuK4SYwLDXsNDJKUt1QHdX1slY72nBgDOhYBfk7QRMH%2BUg8u2AB1D5yyo9N0GePrDEzeyKtWVUViDL86ufEg73OMNVuDHk1ka6MPKTbYGGss7BqwGaXyalqhxLId2q2lh0xsGJ6AeqgnrWpkFETW731bL9AM7cp/t/PAWQApl8LQkKaog6KYrihKoxkwK5IU5T0uyk7coOyIHClfETv5FNaDTIS7OND4h2%2BPr7KqGw%2BjwZAYt2E7GsZCr5V7PbdhWYMhY6/9ewNAllRRRHxw0p0h2CQhz0wF5XwUdwYRicE8DEY1aQbE6CMnJbWV2taIC286GCHeFiJpBgZth1a9tqxUOWePAwU3LSXGZKEVd3TBVC16x/hrOsDeRzz%2BVi/xtn6l4hwK5Bis%2BGFaDwYAEAIJdthYTjtkwKgaGQPBGGhAgEEMbbnaDV2RXqgcytK/ZxZWSWUVEjSyTEBRqMrtP6AIc1K%2BZNAWBiOhd0K86V3bAzzQSYB/HQIcMkSMFETEWhmr8ASKQFxjtB1aFQkshQPioAhji7sl%2BXVlg9xhH4/nNGhWwYSyHQIA0jRzdXaDQkwGCT%2BfgaZUWIGeUQupbLb2NLMM0Fpixx2CK4FOrwQC31Om%2BCA1sGAHyDDVVUbIVg1UwtsVaqoMQQA0EOfBe08TEPau/c0YDXAGkLiApEMQGLUQYGGZUkgADSgtxwrFWlgmOhgZosCYKobY%2BBi4gynEOO8BDBwsKQRiJeK8MTnidEaFujg24d0kcbfst4JZS3QgLKatAZomTBJZWR8jFHSlshoVQKINAEioIiPu2wrEomnDTIgMRSZRygm2IOIdMwzV7FOakLt1JQX5EwTSCBfZnS6BRKu0JRCfzPtXf2qssFtneHgIQL8cwgx/LLI2ZlK6i1bpY9uUtESj00iQKB%2BsuldQ4DZciVAd5z3STosOBoFZKyhP46iO49afyHvAieU9IEKOovOdBM1JmKw/
hXaiwzUA4ngQwciCQBAWitDaJ%2BaAYgFj3rwmEogDCvhKbuaRFYmAp02mrZ51RzyCMeYc45EAVzmVmNCK4AA1AAGkOVUXBmFbVNOMmazYzlJDxIbCc9xXCPGZDE8EcSQYQFxfiyYiSKFBmOBoKg%2BFybXiqB%2BQlocZq1jcJHDmgM2xQzqnDaGVLgy/hllghp8tdkMQLFrGctARk03HpPegFFqI6mZDEdWwRRkOy3LI9A6AVwzS0jpB%2BLAHmMX1IaAg%2BJDrSmktiu%2B1IADijgAL3EwOaH4hBmQ7llYiTYhgBJpx3HgPiN97XYIAJK6T8MVf1RgF7K2IMwWg5oq5OL7mwAeRhNLEWzbdGIur9XoH8Rsb15ltjQphe1RgTB8UzUIAmnUDz0kAE0ACybaYQayYFfd5OqbRFu2DCjtiIM6FT9hXD4DBCrmmUG2q4IoRSjSVGCrAlpHnpNXbuME1gSB7GPOGp8UIDHQjcSS49zoCCTAtt5eKvl0bJTcqGchqN71JQCkSJ9YFVEsriXjSJHNeVqSVNJKKHMnwVKqYzURvZSVSmlHooGkw3GcllvyAtA6DUhPSTspWD8n6UWouKpWMRiLoDQcPa1RMU2codYKYg9zs2MRzKR915dkHMn/m9IcpRMBsA/NmVAj8MN6qw7I8dI8p3AHNK9Sa3SsxbWdQBD5cbgBhqjvyAU2kiyyKLGfC%2Brhi4ZyY9RF40J0k7qIK1aUbj6q8f48Pe2M0LNEq2rZNxXBC70AcwodT4HyQXqNLZ89jpL3XtfZ6B9H74ZRdEV5WCkX33BSi9%2BwuwciVwTA/5/Rl6FHwc87BwLV6b0JfghjT96F31xYi1JcrMX32pd/Zl7kAGYpAcikqA8LXsu00qdU5keFbxweLhoahUJkPwfduhPLxcuBjcEBN2b%2BFToicHWnHDKwpnQnw7ZQj2xiPQlIxgCj/1CbEzpeGgmrWyQ5gIHMT%2BkH%2BuUjfGS%2Bgkwmu2hSVBHKcYvO6kjFg17Y8PhQb%2BAGUcfoQaRg0DS3KWXPLAqAel1lyjeZJOpbS/9XKtpAz/LeOpeOALJPHDFsMkgRWHo0pkrWVB0y53SQq1ZZlZXxxItG80wQvnFVcDwqE1EBDMks256Ukhnw7xxKd67qTqSoMnZgAsBkannUncczYwCwjHDJKNctnSIDfxPWBaB0Icz5w9IiJT6z3XFmKmCppuH8kIOHn57LMdHAGGzW6hQ/TT3UVpJpUopFbcWXt4iWX7ZzSsaoGEfMCnXxV1oN9VdTShdrM92dC6HZiB4EeEWW6B3NdIgYJswqDyPhZwpHKvwVyZq2WPmeOVdjnOyuannAuEDipa1eS2YgFHTJB5ptaYgGddx9EM1QZ3/nEczaoI3Ib429dFfC7etG/lkv%2BWq8vt9dX0L%2BUa8jv96l%2BQHa7SwREttUA8dUJmB2YdzWCF/tCW66KwjnM/oxRHoryRT48yF4r%2BuwslY%2BSRar5pSYRVbTY1aJTb5gEhh76xLNaf5kjf7wbtSFYLYL6hbohL6lZ%2BTQHeghgb44FEhJagH4F%2BhwEZafbdbZbIHFyRC/7Xr/5YGAF3rAF4GRAEEQGb6JbsGwHDhpbwFUGIG9ag45hlaQ6DZKi2ZYiTZY7ZaPaMy1YSHAbT5zYeZyH%2BYKFg5eg%2BjKEdbT4rAyHFxdbCFaFiFegcF6G9huKRBGFUAcHw7gb8glaX5gSIYwSuEgwwS%2BjYDtSqKGEwQYhBxxZBw%2BE%2BiXZxhYJmENi4FWG3jREBi6HxjUGeQJFQFxFKhpFEiWHJF0b4b3Arhv7UTtRA43ggbbCmAsCIooi2jhpZHr6SEiH9ZKG5GnT5HNRFH7brJ4r0BYK3b3ZNGKGxFKjUikoPBvYfbLaeQ/b%2Bg8iRE8iEwbBbADb7B4AsCmqCBZ4MCnA7SLEMCbCuA7D/iHBiCZzbFnCmwLF3DjHMhmYVIMx
/AAiyIMC/C0DGB0iYArC45MjaiohGhYhyRYo3BIwNK0j0gEA8wWJsgcj/ojTChigSiNQtwKhKhqgahahIi6jfwP7wKmiOZV6Zg2jagMD0B0j%2B7bAzwt6jwK4cA1I%2BqkSMCzBZx5Z9BZ6y5wRkzUi9yoDEnESyoPKoDkQUI0R0QybWjrqdIuK4kfhgKGBFgC4EbaSPxpzx7SidI/EVjAh2SfwmoPL%2BD/AjxzJ4BrrPghp9AT4VFXAZzLy0gQBOhgRME2TcGJHlYuwSG3gA7EjRiRGRFRQjQUxUw4hZg5h5j7HymlhfIWSVjVjsr1h5Snh7KgzARlGiwiwjhTH3y8zrZzgLg5zLi0Brj0TpLbi7htT7oniMrnhJngyo7wgaE/Jvg8q1L8r/jVndigTgQpFQSQFRayQoSkF4Q4xZRfaEQMB8mkSCm7h7bVCimPKhAEAsRsRMycTcQkC8T8SCRBlhlYI9kkEpRYwDlKRhS4wjnUhGpv56SYAJrpLGR97lrpI7w2RqnaQwiOQKIHkVbuRS49kgH7n9kVZZRDkRT1liTOnpEfqpQAXAVAWDZdlXYNLJzFSQkDYizIVDhVQ1SKgbSSjSglnWbClTifDe6zTVDzRXTDScljQTQWSN4kX9QLQ1wrRrT1zsgNmwymyKguyLHSjHSRL8jK6XT/Bhz5rNRhBfRPTFivivSAJ4AfRcSPQCx/QAw45Nm9hepgztmQwCrsUk7pQU7flgW9mYz/lPqZTAWOFXarBEwOwkxXENIBnphBnyp0wPE1IjAsw2DswxSJjczIn8zMVCzNy8xiytIdwyxirKhbbKyqwn6azay6zQK0bRzmL3gokCwWyvhWw2xhC%2B7apOxMVCrxgNxgSOJW6bqZJPBvKvgfbhwcpmLQRxwJxcSIUhIKDWnZy5z5w1JFwlxlwfiVzVz5XJJFWNyBUixGImLWLhoxw0gpr9wfDmhDySZjxCaKrTyzzzyGQyjLz/BrzxAbwIizUjIPLqKPJ16nzMhjKXzXz1U7bSiuVW7vxILpLYmWrxzupvScI6RgKdIQI9IwLShwImiIJgIKmoLoIECYLTXUg4J4LqJEIkJkJiLqjUK0L0KEJhGmn4mfzsLVpcI86PL8KKiCIiLTbiKqj1VOZ5xyLT61nMiqJMinWaJgLaJnH/B6JFbNLGKhVTVS6ZkWJKhBI2JrS0WlXOI01noeJeI%2BJ%2BIwjURBIhKdJhIRKAbMr75hwJJo6CpTGgkZJZIvH5BTl5ILJMT2zRVhBYKVHVFQwgj3FPa9jGDrHvGvDn7vF4DVEQCoZmJA4CGUEkpA4Upa17Q0p0q/gMotjzS%2B0o6xlJVQTcoynNlaWtnE5/gqHSydnCFW0rAohQwwSz5KiO0sDGDi4AgJAe0QAJkVGO3W0Yh4hCw/nb7kEgUxRZ053xYFa3iF3F1HKl3u3Z1QCV2t14C10Di7mN0ihe2U5khD200O1O0DrGCYCTxOglYaCIgwRVTN2eRnpz1F0aqL3L3KjDay1d1KAEBcAH3u1H0fbVTLb8j96VqIiR67iX6CGEGjXjgb2PL5izCdgvDOJ8YkAFj0y/0LA1jziiaXKEk16A0whL2kUxmm7oBfBMAF5WmVxMkWRKxgos6mhDgGAbFOUXIRoDSkSmiknLXLJrWviqozigI36qBhwDr/wJDZr5xYDCG20LX22d3z16qX0GgQBcOfD9aIhd2FqX0r1uKkAoaT1S7wUt3V3Z1Qxl4WS73vEIBj70ACNfHCMPViMdrGBCAAAScgh4h43g2AdppAa9gSpApCqGdGqjrEYg0oQeYCKeWCejPDBdfDAIS9eAgj3jjMiIqjcjPWT4zj6jBmRmmAOjQjdtITFRhjJjZjFjVjKwpA0Q7IpAGFd91IzjqpjSU6W4TAZpXjiTfwIMXdC9ATQTlTOYoTmcCA4TMUPt0R1TTtyAIIw8AIVAgiEAFRXTPTdThoMQ6ACT3DjMYEjjp0cqCgfy%
2BFxeYcgT7UjE7TAgGKAgt1bNaKmzL%2BpkacDkUexTXNk1nc4a/RSagxVToxAdkxX2T4Mxf2A2IJMUR%2BkVEqVVNIyAxEZJwN5mtE64Cgfq%2BzNQjyqz2pz4Bot1GswSKe5otk9h7U1AxEj8ouO6iL0org9p6yhRjyCpnjv4z%2B4L7%2BgKHyXyaY0lQZltSjNRNYDTKxvje9LtOVxg4zntW9JyZ4rdNYNV0TrwF96qyFz2wdtKrTCOKlCdfKSd%2Bw2twqGdU9SYldbYREbOvYo4h%2ByYluXu3UDEOYxyA8M4b8dORYCZ80xJY6zIxiWUSrVlWANAccarLOdqUuT4vLbi/0XozrtAlWnBHpEYvpXZ2Or4uO2gtSLI442g/4Pr%2BE0bCMAOSr7rdLudbo%2BdQzRdJdbt5dg9KbI99dhlv57W2wKIGI0bTdjhPWM9X96jWbZd/dFdlZQ9%2BbyoY9j6cWpb5bE9XLHM1bbonTRdtTh9q969bom9lbLu1IjUNVRzNgLOxSA0qsoeWp8yBSerqq/aomM0oQogG7hFhp1ELyLwo6eA46KowiwhDSo8n8O9vDe9eKkjR9MEYjTtZ9F9ATK9N9szoQL4zIpoxee8FpPWMcQo1Vk8HCt06uiy1QMVU4A1xuuYyI/8gSm2EqALmpENvml75IvLME3dA7xg3TmY4HYzCgEAXd/7XOO8j7JWL7LLBArt9bKIntt9E7/muHboxgHwBHRHoz7LZHFHReXOHwNHz7GbztDHbLTHLHErETHr8G%2BHvYeHO8/BYI6tQhSbOHdL0%2BXHvKt4eH3HqianghPbbT8GXrRIXbBHC9wr4j6Y/Hx93dMjCnO80w4n0lDnbiunznxcunrHwb3l8xbrAWMGt4pbsbbH0unr4hz6%2Bh0XOhMkkXRs8XxBnBKhKXORpnDqKXDR6X5nLRWXSYKX4Bd7jtk5CAbxwrxXIYiIOXsB%2BTaWBAnY1grEzTwp9m5cILNIIIqAxpzan8QcTX6Spus4kuPW1X7pzLZXYIFXnn%2BXwxtX83KWhXdXGR0%2BFnSWSX5IwTVT6jNnEzO3jT4nRH6q7LEzq3E92wsz/IPtlL7gjlVmWCCj29NxNzNS6jRHCgfTAzh3mASNx3PTX3xg/Tkg5HTtlHKcInHLv3pAv3/np0xp1osqOmgpFX6q54uV1kekfLu6xAfRrwAxHTiGYrFITI13%2BUq1qyiLMQNgpqco8ioxjNYrYEjERr9sBYhZFpllcYHA0wtAnAqovAngHAWgpAqAnAMMoDKFkQPApAkNIvvP0whUIAkgXAxwgiqoGgKwTokgmv2vgi%2BgnAkgvALAEgGga9wvov4vHAvACgIAa98vWg0wcAsAMAUAbvEASAjOSq5AlA3v3QyAwAUQmTWAIIbMmAUKZcK4QoWswvsvNAgCL%2BlAwQmgvAiszADGnAsv6f1QBYQowQ2gRKWfvA48DmQoJJBYqfMjmAqCwAFMKaxf1fzxwA4gCvznOY1gns7qVfQc4IwIiwsv/s/PbfGcwQnYDGzgWAVfTXaxxf0wwyTA0mkfmA0fsfjf/AggIgYg7AUgMgggigKg6gbfugjQ7uJgZg%2Bg2edvkA0wqANP3YnAQ4Qofhsi8wFxIoXikVgK3E50Q4Pr%2BDqAO/uaB/4X4tYpQAAUAOHCHAwwy8HFp5iHCjRt4vAScrRER7X9PaTQIlEkHsAOxBgDQUgD4D8CdBDahQWIPtXKB4D0g5ApIGMByR6BLAWAgQK0AGAuB6g9A3jJ3yYH9B2gRA8YN0GGDcDKBAg0YLwLoFVQZgcwBYBID54C8heVfa3tsE/5DhAUwAZAOzCiDHAUWuAQgFtCspVReAjvRXqQG0g2hugGA5Xpr0N4cBjepAU3uqGOAABODQJEBRDKhlQKwSQK4IcEuDoglvZAZwFt7285eqfZ3h7y96U8feFARtisiVQoBVMToc3tXzD4LBl%2Bq/R
gOvzoAdg7eEAFPm3xz6Z9uAafPwLn3z6F9rAjfUvuXHL5Lgq%2BWAWvvXzt6FCm%2BAaVvqL3wAd8bAXsHvowzTCZhG%2BQ/KvqP3H4FhJ%2BiwUXjP1N6FD5%2BBgJflHxj7pCmhG/YQM8h37SAlhB/NQFX10CZMz%2BIAUwIR0v5uh4At/e/teEf5Ch6CQ4V/u8DDBKDv%2BLAO7P4DDAKBCoBYAwGe24hUAYgScWQM1ytSKgqAtIaEPTAICfYRQsAooMOEQEs4EBg3TsGSBAF/8xyJECAeqmQHkQs8WAdAdMAYGcDPAEABwEIIIEOxaBJAqgZkCSBEj145QUkRMGGAcDygzA2oKwKGCYC8RTI2kfwIsCCCWR%2BAkYDUE5GFAcRUvaQVYMF6kA/BYvTgIoNcHKDRcqmfbMqBpQ0o3w2gqzILBl6TADBIQ6YCYKwDhBzBIASwcPxsGSjregQh3jqKsErATeIAZUASBpQogVgzgjUBoGVCRBIgkgTUBKPkEBDghCvUIfAE94gAIUMQYEL72iHUMAgjkTgHcPlFn99sqoZUaL3%2BA6DMRegJYVv3EC791hSgTYcfxACZMVwnYGIHPzFFyC2%2B1vIUMCDDFPkZRX/eMQGkVHJi3wzgGIbuCsqkJtRAYmQdYNtEGhHBXAAkA4METKhvBbgoMJkzNF%2Bi7elonsaQAsEohjgkQVUJIAcEEgUQK4tcRuO9HD9IgFYq3n6MMHVRrRB4/wTb39FO9pgWzTwJICAA%3D%3D

partouf commented 1 year ago

Some of the challenges when it comes to full URLs are:

I'm sure there are other and better ways to compress and encode URLs, but these are the tricky bits to take into account.

mattgodbolt commented 1 year ago

Thanks both: agreed it would be nicer to have more compact "full" URLs. I'm sure we can do better, but @partouf covers a lot of the bases. Interestingly, for another project I have started using CompressionStream which might offer the right amount of support for our uses.

partouf commented 1 year ago

> Thanks both: agreed it would be nicer to have more compact "full" URLs. I'm sure we can do better, but @partouf covers a lot of the bases. Interestingly, for another project I have started using CompressionStream which might offer the right amount of support for our uses.

Right, I looked at that, but Firefox support only starts at version 113. Isn't that quite a recent version?

mattgodbolt commented 1 year ago

Oh! You're not wrong actually. That seems too recent; so we'll have to find something else.

mattgodbolt commented 1 year ago

https://github.com/OneIdentity/zstd-js looks promising. Also, our base64 stuff is using the = and other chars that aren't safe for URIs... so we can probably switch to one that uses - and _ or whatever... worth a look!
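As a sketch of what switching alphabets buys, the Python below compares standard base64 after percent-encoding with the URL-safe variant (`-` and `_`, padding stripped). This is an illustration using Python's standard library, not compiler-explorer's actual encoder; `decode_url_safe` is a hypothetical helper and the payload is arbitrary bytes standing in for compressed output.

```python
import base64
from urllib.parse import quote

payload = bytes(range(256))  # arbitrary binary data standing in for compressed output

standard = base64.b64encode(payload).decode("ascii")
url_safe = base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")

# Percent-encoding turns each '+', '/' and '=' into three characters.
escaped = quote(standard, safe="")
print(len(standard), len(escaped), len(url_safe))


def decode_url_safe(text: str) -> bytes:
    """Restore the stripped '=' padding, then decode the '-'/'_' alphabet."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)
```

On compressed data roughly 2 in 64 base64 characters are `+` or `/`, so escaping them adds a few percent to the URL; the URL-safe alphabet avoids that overhead entirely.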

pcordes commented 1 year ago

So is the size not LZ-string's fault, but rather the post-processing of its output, and all the config info that gets concatenated with the source code? Like base64 and then replace some single characters with 3 characters, defeating some of the compression?

If LZ-string is using DEFLATE (the LZ77-based format behind gzip) with an encoder that's even half-way reasonable in terms of actually finding redundancy, dropping in another compression library won't fix the problem.

But if it's making a valid DEFLATE stream while not actually finding nearly as much redundancy as command-line gzip -1 / zlib, then that's a problem, and yeah a different library could do better. (I'd assume it's a challenge to write an efficient compressor in JS, since I think you'd want unaligned byte access for more efficiently checking for matches in the compression dictionary in chunks of 4 bytes or something. Lossless compression is a CPU time vs. compression ratio problem, with better-optimized algorithms giving a better tradeoff. Is there a tuning knob, or did LZ-string just sacrifice a lot of efficiency for ease of implementation and/or speed?)

(Also doing character substitutions on the base64 string to map to a URI-safe character set without %xx expansions for URL-encoding is a very good idea.)