aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License

Invalid base64 generation on ARM: extra padding in the middle of the stream! #69

Open danergo opened 4 years ago

danergo commented 4 years ago

I'm using this library on an ARM embedded system.

I have a static function which is called from multiple threads at the same time. I'm using Boost's mutex to make the critical parts mutually exclusive:

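char Base64Buffer[5500];
size_t Base64Length;  // assumed: not shown in the original post; base64_encode() takes a size_t * for the output length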

// buffer_size is max 4096
int writeOutput(uint8_t *buffer, int buffer_size) {
  mutex.lock();
  base64_encode((const char*)buffer, buffer_size, Base64Buffer, &Base64Length, 0);
  ...
  mutex.unlock();
  return 0;
}

This works most of the time. However, sometimes when writeOutput is called from different threads at the same time, base64_encode produces an invalid stream.

Here is a sample invalid base64 (unfortunately I don't have the raw unencoded data): idqI4bA2ArMAfIkw0a5wmcSAxEpoMJTwOX/aR1BVTTZVRO+Hx4NT4OiOtvrhS8+W2LvbDO0IVh220MGMlZOG2x6MVJeIjeB4fkkNwnwiJ8tOwacCeJhHRCBZFKXvhU33fUw1lWH2dxw2sr9oihvNH3ot3PWrrl/KTC7wCatBcIHtMT6/cuH7dC5EgdwSYdKW4ibqMcjH9hGwT9nPZ92JUbIjmBrzmJI8aoMi8KEpEU61sbuugfxFFwfEtmVO7S5hwYjkormiBzTaDOSTpPi+O5TuTwKUXDb0bebTBVvL8JSicHsX7GVHPQdnAJgWSZ9uT1YU/MsrJYs/srGbvMw30sqGpbJ4/sNAcoVkpb8TFXJcRJZAl2jejm5qbY1TYzP6nz8y7LMMUnUdOwEKIdYu8Kub/Gb6jkxMQULsxc8UjNpS4EkDwYW6/Sm+LE6FIXwSrcIHwM93fBitXDJrbVEN0RYhGl1EB53QXlrUwabERn91tTq3J5i/90AF/KcJwpHCq7Lyrp8JeXuYnON4A84ygEO0p94JNgu1h/CO4iULOuw8GBCpZxFmbHkSIW7ihfPFRfHEV6+8BEoxhjDt7lXaFmjij07RfDJlGri+AhxgFLqCGGFNPKsZYUNfDYm0RaCMdjtDZOAgn+GeIfUpE9zR0y1lVrQVvYMi8X3GTchagsgCW73r/Zmff7qNY2AQsSTKKN+D6auzHvFhlAV0Q2pVd9DayVg9IUAJuslO0+XnrlPpAgIyOjio9ShPjk9mLS9uLyOSN4UsIAeqxWl7SoaXop79+t0iorBvbfQA94hO4Ly/tOkQqNY4Z71YAG2AOFs+VCorRcLvY+AchsGa2XPWJmtQv/9K0EyrJMZHAqwF04sRKY9GuFXpPV4ccLq2NUO0Ct/5chTUO+r0nDc059sZjvsW+4eLR2nDKq9+E9niTeSODPqb+lU3o992dExJPI5uqmzo5ntfUXPM8EvwCarbk/XdN1xtb30XqN7F8ENNFFT2HwQsshO47JgFETlrImdpfjXcpocCGyEoe+fqvIPIHdKpr9BvfAA1FczsDNIc7lIkk5vWsz8TpfRczTVzuXTRLACaOzRZOAAaeZ45ZMNtrP8Rm6gEsAcs+mxxHOidR9F4HiGAXh3be14xMhiUkMS9sYqn404H5SeLxg8HmQbyfTcORQWnB4AimvVWfi2MTbRGAkSiULdUGZKy65dp1axesRABHf46jF+YmCyysIg303IRFHI+oWRPwFoIhHTuQmBkAuuYypJyQwbzeb80XX/KdDyfIsfaNLVzgQoXFWozwqmlk/W4ept/7V6eebGkChc1P2sUZCfczk3uiiKSBx6L6eVvjtUsKpNq0vQAAAa+QeGI/ymmOd5c0K419SeXSexFWbJJi4etmdyBS6FsnE2ssGX5J+gw4Wc0YIlSkoXqGIAViAuz6cGx/nRp/ZuT9kqT88+Fc9xe0oabx32LFCwbaBVqHrmPqTDXE2BXkaCb1y+gX7XfvqRAIB7/5uuBM9sASdk2mNPa0r7h9pNyEzSsfvSda2fJEROOG+cGQ14nHRx+5XXANnuqnbbuY9g7k7BRqYrpapTVafKL9+DP4S86TVlZMUugCsHwHWM2VtBZgx2YvoOgrCtFnnUrdKUJoHMUuB/wa6SFV5BuvAnnlQ9l6PClDPN3NqRYECt6w7/wW0UJRhPWmkC07mhTvwyM4AiG/dR5DhUktBPZONHvwVKcWkRvM+i+Srr8UBQtvCQ5lUFDj/+OevmoYmkr9wwchEqWEk4EuO3mZZe3KoKW5dmIUNbx0WbwxXBSu5RW7YJPR9G2jxac9ZeHb9rW40NbPcKdZ9qBTmxKKwJZfJof997jCZRVFuKRpB/OBHbvTlAGfflJY0335As3eFROyJ3SQtzSNiT7h8offMnoH1d
0EE01EyMwzvZFI/3Wk0JYP6T7ZEV+wGialipfRNkPoU0nmxewc9AN1LLFrEK2YtNr+qvZNhHWDlanBQFj6frKEvGvBTOSXeWRKyDnc37hNCUCpkda76GigNGeAAyLroj0xjzileWwjoHkWS//TiWQAS8lxW43tj+40eVx5M8mGQ0kvrZ3wHl7tToXha5Xw2he7EXfBVvo73DmXteH+0xJtEhNAgSClSKk8x70mfXMFYWZU0mXpU/CZZboYAgfAYdUq54lopUT4+YppDBZhGe9P+/1LeEujQ5xnO1rVsK23UUleJFePszCvQjvbuh6uawCouT3MTtvJFfQTt5WIkvhtr4Vv6tbQMMGBjH8J9RKjNJXsD1xnZ/6bSLD4Sqa0NSDzOcvsG3vmnL9F7a8U/pjG0barDAl7rYciN3jZWtTS09usUvMJKAZw/BF/jZN23QyF3oZ/v7/sZXzuvFYltwsMgqxrjZfVzwlAMwQ9d6ZI4Itg+Gf+sMxGHa4qmzn3/esNCcybwEWUAFzzRb5/RxJwGHKoUYz8QLApexP2AmKHng3hotDnyN0QbVab1oK2lXj3aRjsNP0QF/7lRhwcanrxNo14dPY0Lgqm+mEicezUO5RjpWUMWNtaqPbN7dMMewT+l9bTW6fpMx7UHh+H5cm+HRRxjBEn51XGSnyrm0AOpu5Qdh8Fe+ZCWx6dnJGsYPRXd7av789Mu2zOqYHqILU7Vl9vgagyPHkJCcWG+Z+FIuoTdLKrsaHReoMRileqqlt3PH9ko3Kz+r4ztDJDD6kZAbqlBKO+96I1AGbA+/QZ+ILZB2K0d7uFlnvbIUuCjTjTNdE9CQIYWBT7DthshNIoJdHp2mrOLzV2Hb7fJhdlHtq/HfxB1mESwQWnSAJluCqa0qsqjyADLz3ln6SavFYIfN6pSqzWzlt78CURh1p2EnRf+jQ9ZgM4fByIIEO2dgYo01wo60cExp9R+Fp8Z6CHXuq5jnkPeI+RSRsjrDJRvCC1LcLdrgmwfG1PARqWYlzmeKmWwztX74RqJVV+OQ0mIaWqti360cIBc5SJKyiP0YFVX5tYzI9HG1wquO5vgfH6kP9KYXgocQiD0LQR4tQp98v++I0Fq/sgj2mW5gHnXTMq1ijXr3rUyLqYbMvOTC4yvh08UQDHwDrEhJGCVlL1MKi48NIHtfF8+EhVqib7++6ZCttyc6uH5Hqkd3FVmJ25x6Gh0EFENsT51bJ8NK/xXtUSy5i8tXZgBEI/jpQcb7s2mPr6B3fRXhsRi7rs5bPCs3kMQynTMrRHT0jV0Uvm6XlrVA4TtxS43MLMNT8ItYIJya9gZLbfZE77qazpfiaVTujUOWMITsSi5sV80kZ0LELGGOY76b/M82v5zDCE5ss7rcFUqeUca8LWvzB8dn2VsOV9k2iMc2XIb2/MFffTsc9UEDo66+ZdBLEwLPXJ+OBt6yCnEE1xwWV8jCfbx2TLaT+9urSBb38lYfB0eyV5Ex/Y3jF58TqprSM8xbRL0XlwRp8F92KMMaoNGHCoTs+oDdDZ4PRcTdIRAZXaBCLOd2VT1gBIxiVc/gUPcU0uLaa743vI1u4OdJMp17hq00DrTpaWr+iFZjwkfBCBCZfB5C6K73Lo5WVUGNurZbvC4PagULv+FXFOKDbAOba/COlw1OR1ZK31c0TChWepTi9EuvD6KEgid+vTYxD2INfF08RLbcB7V5eDBdNsTXrVS7KFL33Ik6kKMacrPQy/SjJdg==Xgf5ecvEiJ8duJAD+GUKzDw3DbwfuHL7zLVQzKkLDA46/NhORitq1nzjndLtx0EEkTKMbOXEDi27t/DZzsEyeAOWt36bJ46KwDDFOpn3dT2KThXw130NiQ3TbBeMUMwR0obhUycp1A5qzFn1RknQm7f/e4kPyTuhVgDB8czuDb/Lok9y0vo9AE4TiA/4jgJ6S+OmjRcB/h/529iaAFf+lpf50QBTyqg1Pyy6inZ6jbdCn+aX2tU7hI6
hp0G3IrCa4C5swRS72QiIDNne3zrzpIyHHW6e9yqT5GZAxEPeSaGs0lSb2j/1e6Z4dJwOW9rBkgO3IzvZC8tK4iYHC2alp+no76vFu3RCT1ktH3stq31EDR+7ulkVrjej9EobFuPu9vh+90WnkqI2Xtk0O/roBhSk9vROeGWFJmDP6ZVpPg3RpUWx2P4vvPQ8xyrCFubf4cGP76QmJ8TVQ/oWDBlngBsI1SwWdYuGtP+P7hVYUA/Rh3vdWkFJND4Wur7TUNQ1buhKaWa1K80pKmSEvBRDPvz7MctP6w+TP0ZZpmCA2m7kOiRvJYvPpjL8h121Sf7s33pHIE3c0936xXb0t72wJVLWAWAMzlkBxS0IcRQVEV0m4Z1HCX3tXVv+bVZ/XqRBEz78seIGW7yNkZmc3GbrZnuVqmzma5SYXybHkGlyxYo5aSP+zDpHIHkNsPxYiuQE1SeBMaiuzKU4FoggvEoHlC0mCwgvKR+2OKU/k7zf3GwoDwii4kz/9fDSEsiEuputtFijy3/Na0Qx0R0mWgwi5q1tApKVLpCvLn1WR55FeKZnVtVShtuued9WRebV6J8PR+YD4BJ2hmKN1EY9iJOuNJ14QG/IdiwtGcvP89r279OiTC3Stb4gz75MzZpbbO0/hQdg5YW8cF/w7YyCJ4N5p53a55dAnHv0SANXVTD+DQfBrpfK6rOSEhdOjE7y+cz7DvKCrfJkTkevA8GkIjoeffWVqIi2tXB+gJ3uhDMga9fKrv/M6M8qLFB7OGeimtVseZKAU/DZIB3fX0Ji20TWWG9+pZe9IAZllegjVo/mK10fwhnTfugg+GvcYVc+WjrYLQSYe6fkgPmlLLrkvhECUfkx7SWtZo1runArW6uM1vQDRFyI4LpeOCpOPSDcj2CVLUXLpJU44wmP/PUPIvOw2zCfkD+qKexnvs92k4bibBzn5N5pumiuV5TAbDCed4u9Oh5urO57eacaOvUp50oxYWoF2JoC3/IL0uci1wqN9S3X4B/a8Z6jCjCgX98REihYMVaemApD7KV6eHT7OIVCBQaZzk4DEjRcy1g/nVJiSHYlbLESnIZyojW8q7ZnCH5stnsaYp1L1V4R1FYJJPqfGEgu+GQq9AWP9NQyDUlesbriOl/kXII5kaeRYj7mzo6MkIdh9m/eBjL0yCUodo/C6Sk9XsTN97Mjkhme+nQutEH00plVLXecBRP28257ybo+sjvZXtl+kG6Ew2/hskGhjpfWC0xg9YBXQE1u2Dr2uswm0uBBfOjSEimmrtIHTjWKmDxCBEQIJo+5i07MqAAABKtB4YD/KYke+rrUl0r9qozDpbbRNTI5mnydC/Nr3aj9y9QPNqLrU5B4GrXmA52IQVQQ0jC2B4dMcmKDZ5qlvDMQCg1WCC3cjX0u8tgqLdB0sKknj8Muo+SIPGBXKm8h7NBq6PHTk/Ir/sANPZxfyx83OaB+IZCsXnGOwg7iO7CeuOQ7T0h3CtBROJDq+J2jkqlqrS19R5SnUEJr6sP6oWf2xo3AldD2fyHwBTQQ6/bZyw==

Could you help me with hunting this?

How can multithreading affect Base64 generation if the encoding happens inside a mutex-protected block?

danergo commented 4 years ago

The problem is that base64_encode puts padding ('==') in the middle of the stream.

aklomp commented 4 years ago

Hey, thanks for the report. I'm a bit baffled by this, because as you say, if the library is run inside a mutex, it should be threadsafe. In fact, it should be fairly threadsafe to start with, because the only shared mutable data structure it uses is the codec selection structure, which is written to once on the first call. All the other functions are reentrant and do not share common data. So that can't really explain the bug.

The padding is written by base64_stream_encode_final, just before base64_encode returns. So I don't see a way in which it can insert those characters into the middle of a stream. What kind of IO is writeOutput doing? It could be that the IO is not properly flushed between threads, causing the output to be interleaved.
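To illustrate the point about padding only appearing at the end of each encode call, here is a minimal sketch using a textbook encoder. It is not the library's implementation, and the names toy_b64_encode and toy_encode_two are hypothetical. It only shows that two independently encoded messages written back to back into one buffer produce '=' characters in the middle of the combined output, which is the pattern seen in the sample above.

```c
#include <stddef.h>
#include <string.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Textbook Base64 encoder (hypothetical helper, NOT the library's code).
 * Writes 4 * ceil(len / 3) characters to out and returns that count.
 * No NUL terminator is written. */
size_t toy_b64_encode(const unsigned char *src, size_t len, char *out)
{
    size_t i = 0;
    size_t o = 0;

    /* Full 3-byte groups map to 4 characters and never need padding. */
    for (; i + 2 < len; i += 3) {
        unsigned v = ((unsigned)src[i] << 16) |
                     ((unsigned)src[i + 1] << 8) |
                      (unsigned)src[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = B64_ALPHABET[(v >> 6) & 63];
        out[o++] = B64_ALPHABET[v & 63];
    }

    /* A 1- or 2-byte tail is the only place where '=' is emitted. */
    if (len - i == 1) {
        unsigned v = (unsigned)src[i] << 16;
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {
        unsigned v = ((unsigned)src[i] << 16) | ((unsigned)src[i + 1] << 8);
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = B64_ALPHABET[(v >> 6) & 63];
        out[o++] = '=';
    }
    return o;
}

/* Encodes two messages back to back into one buffer and NUL-terminates it,
 * mimicking two encode results ending up in the same frame. */
size_t toy_encode_two(const char *a, const char *b, char *out)
{
    size_t n = toy_b64_encode((const unsigned char *)a, strlen(a), out);
    n += toy_b64_encode((const unsigned char *)b, strlen(b), out + n);
    out[n] = '\0';
    return n;
}
```

Encoding "a" and "bc" this way gives "YQ==YmM=": each half is valid on its own, but the combined string has padding in the middle.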

danergo commented 4 years ago

Before unlocking the mutex, writeOutput adds a header and a trailer to the base64 data and sends the result out through a socket. If flushing between threads were incorrect, the failure should be "all-or-nothing" because of the mutex.

I'm not seeing this:

HEADER#(BASE64_1)#TRAILER
HEADER#(BASE64_1)(BASE64_2)#TRAILER

But I'm seeing this:

HEADER#(BASE64_1)#TRAILER
HEADER#(BASE64_2)(BASE64_3)#TRAILER

This can be easily worked around by splitting the stream at the decoder (or, if there is no padding in the middle, there is no need to split at all).
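The decoder-side workaround could look roughly like the sketch below. The function names are hypothetical and nothing here comes from the library. It treats a run of '=' as the end of one self-contained segment, so each segment can be handed to the decoder separately.

```c
#include <stddef.h>

/* Returns the length of the first self-contained Base64 segment in
 * s[0..len). A segment ends right after a run of '=' padding; if there is
 * no padding, the whole input is treated as one segment. */
size_t b64_first_segment_len(const char *s, size_t len)
{
    size_t i = 0;

    /* Skip the data characters. */
    while (i < len && s[i] != '=')
        i++;
    /* Include the padding run that terminates this segment. */
    while (i < len && s[i] == '=')
        i++;
    return i;
}

/* Counts how many segments a concatenated stream contains. A real decoder
 * loop would decode s[0..n) at each step instead of just counting. */
size_t b64_count_segments(const char *s, size_t len)
{
    size_t count = 0;

    while (len > 0) {
        size_t n = b64_first_segment_len(s, len); /* always >= 1 here */
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```

A segment whose input length was a multiple of 3 carries no padding, so this approach cannot see the boundary after it. That matches the remark above: in that case the concatenated characters still form complete 4-character groups, so no split should be needed.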

My only concern here is buffer size: I've sized Base64Buffer to fit the encoding of 4096 input bytes. Obviously it won't be able to handle more data, and an overflow could cause a segfault.
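The buffer concern can be checked with a little arithmetic: padded Base64 emits 4 characters for every 3 input bytes, rounded up. The sketch below uses hypothetical helper names (b64_encoded_len, b64_fits) and assumes standard padded output with no line breaks. Under that assumption, 4096 input bytes need 5464 characters, which fits in the 5500-byte buffer with 36 bytes to spare.

```c
#include <stddef.h>

/* Padded Base64 output length for n input bytes:
 * 4 characters per 3-byte group, rounded up to a whole group. */
size_t b64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Returns 1 if the encoding of n input bytes fits in a buffer of the given
 * capacity, 0 otherwise. Callers that want a trailing NUL should pass
 * capacity - 1. */
int b64_fits(size_t n, size_t capacity)
{
    return b64_encoded_len(n) <= capacity;
}
```

Calling b64_fits(buffer_size, sizeof Base64Buffer) before encoding and rejecting the input when it returns 0 would turn a potential overflow into an explicit error. With a 5500-byte buffer the largest input that fits is 4125 bytes.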