gitlabhq / grit

Grit gives you object oriented read/write access to Git repositories via Ruby. Patched for GitLab
gitlab.org
MIT License
59 stars 55 forks source link

Grit no longer handles binary files #41

Closed mpalmer closed 10 years ago

mpalmer commented 10 years ago

b25f7451 completely screwed up extraction of binary files -- I've got JPEGs that are completely mangled by this desire to "Ensure that the same encoding is returned in all cases". How is that even a good thing? My recommendation would be to completely revert that commit, it makes no sense whatsoever unless you only intend on supporting UTF-8 text (in which case, it would be best to raise an exception if you get binary data, rather than just blindly mangle it).

maxlazio commented 10 years ago

Hi,

The reason this is implemented this way is because we have to cover wide range of issues that non-latin characters bring into binary information.

Can you give me some examples of where and how is this broken? Can you supply the test repository for this? If you don't feel comfortable supplying a public repository you can email me with one. The repository would be used for testing purposes only within GitLab BV as I am an employee there.

mpalmer commented 10 years ago

If there are issues that non-latin characters bring into binary information, why not just leave the string of binary information encoded in a format that transparently handles that? That's what BINARY is for, and it was what grit was doing quite happily up until the commit I previously mentioned, which takes a string which was correctly encoded (and which you even used CharlockHolmes to detect should have been binary-encoded) and then lossily(!!!) encode it into UTF-8.

https://github.com/mpalmer/is-grit-fucked is a minimal repo I just assembled demonstrating the problem. When the enclosed test script is run as ruby is-grit-fucked.rb 2.6.5 it will report "Grit is good", because Grit::Blob#data returns a byte-for-byte identical result to that of the file itself. When run as ruby is-grit-fucked.rb 2.6.6, it will report "Grit is fucked", because, well, it is.

maxlazio commented 10 years ago

In any case, this issue is fixed with the newest version of grit https://rubygems.org/gems/gitlab-grit/versions/2.6.7