carrierwaveuploader / carrierwave

Classier solution for file uploads for Rails, Sinatra and other Ruby web frameworks
https://github.com/carrierwaveuploader/carrierwave

File returns a string that is encoded in ASCII when I upload a string encoded by UTF8 #1583

Open elijahkim opened 9 years ago

elijahkim commented 9 years ago

I'm uploading a file with CarrierWave with a UTF8 string. However, when I try to read it, I'm getting back a string encoded in ASCII.

Here is the file I'm uploading

class FakeFileIO < StringIO
  attr_reader :original_filename
  attr_reader :path

  def initialize(filename, content)
    super(content)
    @original_filename = File.basename(filename)
    @path = File.path(filename)
  end
end

like this

content = "My UTF8 String"
my_model.file = FakeFileIO.new("file", content)
my_model.save

my_model.file.read.encoding # ASCII

bensie commented 9 years ago

@elijahkim What version of Ruby?

elijahkim commented 9 years ago

@bensie hey! I'm on 2.2.0

reconbot commented 9 years ago

Would the FakeFileIO have to respond to something to specify the encoding?

jeansimon commented 8 years ago

Running into something similar, although not exactly the same: a UTF-8 file uploaded to S3 comes back as ASCII-8BIT when I pull it down...

RavWar commented 8 years ago

I've also encountered this problem. It turns out it's caused by binary read (rb flag) here: https://github.com/carrierwaveuploader/carrierwave/blob/master/lib/carrierwave/sanitized_file.rb#L163

Not sure why that is. It was written back in 2011, maybe for better Windows support?
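A minimal sketch of the effect, using only Ruby's standard library rather than CarrierWave itself: the same bytes are tagged ASCII-8BIT when read in binary mode and UTF-8 when an encoding is requested. The helper names `read_binary` and `read_utf8` and the sample content are made up for illustration.

```ruby
require "tempfile"

# Reads a file in binary mode, as the "rb" flag does.
def read_binary(path)
  File.open(path, "rb") { |f| f.read }
end

# Reads the same file while declaring the encoding we expect it to be in.
def read_utf8(path)
  File.read(path, encoding: "UTF-8")
end

# Hypothetical sample file; any non-ASCII UTF-8 text shows the effect.
Tempfile.create("encoding-demo") do |tmp|
  tmp.write("héllo wörld")
  tmp.flush

  binary = read_binary(tmp.path)
  text   = read_utf8(tmp.path)

  puts binary.encoding == Encoding::ASCII_8BIT # binary read drops the UTF-8 tag
  puts text.encoding == Encoding::UTF_8        # explicit encoding keeps it
  puts binary.bytes == text.bytes              # the bytes themselves are identical
end
```

Only the encoding tag differs between the two reads, which is why relabelling the string afterwards is enough when the source encoding is known.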

stanley90 commented 8 years ago

Same problem, and couldn't find a workaround.

stephankaag commented 7 years ago

Same issue over here. Anyone with a workaround?

RavWar commented 7 years ago

Well, I've just removed the binary read flag: https://github.com/RavWar/carrierwave/commit/b89fc07147db0519c1fdae6d5785269da5471bb9 I haven't encountered any issues after this change.

dlynam commented 7 years ago

I'm experiencing this issue with Carrierwave version 1.0.0.beta, fog-aws, and Ruby 2.2 even when I remove the binary flag in the read method referenced above by @RavWar

dlynam commented 7 years ago

I was able to fix the issue by adding a Content-Type header in my uploader (I'm using fog-aws, and this sets the Amazon S3 metadata so that the content type declares a UTF-8 charset):

class ImportUploader < CarrierWave::Uploader::Base
  def fog_attributes
    {'Content-Type' => 'text/html; charset=utf-8'}
  end
end

wanabewired commented 7 years ago

Did anyone get this working with local filesystem storage?

joaodiogocosta commented 7 years ago

Had the same issue. I guess carrierwave has no way of knowing how to read the file using the expected encoding. Can be a text file or a raw video file for instance, so it must be read by default using the binary flag.

The cleanest way I could come up with was overriding the read method on my uploader:

class AttachmentUploader < CarrierWave::Uploader::Base
  ...

  def read
    # file.file will give you the path to the file
    File.read file.file
  end
end

then

my_model.attachment.read # => string using the correct encoding

chikamichi commented 6 years ago

@wanabewired and all, tonight I was facing the same issue, working with local files; but I guess local or remote doesn't make a difference as far as the issue (and related fix) is concerned, for it's mainly about the encoding of the incoming data being wrong. In my case, here's what I've done to fix it:

# In my controller, which is handling "items" with multiple
# "item_images" (CarrierWave-handled) per item:
def item_params
  prms = params.require(:item).permit(
    :foo,
    :bar,
    {item_images: []}
  )
  # My uploaded images have been encoded in standard ISO, but
  # the uploader/form-data layer transmits and exposes them as
  # UTF-8 content, a misleading hint which is then passed on to
  # the Rails stack. Now is a good time to cross those t's. YMMV.
  prms[:item_images].first.tempfile.read.force_encoding('ISO-8859-1')
  prms
end

From there CarrierWave is able to pick it up and work as expected. The encoding forcing could be automated using a charset/encoding detector, say uchardet.
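As a rough illustration of automating that step without an external detector, here is a sketch in plain Ruby. It is a simple heuristic, not what uchardet does: bytes that already form valid UTF-8 are only relabelled, and anything else is assumed to be in a fallback encoding (ISO-8859-1 here) and transcoded. The method name `to_utf8` and the fallback choice are assumptions for the example.

```ruby
# Returns a UTF-8 copy of raw bytes. Bytes that already form valid UTF-8 are
# only relabelled; anything else is transcoded from the assumed fallback.
def to_utf8(bytes, fallback: Encoding::ISO_8859_1)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # The fallback is a guess: ISO-8859-1 maps every byte to a character,
  # so this never raises, but it can produce wrong text for other encodings.
  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

latin1_bytes = "caf\xE9".b # "café" as ISO-8859-1 bytes
utf8_bytes   = "café".b    # "café" as UTF-8 bytes, tagged binary

puts to_utf8(latin1_bytes) == "café"
puts to_utf8(utf8_bytes) == "café"
```

Because ISO-8859-1 accepts any byte sequence, the fallback always succeeds, which is convenient but also means a wrong guess goes unnoticed. A real detector is the safer choice when the source encoding varies.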

mltsy commented 4 years ago

If you can't add the header on upload as @dlynam suggested, but you do know the source encoding upon reading, you can also just use .force_encoding("UTF-8") (for instance) to force ruby to interpret the data with the encoding you know it was written in:

content = model.file.read.force_encoding("UTF-8")

blackerhand commented 1 month ago

params.file.filename.force_encoding("UTF-8")
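For anyone applying the force_encoding suggestions above, a small sketch of what the call does and does not do, in plain Ruby with no CarrierWave involved. force_encoding changes only the label on the string, in place, and never the bytes, so it is worth checking valid_encoding? afterwards. The helper name `retag_as_utf8` is hypothetical.

```ruby
# Relabels binary data as UTF-8 and fails loudly if the bytes do not fit.
# Works on a copy because force_encoding mutates its receiver.
def retag_as_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "data is not valid UTF-8" unless text.valid_encoding?

  text
end

raw  = "naïve".b # stands in for the ASCII-8BIT result of a binary read
text = retag_as_utf8(raw)

puts text.encoding == Encoding::UTF_8
puts text.bytes == raw.bytes # same bytes, different label
puts text == "naïve"
```

If the data was written in some other encoding, relabelling it as UTF-8 will not convert it; in that case label it with its real encoding first and then call encode("UTF-8").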