bbc / wraith

Wraith — A responsive screenshot comparison tool
http://bbc-news.github.io/wraith/
Apache License 2.0

File name too long @ dir_s_mkdir - Max file name and Transliterate UTF-8 characters to ASCII - Babosa #354

Closed: Natshah closed this 7 years ago

Natshah commented 8 years ago

Hi, I'm running into this issue when I use the spider option.

no paths defined in config, crawling from site root creating new spider file

/usr/local/lib/ruby/2.1.0/fileutils.rb:250:in `mkdir': File name too long @ dir_s_mkdir - shots_nojs/thumbnails/ar__content%d8%b4%d8%b1%d9%83%d8%a9-agip-%d9%87%d9%8a-%d9%85%d8%b3%d8%ac%d9%84-%d9%85%d8%b9%d8%aa%d9%85%d8%af-%d8%ac%d8%af%d9%8a%d8%af-%d9%84%d8%af%d9%8a%d9%86%d8%a7-%d9%84%d9%84%d9%86%d8%b7%d8%a7%d9%82%d8%a7%d8%aa-%d8%a7%d9%84%d9%82%d8%b7%d8%b1%d9%8a%d8%a9-comqa-%d9%88-netqa-%d9%88-nameqa-%d9%88-qa-%d9%88-%d9%82%d8%b7%d8%b1
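For context, the error above is what mkdir reports when a single path component exceeds the filesystem's per-name limit. A minimal sketch of a check for that condition follows. It assumes a 255-byte limit, which is common on Linux filesystems but not universal. The helper name `oversized_components` and the stand-in path are hypothetical, not taken from Wraith.

```ruby
# Assumed per-component limit in bytes; verify it for your own filesystem.
NAME_MAX_BYTES = 255

# Returns the path components whose byte length exceeds the limit.
def oversized_components(path, limit = NAME_MAX_BYTES)
  path.split(File::SEPARATOR).select { |part| part.bytesize > limit }
end

# Stand-in for a spidered path: percent-encoding turns each Arabic
# character into six ASCII bytes, so names grow quickly.
path = "shots_nojs/thumbnails/" + ("%d8%b4" * 60)
puts oversized_components(path).map(&:bytesize).inspect # prints [360]
```

Each percent-encoded character costs six bytes, so even a moderately long Arabic URL slug can exceed the limit.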

We need to validate and limit the maximum length of the name, and transliterate UTF-8 characters to ASCII, before creating files and directories.

This sample script reproduces the case we hit when running the spider:

#!/usr/bin/env ruby
require 'fileutils'
require 'thor'
require 'babosa'

class DirWorker < Thor
  include Thor::Actions

  def self.source_root
    File.expand_path("../../../", __FILE__)
  end

  desc "create NAME", "Create a directory with a transliterated name capped at 100 bytes"
  def create(name)
    # Transliterate where Babosa can, then cap the byte length so mkdir does not fail.
    FileUtils.mkdir_p(name.to_slug.transliterate.truncate_bytes(100).to_s)
  end
end
DirWorker.start(ARGV)

If we run this command: ruby DirWorker.rb create "ar__content%d8%b4%d8%b1%d9%83%d8%a9-agip-%d9%87%d9%8a-%d9%85%d8%b3%d8%ac%d9%84-%d9%85%d8%b9%d8%aa%d9%85%d8%af-%d8%ac%d8%af%d9%8a%d8%af-%d9%84%d8%af%d9%8a%d9%86%d8%a7-%d9%84%d9%84%d9%86%d8%b7%d8%a7%d9%82%d8%a7%d8%aa-%d8%a7%d9%84%d9%82%d8%b7%d8%b1%d9%8a%d8%a9-comqa-%d9%88-netqa-%d9%88-nameqa-%d9%88-qa-%d9%88-%d9%82%d8%b7%d8%b1"

This works. But if we take out truncate_bytes, mkdir fails again with the same "File name too long" error.

Thank you.

Natshah commented 8 years ago

My proposed solution: add validation to Wraith::FolderManager so that names are transliterated and passed through truncate_bytes. We also need an Arabic transliterator for Babosa:

# encoding: utf-8
module Babosa
  module Transliterator
    class Arabic < Base
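      APPROXIMATIONS = {
        # Arabic-to-ASCII mappings go here.
      }

      def transliterate(string)
        # After the base transliteration, drop the "z" in "cz" before i, e, y, or j.
        super.gsub(/(c)z([ieyj])/) { "#{$1}#{$2}" }
      end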
    end
  end
end

Then fork Wraith, add some custom config functions, and apply the Arabic transliteration and truncate_bytes wherever we create or use files and directories.
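As a rough illustration of the kind of validation proposed here, the sketch below sanitizes and shortens a name using only the Ruby standard library. It is not Wraith's or Babosa's API. The helper name `safe_dir_name`, the 100-byte default, and the digest suffix are all assumptions made for this example. Unlike Babosa, it does not transliterate; it simply replaces unsupported characters with "-".

```ruby
require "digest"

# Hypothetical helper, not part of Wraith or Babosa.
def safe_dir_name(name, max_bytes = 100)
  # Replace characters that have no ASCII equivalent, then collapse
  # anything outside a conservative set of characters into "-".
  ascii = name.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "-")
              .gsub(/[^A-Za-z0-9._-]+/, "-")
  return ascii if ascii.bytesize <= max_bytes

  # Keep a prefix and append a short digest of the full original name,
  # so two long names with the same prefix do not collide.
  suffix = "-" + Digest::SHA1.hexdigest(name)[0, 8]
  ascii.byteslice(0, max_bytes - suffix.bytesize) + suffix
end

long_name = "ar__content" + ("%d8%b4" * 60)
puts safe_dir_name(long_name).bytesize # prints 100
```

The digest suffix is a design choice rather than a requirement: plain truncation, as with truncate_bytes, can map two different long URLs to the same directory name.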

ChrisBAshton commented 7 years ago

Spidering has been overhauled in Wraith v4. Please re-open if this is still an issue.