Closed jrafanie closed 10 years ago
cc @Fryguy @tenderlove @roliveri @jerryk55 @gmcculloug
@roliveri I'd love to play with your camcorder tests for the various scanning layers on ruby 2.0.
Are these generated SOAP stubs?
@tenderlove None of these look like they are related to SOAP. The binary string literals in lib/disk, lib/VolumeManager, lib/fs, lib/metadata, lib/test are all part of the ManageIQ VM introspection (fleecing) code.
The VM fleecing code should be pulled out into its own gem (or multiple gems), and Rich will be presenting about that at our upcoming conference.
MiqLvm FMTT_MAGIC is OK:
irb(main):009:0> string = "\040\114\126\115\062\040\170\133\065\101\045\162\060\116\052\076"
irb(main):010:0> string.force_encoding("ASCII-8BIT") == string.force_encoding("UTF-8")
=> true
Deleting UTF-8 strings from binary strings works as it did previously:
irb(main):018:0> string = "\000\000\000\000"
=> "\u0000\u0000\u0000\u0000"
irb(main):019:0> string.force_encoding("ASCII-8BIT").delete!(("\000").force_encoding("UTF-8"))
=> ""
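A possible caveat, sketched under the assumption that other call sites may pass non-ASCII arguments: the delete above works because "\000" is ASCII-only, and an ASCII-only string is compatible with any ASCII-compatible encoding. The variable names below are illustrative only and are not taken from the codebase.

```ruby
# Binary receiver, as in the transcript above.
binary = "\000\000\000\000".dup.force_encoding(Encoding::ASCII_8BIT)

# An ASCII-only argument is compatible regardless of its UTF-8 tag.
nul_utf8 = "\000".dup.force_encoding(Encoding::UTF_8)
stripped = binary.delete(nul_utf8)

# Non-ASCII bytes on both sides with different encodings are not compatible.
high_binary = "\343\201\223".dup.force_encoding(Encoding::ASCII_8BIT)
high_utf8   = "\343\201\223".dup.force_encoding(Encoding::UTF_8)
mixed_error =
  begin
    high_binary.delete(high_utf8)
    nil
  rescue Encoding::CompatibilityError => e
    e.class
  end
```

So the ASCII-only case is fine, but any delete/tr-style call that mixes high bytes across encodings may be worth checking separately.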
More OK instances of binary string literals in Win32Accounts.rb:
irb(main):023:0> string = "D\000\002\000"
=> "D\u0000\u0002\u0000"
irb(main):024:0> string.force_encoding("ASCII-8BIT") == string.force_encoding("UTF-8")
=> true
irb(main):025:0> string = "\004\000\002\000"
=> "\u0004\u0000\u0002\u0000"
irb(main):026:0> string.force_encoding("ASCII-8BIT") == string.force_encoding("UTF-8")
=> true
More OK strings from the miqpassword spec and uuid spec:
irb(main):031:0> string = "\001#Eg\211\253\315\357\253\315\357\001#Eg\211"
=> "\u0001#Eg\x89\xAB\xCD\xEF\xAB\xCD\xEF\u0001#Eg\x89"
irb(main):032:0> string.force_encoding("ASCII-8BIT") == string.force_encoding("UTF-8")
=> true
irb(main):027:0> string = "\343\201\223\343\201\253\343\201\241\343\202\217"
=> "こにちわ"
irb(main):028:0> string.force_encoding("ASCII-8BIT") == string.force_encoding("UTF-8")
=> true
irb(main):029:0> string = "\345\257\206\347\240\201"
=> "密码"
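One thing to keep in mind when reading the checks above: String#force_encoding changes the receiver in place and returns it, so `string.force_encoding("ASCII-8BIT") == string.force_encoding("UTF-8")` compares an object with itself and is always true. A minimal sketch of a stricter check, using copies (the `same_under_both_encodings` lambda is a hypothetical helper, not something in the repo):

```ruby
# force_encoding returns the receiver itself, so both sides are one object.
s = "D\000\002\000".dup
trivially_same = s.force_encoding("ASCII-8BIT").equal?(s.force_encoding("UTF-8"))

# Compare independent copies so each side keeps its own encoding tag.
same_under_both_encodings = lambda do |str|
  str.dup.force_encoding(Encoding::ASCII_8BIT) ==
    str.dup.force_encoding(Encoding::UTF_8)
end

# ASCII-only bytes compare equal across the two encodings.
ascii_only_result = same_under_both_encodings.call("D\000\002\000")

# Bytes above 0x7F do not compare equal when the encodings differ.
high_byte_result = same_under_both_encodings.call("\343\201\223\343\201\253")
```

With copies, the ASCII-only literals still compare equal, while literals containing high bytes (the uuid and Japanese/Chinese strings) do not, so those may deserve a second look.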
git grep -l -E "\\\[xu]?[0-9][0-9]?[0-9]?[0-9]?" ../**/*.rb | xargs grep -l "encoding: US-ASCII" > binary_strings_with_magic_comment.txt
#encoding magic comment:
diff binary_strings_with_magic_comment.txt binary_strings.txt | grep -E "^\+.+rb"
+lib/VMwareWebService/MiqVimInventory.rb
+lib/VdiCitrix/VdiCitrixInventory.rb
+lib/VdiVmware/VdiVmwareInventory.rb
+lib/metadata/linux/LinuxUsers.rb
+lib/spec/util/extensions/miq-erb_for_yaml_spec.rb
+lib/spec/util/miq-password_spec.rb
+lib/spec/util/miq-unicode_spec.rb
+lib/spec/util/miq-uuid_spec.rb
+lib/util/extensions/miq-erb_for_yaml.rb
+lib/util/miq-soap4r.rb
+lib/util/miq-uuid.rb
+lib/util/win32/miq-powershell.rb
+lib/util/xml/miq_rexml.rb
+vmdb/config/initializers/inflections.rb
+vmdb/lib/pdf_generator.rb
+vmdb/spec/migrations/20121102204300_change_binary_blob_and_binary_blob_part_size_values_from_character_length_to_bytesize_spec.rb
+vmdb/spec/models/binary_blob_part_spec.rb
+vmdb/spec/models/binary_blob_spec.rb
+vmdb/spec/models/ems_refresh/refreshers/scvmm_refresher_spec.rb
fixed by #804 #732 #715
Ruby 2.0 treats string literals in Ruby scripts as UTF-8 by default. We've had issues such as #218 and #465 where binary string literals were tagged as UTF-8, ended up with invalid encodings, and caused tests to fail.
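To make the failure mode concrete, here is a rough sketch. The UTF-8 tag is applied explicitly with force_encoding to mimic what a literal would carry in a file without a magic comment (that is an assumption for the demo, as is the choice of bytes), and String#b (available since Ruby 2.0) is shown as one way to get a binary copy.

```ruby
# Bytes taken from the uuid-style literal; tagged UTF-8 to mimic a source
# file without an encoding magic comment (assumption for this demo).
tagged_utf8 = "\211\253\315\357".dup.force_encoding(Encoding::UTF_8)

# 0x89 cannot start a UTF-8 character, so the string is not valid UTF-8.
utf8_valid = tagged_utf8.valid_encoding?

# String#b returns a copy tagged ASCII-8BIT; the bytes are unchanged.
as_binary    = tagged_utf8.b
binary_valid = as_binary.valid_encoding?
same_bytes   = as_binary.bytes == tagged_utf8.bytes
```

Checking `valid_encoding?` on each suspect literal is one way to find the ones that would actually break, as opposed to literals that merely contain escapes.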
We still have some potential binary string literals buried in ruby scripts that might not be exposed directly in failed unit tests... Are there other grep patterns I should try to verify each of these?
See below: [EDIT] Used better regular expression to get more binary string literal possibilities
Another regexp, usable in editors (where you don't have to shell-escape things):
\\[xu]?\d\d?\d?\d?
List all of the binary string literal possibilities in ruby scripts:
git grep -En "\\\[xu]?[0-9][0-9]?[0-9]?[0-9]?" **/*.rb
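For anyone who would rather run the search from Ruby, here is a small sketch of the same pattern applied line by line. The `escape_candidates` lambda and the sample text are made up for illustration. Note that the pattern only accepts decimal digits after the optional x/u, so a hex escape like \xAB is not reported, which may or may not matter for this audit.

```ruby
# Same pattern as the editor regexp: backslash, optional x/u, 1 to 4 digits.
escape_pattern = /\\[xu]?\d\d?\d?\d?/

# Returns [line_number, matched_escape] pairs for every hit in the source text.
escape_candidates = lambda do |source|
  source.each_line.with_index(1).flat_map do |line, lineno|
    line.scan(escape_pattern).map { |match| [lineno, match] }
  end
end

# Single-quoted heredoc keeps the backslashes literal, like file contents.
sample = <<'SAMPLE'
MAGIC = "\040\114\126"
NAME  = "plain"
HEX   = "\x89\xAB"
SAMPLE

hits = escape_candidates.call(sample)
```

Here `hits` contains the three octal escapes from line 1 and `\x89` from line 3, but not `\xAB`.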