cucumber / cucumber-ruby

Cucumber for Ruby. It's amazing!
https://cucumber.io
MIT License

invalid byte sequence in UTF-8 due to special character in step definition #1300

Closed patrick-silvera closed 4 years ago

patrick-silvera commented 6 years ago

Summary

Special characters in a step definition, such as French accented characters, prevent the HTML report from displaying the snippet:

invalid byte sequence in UTF-8 (ArgumentError)

-1# Couldn't get snippet for 

Expected Behavior

If a language is supported in Gherkin .feature file, it should also be supported in step definition (.rb file).

Current Behavior

The character 'é' is not handled properly in the HTML report's code snippet and breaks the report (no embedded screenshot, broken report on Jenkins).

Steps to Reproduce (for bugs)

cucumber --init

Create a test.feature file in the features folder

# language: fr
Fonctionnalité: Test

  Scénario: Issue with special character
    * é

Create a test.rb file in the features/step_definitions folder

Given /é/ do
  raise 'error'
end

Run cucumber

cucumber --format html --out report.html

Open report.html

Context & Motivation

I'm using the fr language so that every member of my team is able to understand the steps. A workaround would be really appreciated until a fix is released.

Your Environment

ruby --version
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]

gem list
*** LOCAL GEMS ***
activemodel (5.1.5)
activerecord (5.1.5)
activesupport (5.1.5)
addressable (2.5.2)
anemone (0.7.2)
archive-zip (0.11.0)
arel (8.0.0)
ast (2.4.0)
autoparse (0.3.3)
backports (3.11.1)
bigdecimal (default: 1.3.4)
builder (3.2.3)
bundler (1.16.2)
bundler-unload (1.0.2)
childprocess (0.8.0)
chromedriver-helper (1.2.0)
cmath (default: 1.0.0)
concurrent-ruby (1.0.5)
csv (default: 1.0.0)
cucumber (3.1.0)
cucumber-core (3.1.0)
cucumber-expressions (5.0.13)
cucumber-tag_expressions (1.1.1)
cucumber-wire (0.0.1)
data_magic (1.2)
date (default: 1.0.0)
dbm (default: 1.0.0)
debase (0.2.2)
debase-ruby_core_source (0.10.3)
declarative (0.0.10)
declarative-option (0.1.0)
did_you_mean (1.2.0)
diff-lcs (1.3)
etc (default: 1.0.0)
executable-hooks (1.4.2)
extlib (0.9.16)
faker (1.8.7)
faraday (0.15.2)
fcntl (default: 1.0.0)
ffi (1.9.23)
fiddle (default: 1.0.0)
fileutils (default: 1.0.2)
gdbm (default: 2.0.0)
gem-wrappers (1.3.2)
gherkin (5.0.0)
google-api-client (0.21.2)
googleauth (0.6.2)
headless (2.3.1)
httpclient (2.8.3)
i18n (0.9.5)
image_size (2.0.0, 1.5.0)
io-console (default: 0.4.6)
io-like (0.3.0)
ipaddr (default: 1.2.0)
json (default: 2.1.0)
jwt (2.1.0)
launchy (2.4.3)
little-plugger (1.1.4)
log4r (1.1.10)
logging (2.2.2)
memoist (0.16.0)
mime-types (3.1)
mime-types-data (3.2016.0521)
mini_magick (4.8.0)
mini_portile2 (2.3.0)
minitest (5.11.3, 5.10.3)
multi_json (1.13.1)
multi_test (0.1.2)
multipart-post (2.0.0)
mysql2 (0.4.10)
net-telnet (0.1.1)
nokogiri (1.8.2)
openssl (default: 2.1.0)
os (0.9.6)
page-object (2.2.4)
page_navigation (0.10)
parallel (1.12.1)
parser (2.5.1.0)
power_assert (1.1.1)
powerpack (0.1.1)
psych (default: 3.0.2)
public_suffix (3.0.2)
rainbow (3.0.0)
rake (12.3.0)
rdoc (default: 6.0.1)
representable (3.0.4)
require_all (1.5.0)
retriable (3.1.1, 1.4.1)
robotex (1.0.0)
rspec (3.7.0)
rspec-core (3.7.1)
rspec-expectations (3.7.0)
rspec-mocks (3.7.0)
rspec-support (3.7.1)
rubocop (0.56.0)
ruby-debug-ide (0.6.1)
ruby-progressbar (1.9.0)
rubygems-bundler (1.4.4)
rubyzip (1.2.1)
rvm (1.11.3.9)
scanf (default: 1.0.0)
sdbm (default: 1.0.0)
selenium-webdriver (3.10.0)
signet (0.8.1)
stringio (default: 0.0.1)
strscan (default: 1.0.0)
syntax (1.2.2)
test-unit (3.2.7)
thor (0.20.0)
thread_safe (0.3.6)
tzinfo (1.2.5)
uber (0.1.0)
unicode-display_width (1.3.2)
watir (6.10.3)
watir-scroll (0.3.0)
webrick (default: 1.4.2)
wraith (4.2.1)
xmlrpc (0.3.0)
yml_reader (0.7)
zlib (default: 1.0.0)

aslakhellesoy commented 6 years ago

I bet your files are encoded with Latin (ISO-8859-1), and not UTF-8.

Cucumber only understands UTF-8.

Can you confirm?
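
One way to confirm this from Ruby itself, as a minimal sketch: read the raw bytes of the step definition file and ask whether they form valid UTF-8. The helper name `utf8_file?` and the temporary file names are made up for illustration; they are not part of Cucumber.

```ruby
require "tmpdir"

# Hypothetical helper: true when the raw bytes of the file form valid UTF-8.
def utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

source = "Given /é/ do\n  raise 'error'\nend\n"

# Write the same step definition once as UTF-8 and once as ISO-8859-1,
# then check both files. Dir.mktmpdir returns the value of its block.
results = Dir.mktmpdir do |dir|
  utf8_path   = File.join(dir, "utf8_steps.rb")
  latin1_path = File.join(dir, "latin1_steps.rb")
  File.binwrite(utf8_path, source)
  File.binwrite(latin1_path, source.encode(Encoding::ISO_8859_1))
  { utf8: utf8_file?(utf8_path), latin1: utf8_file?(latin1_path) }
end

puts results[:utf8]   # true
puts results[:latin1] # false
```

A pure-ASCII file also counts as valid UTF-8, so this check only flags files that contain bytes that cannot be decoded as UTF-8.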

patrick-silvera commented 6 years ago

I don't think so: RubyMine displays UTF-8. I also checked from the command line:

file -i features/step_definitions/misc_steps.rb 
features/step_definitions/misc_steps.rb: text/plain; charset=utf-8

aslakhellesoy commented 6 years ago

Mea culpa, I see the same error. Something must not be specifying the encoding when writing the HTML report...

aslakhellesoy commented 6 years ago

Or maybe it's the code that tries to read the source code snippet and embed it in the HTML - that's more likely.
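
For context, this is how the same message can arise in plain Ruby, independent of Cucumber: a string tagged as UTF-8 but holding a byte that is not valid UTF-8 raises ArgumentError as soon as a regexp is matched against it. This sketch only illustrates the general mechanism; whether the HTML formatter's snippet code hits this exact path is an assumption, not something confirmed in this thread.

```ruby
# "é" in ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8 on its own.
# Tagging those bytes as UTF-8 produces a string with a broken encoding.
broken = "Given /\xE9/ do".dup.force_encoding(Encoding::UTF_8)

puts broken.valid_encoding? # false

# Matching a regexp against a broken string raises ArgumentError.
message =
  begin
    broken =~ /Given/
    nil
  rescue ArgumentError => e
    e.message
  end

puts message

# String#scrub replaces the invalid bytes so the string can be processed again.
puts broken.scrub("?") # Given /?/ do
```

Reading source lines with an explicit encoding and scrubbing them before any regexp work is one defensive option for code that has to display arbitrary files.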

patrick-silvera commented 6 years ago

Hi again, did you get some time to look at this issue?

aslakhellesoy commented 6 years ago

No, did you? :-)

aslakhellesoy commented 6 years ago

I won’t prioritise this, but feel free to submit a pull request with a fix.

patrick-silvera commented 6 years ago

Sure, I could try. I have never looked at the cucumber sources. Any idea where I should start investigating? Which files?

sjakobi commented 6 years ago

I'd like to take care of this during this Hacktoberfest!

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

stale[bot] commented 5 years ago

This issue has been automatically closed because of inactivity. You can support the Cucumber core team on opencollective.

sarzamas commented 5 years ago

I get the same error message on macOS sporadically: one failure in the report (the first error) out of 3 occurrences of the same error.

The information is not lost: the error just comes up as a repeat of the same snippet. But it looks a bit ugly, as if I had lost something :)

image

When I run the tests without the html report, the console output is the following: image

macOS 10.13.6 High Sierra, language: default (en). I use the default Gherkin keywords in English but write the step text in my native language. That works fine except for the html report.

cucumber --version 
3.1.2
ruby --version 
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin17]
gem list
*** LOCAL GEMS ***
activesupport (4.2.11.1)
addressable (2.6.0, 2.3.8)
atomos (0.1.3)
awesome_print (1.8.0)
backports (3.14.0, 3.13.0)
bigdecimal (default: 1.4.1)
builder (3.2.3)
bundler (2.0.1, default: 1.17.2)
bundler-unload (1.0.2)
calabash-android (0.9.9, 0.9.8)
calabash-cucumber (0.21.10)
CFPropertyList (3.0.0)
claide (1.0.2)
clipboard (1.3.3)
cmath (default: 1.0.0)
cocoapods (1.6.1)
cocoapods-core (1.6.1)
cocoapods-deintegrate (1.0.4)
cocoapods-downloader (1.2.2)
cocoapods-plugins (1.0.0)
cocoapods-search (1.0.0)
cocoapods-stats (1.1.0)
cocoapods-trunk (1.3.1)
cocoapods-try (1.1.0)
colored2 (3.1.2)
command_runner_ng (0.1.4)
concurrent-ruby (1.1.5)
csv (default: 3.0.2)
cucumber (3.1.2)
cucumber-core (3.2.1)
cucumber-expressions (6.0.1)
cucumber-tag_expressions (1.1.1)
cucumber-wire (0.0.1)
date (default: 1.0.0)
dbm (default: 1.0.0)
did_you_mean (1.3.0)
diff-lcs (1.3)
domain_name (0.5.20180417)
e2mmap (default: 0.1.0)
edn (1.1.1)
escape (0.0.4)
etc (default: 1.0.1)
executable-hooks (1.6.0)
fcntl (default: 1.0.0)
fiddle (default: 1.0.0)
fileutils (default: 1.1.0)
forwardable (default: 1.2.0)
fourflusher (2.2.0)
fuzzy_match (2.0.4)
gem-wrappers (1.4.0)
geocoder (1.5.1)
gh_inspector (1.1.3)
gherkin (5.1.0)
http-cookie (1.0.3)
httpclient (2.8.3)
i18n (0.9.5)
io-console (default: 0.4.7)
ipaddr (default: 1.2.2)
irb (default: 1.0.0)
json (2.2.0, default: 2.1.0, 1.8.6)
logger (default: 1.3.0)
luffa (2.1.0)
matrix (default: 0.1.0)
mime-types (2.99.3)
minitest (5.11.3)
molinillo (0.6.6)
multi_json (1.13.1)
multi_test (0.1.2)
mutex_m (default: 0.1.0)
nanaimo (0.2.6)
nap (1.1.0)
net-telnet (0.2.0)
netrc (0.11.0)
openssl (default: 2.1.2)
ostruct (default: 0.1.0)
power_assert (1.1.3)
prime (default: 0.1.0)
psych (default: 3.1.0)
public_suffix (3.0.3)
rake (12.3.2)
rdoc (default: 6.1.0)
rest-client (2.0.2, 1.6.7)
retriable (2.0.2)
rexml (default: 3.1.9)
rouge (2.0.7)
rss (default: 0.2.7)
ruby-macho (1.4.0)
rubygems-bundler (1.4.5)
rubyzip (1.2.2)
run_loop (4.2.2)
rvm (1.11.3.9)
scanf (default: 1.0.0)
sdbm (default: 1.0.0)
shell (default: 0.7)
slowhandcuke (0.0.3)
stringio (default: 0.0.2)
strscan (default: 1.0.0)
sync (default: 0.5.0)
syntax (1.2.2)
test-unit (3.2.9)
thor (0.20.3)
thread_safe (0.3.6)
thwait (default: 0.1.0)
tracer (default: 0.1.0)
tzinfo (1.2.5)
unf (0.1.4)
unf_ext (0.0.7.6)
unirest (1.1.2, 1.0.8)
webrick (default: 1.4.2)
xamarin-test-cloud (2.2.0)
xcodeproj (1.9.0, 1.8.2)
xcpretty (0.3.0)
xmlrpc (0.3.0)
zlib (default: 1.0.0)

luke-hill commented 5 years ago

Ping @Oopla / @sarzamas - Has either of you got a git repo we can pull down and run this on? Are you still getting the issue if you try the 4.0.0.rc1 version of cucumber?

@sjakobi - You mentioned hacking on this last year, did you get anywhere with it?

lflucasferreira commented 5 years ago

I am facing the same issue in my framework. I realised that a specific method called within the step file is causing this trouble. When I copied the code into the login_step.rb file it worked fine. But I need to keep my code refactored, and this issue prevents that.

The code that's causing this trouble is:

def close_iframe 
  page.within_frame('popAvisoCentrais') do
    click_on 'Fechar'
  end
end

lflucasferreira commented 5 years ago

Guys. I found the problem in my code. My features are written in Portuguese and the keyword "Then" is "Então". Because of this "ã" the problem happened. Maybe this can be useful to someone.

luke-hill commented 5 years ago

As mentioned before, if you can create an MVCE, ideally in a small git repo we could pull down and run directly, that will be of greatest help.

Trying to piece together code in another language isn't something I'm willing to invest time into as it's likely I'll get it wrong.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

stale[bot] commented 5 years ago

This issue has been automatically closed because of inactivity. You can support the Cucumber core team on opencollective.

sarzamas commented 5 years ago

Ping @luke-hill. I found a way to reproduce the error on any of your projects. Basically it's what the bug title says: a special character in the step definition.

These are the steps needed to reproduce it:

  1. The step text in the <.feature> file and in the <_steps.rb> file should be written in non-Latin characters.

  2. The step should fail with an error of class Calabash::Android::WaitHelpers::WaitError (this happens every time the trait method returns an error).

  3. Launch cucumber with the flags: -f html -o report.html

  4. Check the error in report.html:

    invalid byte sequence in UTF-8 (ArgumentError)
    -1 # Couldn't get snippet for 

  5. Replace all non-Latin characters in the step with Latin characters, and the problem no longer occurs.

So you can easily trigger the error in your environment.

For example:

1) The step in the Error.feature file should look like the following:

Feature: Error with UTF-8.
  Scenario: Generate error with UTF-8

2) The step in the Error_steps.rb file should look like the following:

Given ("Я вижу ошибку UTF-8") do
  @page = page(SomePage).await(timeout:10)
end

This instruction triggers the trait method defined in the SomePage.rb file.

3) The SomePage.rb file should look like the following:

class SomePage < Calabash::ABase
  def trait
    "* id:'any'"
  end
end

4) Then start the tests as you normally do:

bundle exec cucumber -p android -f pretty --expand -f html -o report.html

5) Check the console output and the report.html file:

invalid byte sequence in UTF-8 (ArgumentError)
-1 # Couldn't get snippet for 

6) Replacing the text 'Я вижу ошибку UTF-8' with 'I see error UTF-8' makes the error disappear, and changing it back reproduces it.

Could you comment on what I should trace, and where, to figure out whether I am doing something wrong or whether it's a bug in the html formatter?

P.S. If I don't use the option -f html, everything works fine with the same steps in Russian.

P.P.S. What upsets me is that this error happens before After do |scenario| in app_life_cycle_hooks.rb and prevents it from running.

Thanks

sarzamas commented 5 years ago

Well, it's definitely a bug in the html formatter! While continuing my investigation I stumbled on a workaround:

The trick is to keep non-Latin characters out of the code snippet captured for the report.

As you can see, the captured code snippet covers 2 lines before and 2 lines after the line of code that caused the runtime error in cucumber, for a total of 5 lines of suspect code.

image

So I get the error invalid byte sequence in UTF-8 (ArgumentError) if Russian characters are present in this snippet (visible in any of these 5 lines).

Once I added some dummy placeholder lines to keep the line with Russian letters out of the snippet, the UTF-8 error disappeared (I found this by accident, but it worked).

So in my example above you will get the UTF-8 error if the step in the Error_steps.rb file looks like the following:

Given ("Я вижу ошибку UTF-8") do
  @page = page(SomePage).await(timeout:10)
end

image

But if you add two blank lines in between, the error does not occur:

Given ("Я вижу ошибку UTF-8") do


  @page = page(SomePage).await(timeout:10)
end

You see the difference: just two blank lines after the line with Russian text.

That trick lets me keep writing steps in Russian and still get a more or less proper html report, with the after-scenario hooks running too.

Nevertheless, it's a bug, because now my step definitions are full of blank lines after code with non-Latin characters :(
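
The arithmetic behind this workaround can be sketched as follows, assuming the snippet covers 2 lines before and 2 lines after the failing line as described above. The helpers `snippet_window` and `non_ascii_in_window?` are hypothetical names for illustration; they are not taken from the formatter's code.

```ruby
# Hypothetical helper: 1-based line numbers of a snippet spanning
# `context` lines before and after the failing line.
def snippet_window(error_line, total_lines, context: 2)
  first = [error_line - context, 1].max
  last  = [error_line + context, total_lines].min
  (first..last).to_a
end

# True when any line inside the snippet window contains non-ASCII characters.
def non_ascii_in_window?(source, error_line)
  lines = source.lines
  snippet_window(error_line, lines.size).any? { |n| !lines[n - 1].ascii_only? }
end

failing = <<~STEPS
  Given ("Я вижу ошибку UTF-8") do
    @page = page(SomePage).await(timeout:10)
  end
STEPS

padded = <<~STEPS
  Given ("Я вижу ошибку UTF-8") do


    @page = page(SomePage).await(timeout:10)
  end
STEPS

# The error is raised on the `@page = ...` line: line 2, then line 4.
puts non_ascii_in_window?(failing, 2) # true: window is lines 1..3
puts non_ascii_in_window?(padded, 4)  # false: window is lines 2..5
```

With a single blank line the failing line would be line 3 and the window would still start at line 1, which is why two blank lines are needed to push the Russian text out of the snippet.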

luke-hill commented 5 years ago

I'll re-open if you can quickly provide a reproducible scenario. As has been mentioned several times we're not going to piece together comments. There is way too much confusion.

For the avoidance of doubt this is what I would expect.

1) A repo I can clone
2) A single command to run, cucumber or rspec for instance
3) An observable failure

Also bear in mind that your PC's own config may affect this.

Now to put some perspective on things, the html formatter in ruby is essentially in maintenance mode, and only small bug fixes are going to come in, because we're rewriting (or looking to), all the formatters using protocol buffers. There is a reasonable amount of info on github about this, so if you want to continue looking into this, just know what the current "status-quo" is.

sarzamas commented 5 years ago

Hi @luke-hill, thanks for your attention, and I beg your pardon for the delay in answering.

I managed to make the bug reproducible in a small cucumber Ruby project on GitHub. Instructions to reproduce, the workaround patch to the step definitions, and my environment are in readme.md. Check it for details here: UTF8_issue_in_cucumber_report.html

I hope this can be fixed soon.

Thanx, sarzamas

luke-hill commented 5 years ago

Having a quick look over the next couple of days.

Whilst I do, can you try to slim down that repo as much as possible? The fewer LOC there are, the easier it is for someone to narrow down the failure.

Some of those page objects have 50-60 lines (do they need them?). There are 5 pages; surely we only need 1?

Ideally there should be between 20 and 40 lines of Ruby code in total. There are close to 1000 at the moment, so can you try to remove absolutely everything that isn't needed?

sarzamas commented 5 years ago

@luke-hill Thanks for the fast reply. I have made the source code as small as possible now; there are no more big files.

lflucasferreira commented 4 years ago

I deleted the .pry_history file from C:/Users/user/.local/share/pry/

luke-hill commented 4 years ago

To anyone watching: I said I'd triage this, but I haven't yet. Is there a git repo I can pull down to look at this?

(Looking up, @sarzamas, you mentioned you have one.) Could you re-link me to it?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

stale[bot] commented 4 years ago

This issue has been automatically closed because of inactivity. You can support the Cucumber core team on opencollective.