Python 3.5 support #98

Closed 9 years ago

Alir3z4 commented 9 years ago

The first attempt at running the tests on Python 3.5 has failed.

There's no hurry on Python 3.5 support for now, but the failures are cool and weird.

Using worker:

FAIL: test_emdash-para_cmd (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 106, in test_cmd
    self.assertEqual(result, actual)
AssertionError: 'Baco[257 chars]shank--\n\n--irure ex esse id, ham commodo mea[476 chars]\n\n' != 'Baco[257 chars]shank—\n\n—irure ex esse id, ham commodo meatl[474 chars]\n\n'
  Bacon ipsum dolor sit amet pork chop id pork belly ham hock, sed meatloaf eu
  exercitation flank quis veniam officia. Chuck dolor esse, occaecat est elit
  drumstick ground round tri-tip nisi. Eu fugiat drumstick leberkas magna.
- Turducken frankfurter nisi aute shank--
?                                      ^^
+ Turducken frankfurter nisi aute shank—
?                                      ^

- --irure ex esse id, ham commodo meatloaf pig pariatur ut cow. Officia salami
? ^^
+ —irure ex esse id, ham commodo meatloaf pig pariatur ut cow. Officia salami in
? ^                                                                          +++
- in fatback voluptate boudin ullamco beef ribs shank. Duis spare ribs pork
? ---
+ fatback voluptate boudin ullamco beef ribs shank. Duis spare ribs pork chop,
?                                                                       ++++++
- chop, ad leberkas reprehenderit id voluptate salami ham ut in ut cillum
? ------
+ ad leberkas reprehenderit id voluptate salami ham ut in ut cillum turducken.
?                                                                  +++++++++++
- turducken. Nisi ribeye tail capicola dolore andouille. Short ribs id beef
? -----------
+ Nisi ribeye tail capicola dolore andouille. Short ribs id beef ribs, et nulla
?                                                               +++++++++++++++
- ribs, et nulla ground round do sunt dolore. Dolore nisi ullamco veniam sunt.
? ---------------
+ ground round do sunt dolore. Dolore nisi ullamco veniam sunt. Duis brisket
?                                                              +++++++++++++
- Duis brisket drumstick, dolor fatback filet mignon meatloaf laboris tri-tip
? -------------
+ drumstick, dolor fatback filet mignon meatloaf laboris tri-tip speck chuck
?                                                               ++++++++++++
- speck chuck ball tip voluptate ullamco laborum.
? ------------
+ ball tip voluptate ullamco laborum.


FAIL: test_emdash-para_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: 'Baco[257 chars]shank--\n\n--irure ex esse id, ham commodo mea[476 chars]\n\n' != 'Baco[257 chars]shank—\n\n—irure ex esse id, ham commodo meatl[474 chars]\n\n'
  Bacon ipsum dolor sit amet pork chop id pork belly ham hock, sed meatloaf eu
  exercitation flank quis veniam officia. Chuck dolor esse, occaecat est elit
  drumstick ground round tri-tip nisi. Eu fugiat drumstick leberkas magna.
- Turducken frankfurter nisi aute shank--
?                                      ^^
+ Turducken frankfurter nisi aute shank—
?                                      ^

- --irure ex esse id, ham commodo meatloaf pig pariatur ut cow. Officia salami
? ^^
+ —irure ex esse id, ham commodo meatloaf pig pariatur ut cow. Officia salami in
? ^                                                                          +++
- in fatback voluptate boudin ullamco beef ribs shank. Duis spare ribs pork
? ---
+ fatback voluptate boudin ullamco beef ribs shank. Duis spare ribs pork chop,
?                                                                       ++++++
- chop, ad leberkas reprehenderit id voluptate salami ham ut in ut cillum
? ------
+ ad leberkas reprehenderit id voluptate salami ham ut in ut cillum turducken.
?                                                                  +++++++++++
- turducken. Nisi ribeye tail capicola dolore andouille. Short ribs id beef
? -----------
+ Nisi ribeye tail capicola dolore andouille. Short ribs id beef ribs, et nulla
?                                                               +++++++++++++++
- ribs, et nulla ground round do sunt dolore. Dolore nisi ullamco veniam sunt.
? ---------------
+ ground round do sunt dolore. Dolore nisi ullamco veniam sunt. Duis brisket
?                                                              +++++++++++++
- Duis brisket drumstick, dolor fatback filet mignon meatloaf laboris tri-tip
? -------------
+ drumstick, dolor fatback filet mignon meatloaf laboris tri-tip speck chuck
?                                                               ++++++++++++
- speck chuck ball tip voluptate ullamco laborum.
? ------------
+ ball tip voluptate ullamco laborum.


FAIL: test_googledocmassdownload_cmd (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 106, in test_cmd
    self.assertEqual(result, actual)
AssertionError: "#  t[237 chars]n**   being\n  3. end  \n  \n**bold**   \n_ita[156 chars]_ \n" != "#  t[237 chars]n**  being\n  3. end  \n  \n**bold**   \n_ital[146 chars]_ \n"
  #  test doc  

  first issue  

    - bit
    - _**bold italic**_ 
      - orange
      - apple
    - final  

  text to separate lists  

    1. now with numbers
    2. the prisoner
      1. not an  _italic number_ 
-     2. a  **bold human**   being
?                           -
+     2. a  **bold human**  being
    3. end  


  ` def func(x):`  
- `   if x < 1:`  
?  --
+ ` if x < 1:`  
- `     return 'a'`  
?  ----
+ ` return 'a'`  
- `   return 'b'`  
?  --
+ ` return 'b'`  

- Some  ` fixed width text`  here  
?                           -
+ Some  ` fixed width text` here  
  _` italic fixed width text`_ 

FAIL: test_googledocmassdownload_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: "#  t[237 chars]n**   being\n  3. end  \n  \n**bold**   \n_ita[156 chars]_ \n" != "#  t[237 chars]n**  being\n  3. end  \n  \n**bold**   \n_ital[146 chars]_ \n"
  #  test doc  

  first issue  

    - bit
    - _**bold italic**_ 
      - orange
      - apple
    - final  

  text to separate lists  

    1. now with numbers
    2. the prisoner
      1. not an  _italic number_ 
-     2. a  **bold human**   being
?                           -
+     2. a  **bold human**  being
    3. end  


  ` def func(x):`  
- `   if x < 1:`  
?  --
+ ` if x < 1:`  
- `     return 'a'`  
?  ----
+ ` return 'a'`  
- `   return 'b'`  
?  --
+ ` return 'b'`  

- Some  ` fixed width text`  here  
?                           -
+ Some  ` fixed width text` here  
  _` italic fixed width text`_ 

FAIL: test_googledocsaved_cmd (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 106, in test_cmd
    self.assertEqual(result, actual)
AssertionError: "#  t[237 chars]n**   being\n  3. end  \n  \n**bold**   \n_ita[156 chars]_ \n" != "#  t[237 chars]n**  being\n  3. end  \n  \n**bold**   \n_ital[146 chars]_ \n"
  #  test doc  

  first issue  

    - bit
    - _**bold italic**_ 
      - orange
      - apple
    - final  

  text to separate lists  

    1. now with numbers
    2. the prisoner
      1. not an  _italic number_ 
-     2. a  **bold human**   being
?                           -
+     2. a  **bold human**  being
    3. end  


  ` def func(x):`  
- `   if x < 1:`  
?  --
+ ` if x < 1:`  
- `     return 'a'`  
?  ----
+ ` return 'a'`  
- `   return 'b'`  
?  --
+ ` return 'b'`  

- Some  ` fixed width text`  here  
?                           -
+ Some  ` fixed width text` here  
  _` italic fixed width text`_ 

FAIL: test_googledocsaved_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: "#  t[237 chars]n**   being\n  3. end  \n  \n**bold**   \n_ita[156 chars]_ \n" != "#  t[237 chars]n**  being\n  3. end  \n  \n**bold**   \n_ital[146 chars]_ \n"
  #  test doc  

  first issue  

    - bit
    - _**bold italic**_ 
      - orange
      - apple
    - final  

  text to separate lists  

    1. now with numbers
    2. the prisoner
      1. not an  _italic number_ 
-     2. a  **bold human**   being
?                           -
+     2. a  **bold human**  being
    3. end  


  ` def func(x):`  
- `   if x < 1:`  
?  --
+ ` if x < 1:`  
- `     return 'a'`  
?  ----
+ ` return 'a'`  
- `   return 'b'`  
?  --
+ ` return 'b'`  

- Some  ` fixed width text`  here  
?                           -
+ Some  ` fixed width text` here  
  _` italic fixed width text`_ 

FAIL: test_html-escaping_cmd (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 106, in test_cmd
    self.assertEqual(result, actual)
AssertionError: 'Escaped HTML like &lt;div&gt; or &amp; should remain escape[100 chars]\n\n' != 'Escaped HTML like <div> or & should remain escaped on outpu[90 chars]\n\n'
- Escaped HTML like &lt;div&gt; or &amp; should remain escaped on output
?                   ^^^^   ^^^^     ----
+ Escaped HTML like <div> or & should remain escaped on output
?                   ^   ^

      ...unless that escaped HTML is in a <pre> tag

  `...or a <code> tag`

FAIL: test_html-escaping_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: 'Escaped HTML like &lt;div&gt; or &amp; should remain escape[100 chars]\n\n' != 'Escaped HTML like <div> or & should remain escaped on outpu[90 chars]\n\n'
- Escaped HTML like &lt;div&gt; or &amp; should remain escaped on output
?                   ^^^^   ^^^^     ----
+ Escaped HTML like <div> or & should remain escaped on output
?                   ^   ^

      ...unless that escaped HTML is in a <pre> tag

  `...or a <code> tag`

FAIL: test_html_entities_out_of_text_cmd (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 106, in test_cmd
    self.assertEqual(result, actual)
AssertionError: '[allas: Country Manager](http://thth)\n\n' != '[állás: Country Manager](http://thth)\n\n'
- [allas: Country Manager](http://thth)
?  ^  ^
+ [állás: Country Manager](http://thth)
?  ^  ^

FAIL: test_html_entities_out_of_text_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: '[allas: Country Manager](http://thth)\n\n' != '[állás: Country Manager](http://thth)\n\n'
- [allas: Country Manager](http://thth)
?  ^  ^
+ [állás: Country Manager](http://thth)
?  ^  ^

FAIL: test_invalid_unicode_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: 'Br\n\n' != 'B�r\n\n'
- Br
+ B�r
?  +

FAIL: test_nbsp_unicode_mod (test.test_html2text.TestHTML2Text)
Traceback (most recent call last):
  File "/home/travis/build/Alir3z4/html2text/test/", line 99, in test_mod
    self.assertEqual(result, actual)
AssertionError: '# NB[182 chars]ed do\xa0eiusmod\ntempor incididunt ut\xa0labo[385 chars]\n\n' != '# NB[182 chars]ed do eiusmod\ntempor incididunt ut labore et [349 chars]\n\n'
  # NBSP handling test #2

  In this test all NBSPs will be replaced with unicode non-breaking spaces
  (unicode_snob = True).

- Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
?                                                                 ^
+ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
?                                                                 ^
- tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
?                     ^         ^                       ^       ^
+ tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
?                     ^         ^                       ^       ^
- quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
?                                                  ^          ^
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
?                                                  ^          ^

- Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
?                         ^                ^
+ Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
?                         ^                ^
- eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
?   ^
+ eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
?   ^
- in culpa qui officia deserunt mollit anim id est laborum.
?   ^                                         ^
+ in culpa qui officia deserunt mollit anim id est laborum.
?   ^                                         ^

Ran 88 tests in 11.903s

FAILED (failures=12)

The command "PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text --rcfile=.coveragerc test -v" exited with 1.
Name                 Stmts   Miss Branch BrMiss  Cover
html2text/__init__     573     45    337     27    92%
html2text/cli           72      9      6      1    87%
html2text/compat        10      4      2      1    58%
html2text/config        33      0      0      0   100%
html2text/utils        103      4     54      2    96%
TOTAL                  791     62    399     31    92%
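The log above does not identify a cause. One candidate worth checking (an assumption, not something confirmed in this thread) is that Python 3.5 changed the default of the `convert_charrefs` argument of `html.parser.HTMLParser` from `False` to `True`. With the new default, references such as `&amp;`, `&mdash;` or `&#8212;` are decoded before `handle_data` is called, and `handle_entityref` / `handle_charref` are never invoked. A converter that relies on those callbacks (for example to turn `&mdash;` into `--` or to keep `&lt;` escaped) would then receive already-decoded characters. The sketch below only records which callbacks fire under each setting; `EventRecorder` and `record` are hypothetical helpers written for this illustration, not part of html2text.

```python
from html.parser import HTMLParser

# Assumption: the convert_charrefs default change is one *possible* cause of
# the failures above; this thread does not confirm it.


class EventRecorder(HTMLParser):
    """Hypothetical helper: log which parser callbacks fire, in order."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def record(html, convert_charrefs):
    parser = EventRecorder(convert_charrefs)
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.events


sample = "<p>a &amp; b &mdash; c &#8212; d</p>"

# Pre-3.5 default: references are reported to the entityref/charref callbacks.
print(record(sample, convert_charrefs=False))

# 3.5+ default: references arrive already decoded inside handle_data.
print(record(sample, convert_charrefs=True))

# The default value itself.
print(HTMLParser().convert_charrefs)
```

If this does turn out to be the cause, passing `convert_charrefs=False` explicitly (the keyword exists from Python 3.4 on) would restore the older callback behaviour; whether that is the right fix for html2text is not settled here.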

theSage21 commented 9 years ago

@Alir3z4 These failing tests can be broken down into:

  1. Spaces and "\xa0" (non-breaking space) are interchanged
    • nbsp_unicode_md has "\xa0"
  2. invalid_unicode_md lets an invalid character through (B�r instead of Br)
  3. html_entities_out_of_text does not convert állás to allas
  4. html-escaping does not keep &lt;, &gt; and &amp; escaped (they come out as <, > and &)
  5. googledocsaved, googledocmassdownload
    • extra space after ** in googledocsaved
    • extra spaces after `
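  6. emdash-para
    • one less - at the end (a single em dash instead of --)
    • strange wrapping, since the em dash is one character shorter than -- and the lines break at different words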
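The wrapping difference in item 6 can be reproduced from the dash length alone: an em dash is one character and `--` is two, so a line that just fits with the em dash overflows with `--`. The sketch below assumes a 78-column wrap (the length of the longest line in the emdash-para diff) and uses the standard library's `textwrap`; it is an illustration, not html2text's own wrapping code.

```python
import textwrap

# Start of the second paragraph of the emdash-para test, taken from the log.
PARAGRAPH = (
    "\u2014irure ex esse id, ham commodo meatloaf pig pariatur ut cow. "
    "Officia salami in fatback voluptate boudin ullamco beef ribs shank."
)
WIDTH = 78  # assumed wrap width, matching the longest line in the diff

for dash in ("--", "\u2014"):
    text = PARAGRAPH.replace("\u2014", dash)
    # break_on_hyphens=False splits on whitespace only, so "--" stays
    # attached to "irure" instead of becoming a separate chunk.
    lines = textwrap.wrap(text, width=WIDTH, break_on_hyphens=False)
    print(repr(dash), len(lines[0]), lines[0])
```

With `--` the first line ends at "salami"; with the em dash there is room for "in" as well, which matches the shifted line breaks in the diff.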

The first issue is easily handled: it is the character "\xa0" (a non-breaking space) instead of a normal space.
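A quick way to see why the nbsp_unicode diff lines look identical: U+00A0 prints exactly like a normal space, yet the strings compare unequal. The fragment below is copied from that assertion; the `replace` call is only there to show the comparison, not a proposed fix.

```python
import unicodedata

with_nbsp = "sed do\xa0eiusmod"  # as in the first string of the AssertionError
with_space = "sed do eiusmod"    # as in the second string

print(with_nbsp == with_space)                              # False
print(unicodedata.name("\xa0"))                             # NO-BREAK SPACE
print([hex(ord(ch)) for ch in with_nbsp if ch.isspace()])   # ['0x20', '0xa0']
print(with_nbsp.replace("\xa0", " ") == with_space)         # True
```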

theSage21 commented 9 years ago

@Alir3z4 Just noticed that this is a duplicate of It is funny that this is and that is Can you close this one, please? I will continue working on that one, since it was reported earlier.

Alir3z4 commented 9 years ago

@theSage21 Good catch, it's interesting. It's a conspiracy theory, Illuminati alert :D

The issue is closed and marked as a duplicate of #89.