Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

RE_MD_DASH_MATCHER does not exist on the HTML2Text() object #382

Open sowinski opened 2 years ago

sowinski commented 2 years ago

According to the docs (https://github.com/Alir3z4/html2text/blob/master/docs/usage.md#available-options), there is a RE_MD_DASH_MATCHER option.

I cannot find this option on the HTML2Text object:

import html2text

text_maker = html2text.HTML2Text()
print(dir(text_maker))
['CDATA_CONTENT_ELEMENTS', '_HTMLParser__starttag_text', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_decl_otherchars', '_parse_doctype_attlist', '_parse_doctype_element', '_parse_doctype_entity', '_parse_doctype_notation', '_parse_doctype_subset', '_scan_name', 'a', 'abbr_data', 'abbr_list', 'abbr_title', 'absolute_url_matcher', 'acount', 'astack', 'baseurl', 'blockquote', 'body_width', 'br_toggle', 'bypass_tables', 'cdata_elem', 'charref', 'check_for_whole_start_tag', 'clear_cdata_mode', 'close', 'close_quote', 'code', 'convert_charrefs', 'current_tag', 'default_image_alt', 'drop_white_space', 'emphasis', 'emphasis_mark', 'empty_link', 'entityref', 'error', 'escape_snob', 'feed', 'finish', 'get_starttag_text', 'getpos', 'goahead', 'google_doc', 'google_list_indent', 'google_nest_count', 'handle', 'handle_charref', 'handle_comment', 'handle_data', 'handle_decl', 'handle_emphasis', 'handle_endtag', 'handle_entityref', 'handle_pi', 'handle_startendtag', 'handle_starttag', 'handle_tag', 'hide_strikethrough', 'ignore_emphasis', 'ignore_images', 'ignore_links', 'ignore_tables', 'images_as_html', 'images_to_alt', 'images_with_size', 'inheader', 'inline_links', 'interesting', 'lastWasList', 'lastWasNL', 'lasttag', 'lineno', 'links_each_paragraph', 'list', 'mark_code', 'maybe_automatic_link', 'o', 'offset', 'open_quote', 'optwrap', 'out', 'outcount', 'outtextf', 'outtextlist', 'p', 'p_p', 'pad_tables', 'parse_bogus_comment', 'parse_comment', 'parse_declaration', 'parse_endtag', 'parse_html_declaration', 'parse_marked_section', 'parse_pi', 'parse_starttag', 'pbr', 'pre', 'preceding_data', 'preceding_stressed', 'previousIndex', 'protect_links', 'quiet', 'quote', 'rawdata', 
'reset', 'set_cdata_mode', 'single_line_break', 'skip_internal_links', 'soft_br', 'space', 'split_next_td', 'start', 'startpre', 'stressed', 'strong_mark', 'style', 'style_def', 'table_start', 'tag_callback', 'tag_stack', 'td_count', 'ul_item_mark', 'unescape', 'unicode_snob', 'unknown_decl', 'updatepos', 'use_automatic_links', 'wrap_links', 'wrap_list_items']

Am I missing something, or is this a bug?
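
To make the comparison less error-prone than reading the dir() output by eye, here is a minimal sketch that checks a list of documented option names against an object's attributes. It assumes the documented names are upper-case and that the matching instance attributes, where they exist, are the lower-case form (as with ignore_links in the listing above). StandInConverter and find_missing_options are hypothetical names used only for illustration; the stand-in class is not the real html2text.HTML2Text, and its attribute values are placeholders.

```python
from html.parser import HTMLParser


class StandInConverter(HTMLParser):
    """Hypothetical stand-in for html2text.HTML2Text, not the real class."""

    def __init__(self):
        super().__init__()
        # Two attribute names taken from the dir() listing; values are placeholders.
        self.ignore_links = False
        self.body_width = 78


def find_missing_options(obj, documented_names):
    """Return documented names with no same-named or lower-case attribute on obj."""
    return [
        name
        for name in documented_names
        if not hasattr(obj, name) and not hasattr(obj, name.lower())
    ]


# Assumed subset of documented option names, written upper-case as in the docs.
documented = ["IGNORE_LINKS", "BODY_WIDTH", "RE_MD_DASH_MATCHER"]
missing = find_missing_options(StandInConverter(), documented)
print(missing)  # → ['RE_MD_DASH_MATCHER']
```

Swapping StandInConverter() for html2text.HTML2Text() and passing the full list from the docs should show which documented names have no counterpart on the instance.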