getpelican / pelican-plugins

Collection of plugins for the Pelican static site generator

AsciiDoc Reader with Russian text #1277

Open gmaFFFFF opened 4 years ago

gmaFFFFF commented 4 years ago

Hi,

  1. In addition to asciidoc and asciidoctor, there is also the asciidoctorj tool. I suggest adding it as a default tool.
  2. I ran into problems while processing Russian text: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte. The fix requires several changes to the code and breaks fix_unicode().
  3. Russian metadata is not supported: the metadata regex requires the attribute value to start with an ASCII letter or digit, so values written in Cyrillic are skipped.

  The following patch addresses all three points:
    diff --git a/asciidoc_reader/asciidoc_reader.py b/asciidoc_reader/asciidoc_reader.py
    index 881a857..789253d 100644
    --- a/asciidoc_reader/asciidoc_reader.py
    +++ b/asciidoc_reader/asciidoc_reader.py
    @@ -29,11 +29,12 @@ def fix_unicode(val):
         if sys.version_info < (3,0):
             val = unicode(val.decode("utf-8"))
         else:
    -        # This fixes an issue with character substitutions, e.g. '<F1>' to '<C3><B1>'.
    -        val = str.encode(val, "latin-1").decode("utf-8")
    +        # This fixes an issue with character substitutions, e.g. '<EF><BF><BD>' to '<C3><B1>'.
    +        # val = str.encode(val, "latin-1").decode("utf-8")
    +        ...
         return val

    -ALLOWED_CMDS = ["asciidoc", "asciidoctor"]
    +ALLOWED_CMDS = ["asciidoc", "asciidoctor", "asciidoctorj"]

     ENABLED = None != default()

    @@ -51,7 +52,7 @@ class AsciiDocReader(BaseReader):
             if cmd:
                 optlist = self.settings.get('ASCIIDOC_OPTIONS', []) + self.default_options
                 options = " ".join(optlist)
    -            content = call("%s %s -o - %s" % (cmd, options, source_path))
    +            #content = call("%s %s -o - %s" % (cmd, options, source_path))
                 # Beware! # Don't use tempfile.NamedTemporaryFile under Windows: https://bugs.python.org/issue14243
                 # Also, use mkstemp correctly (Linux and Windows): https://www.logilab.org/blogentry/17873
                 fd, temp_name = tempfile.mkstemp()
    @@ -74,7 +75,7 @@ class AsciiDocReader(BaseReader):
             """Parses the AsciiDoc file at the given `source_path` and returns found
             metadata."""
             metadata = {}
    -        with open(source_path) as fi:
    +        with open(source_path, encoding="utf-8") as fi:
                 prev = ""
                 for line in fi.readlines():
                     # Parse for doc title.
    @@ -88,7 +89,7 @@ class AsciiDocReader(BaseReader):
                             metadata['title'] = self.process_metadata('title', fix_unicode(title))

                     # Parse for other metadata.
    -                regexp = re.compile(r"^:[A-z]+:\s*[A-z0-9]")
    +                regexp = re.compile(r"^:[A-z]+:\s*\S")
                     if regexp.search(line):
                         toks = line.split(":", 2)
                         key = toks[1].strip().lower()
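Point 1 only touches `ALLOWED_CMDS`; the patch shows `ENABLED = None != default()` but not the body of `default()`. Below is a minimal sketch of a helper that picks the first allowed converter found on PATH, assuming that is roughly what `default()` does. `pick_default` and its `which` parameter are hypothetical names, not the plugin's API.

```python
import shutil
from typing import Callable, Optional, Sequence

# Same list as in the patch, with asciidoctorj appended.
ALLOWED_CMDS = ["asciidoc", "asciidoctor", "asciidoctorj"]


def pick_default(
    allowed: Sequence[str] = ALLOWED_CMDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first command in `allowed` found on PATH, or None.

    Hypothetical helper: the real default() in asciidoc_reader.py may differ.
    """
    for cmd in allowed:
        if which(cmd):
            return cmd
    return None


# Mirrors the plugin's check: the reader is enabled only if some tool exists.
ENABLED = pick_default() is not None
```

Under that assumption, a machine with only asciidoctorj installed would get `None` (reader disabled) before the change and `"asciidoctorj"` after it. Injecting `which` lets the lookup be tested without installing any tool.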
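On Python 3, the `fix_unicode()` line that the patch comments out assumes the text is UTF-8 that was mis-decoded as latin-1 (the comment's original example pairs the latin-1 byte F1 with the UTF-8 bytes C3 B1, both meaning ñ). Cyrillic characters have no latin-1 encoding, so `str.encode(val, "latin-1")` raises `UnicodeEncodeError` on Russian text. Instead of dropping the conversion entirely, one option is to attempt it and fall back to the original string. A minimal sketch (`fix_unicode_safe` is a hypothetical name):

```python
def fix_unicode_safe(val: str) -> str:
    """Repair UTF-8 text that was mis-decoded as latin-1; otherwise return val.

    Hypothetical Python 3 variant of fix_unicode() from asciidoc_reader.py.
    """
    try:
        # Succeeds only if every character is <= U+00FF and the resulting
        # bytes form valid UTF-8, which is the mojibake case.
        return val.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # UnicodeEncodeError: e.g. Cyrillic, which latin-1 cannot represent.
        # UnicodeDecodeError: already-correct text in the latin-1 range, e.g. "ñ".
        return val


print(fix_unicode_safe("Ã±"))      # ñ (mojibake repaired)
print(fix_unicode_safe("Привет"))  # Привет (left unchanged)
```

The heuristic can still misfire on genuine latin-1 text whose bytes happen to form valid UTF-8 (for example a literal "Ã±"), so it trades a rare false positive for not crashing on non-Latin scripts.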
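Point 2 also comments out the `call("%s %s -o - %s" ...)` line and keeps only the temporary-file path, whose comments warn against `tempfile.NamedTemporaryFile` on Windows and recommend using `mkstemp` correctly. Below is a minimal standard-library sketch of that flow. It assumes the converter accepts `-o <file>` (asciidoc and asciidoctor both do) and writes UTF-8. `convert_via_tempfile` is a hypothetical name, and the plugin's own `call()` helper is not reproduced here.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def convert_via_tempfile(cmd: Sequence[str], source_path: str) -> str:
    """Run an AsciiDoc converter that writes HTML to a temp file; return it as str.

    Hypothetical helper. Assumes the converter accepts `-o <file>` and emits UTF-8.
    """
    # mkstemp returns an open OS-level fd; close it right away so the external
    # process can write to the path (the plugin's comments warn against
    # NamedTemporaryFile on Windows).
    fd, temp_name = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        # List arguments avoid shell quoting problems with spaces or Cyrillic in paths.
        subprocess.run([*cmd, "-o", temp_name, source_path], check=True)
        # Decode explicitly instead of relying on the locale's default encoding
        # (e.g. cp1251 on a Russian Windows install).
        with open(temp_name, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(temp_name)
```

For example, `convert_via_tempfile(["asciidoctor"], "post.adoc")` would run `asciidoctor -o <temp file> post.adoc` and return the HTML as `str`; any `ASCIIDOC_OPTIONS` would go into the `cmd` list before the file arguments.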
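For point 3, the old pattern `^:[A-z]+:\s*[A-z0-9]` only matches when the attribute value starts with an ASCII letter or digit, so a line like `:author: Иван` is ignored; `\S` accepts any non-whitespace character. The sketch below compares the two patterns and reads the file with an explicit `encoding="utf-8"`, as the patch does. `parse_metadata_line` and `read_metadata` are hypothetical helpers, and taking `toks[2]` as the value is an assumption, since the patch does not show how the value is handled.

```python
import re
from typing import Dict, Optional, Tuple

# Both patterns are taken from the patch.
OLD_META = re.compile(r"^:[A-z]+:\s*[A-z0-9]")  # value must start with ASCII
NEW_META = re.compile(r"^:[A-z]+:\s*\S")        # any non-whitespace start
# [A-z] also matches [ \ ] ^ _ ` (the code points between "Z" and "a");
# [A-Za-z] is the stricter spelling of "ASCII letters".
STRICT_META = re.compile(r"^:[A-Za-z]+:\s*\S")


def parse_metadata_line(line: str, pattern: re.Pattern = STRICT_META) -> Optional[Tuple[str, str]]:
    """Return (key, value) for an AsciiDoc attribute line, or None."""
    if not pattern.search(line):
        return None
    toks = line.split(":", 2)
    return toks[1].strip().lower(), toks[2].strip()


def read_metadata(source_path: str) -> Dict[str, str]:
    metadata: Dict[str, str] = {}
    # Without encoding=..., open() uses the locale's preferred encoding
    # (e.g. cp1251 on a Russian Windows install), which garbles UTF-8 files.
    with open(source_path, encoding="utf-8") as fi:
        for line in fi:
            parsed = parse_metadata_line(line)
            if parsed is not None:
                key, value = parsed
                metadata[key] = value
    return metadata
```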

podsvirov commented 3 years ago

Hello @gmaFFFFF, it looks like your suggestions 2 and 3 were addressed in #1310. Please pull the latest master and check it. Any feedback is appreciated.