andre-simon / highlight

Source code to formatted text converter
GNU General Public License v3.0

Language Definition 'Environments' or 'States' #47

Closed ababaian closed 6 years ago

ababaian commented 6 years ago

Heya, man do I wish I had found this repo a few months ago, before I started the long battle of using source-highlight. I'm developing syntax highlighting for biological data, and one of the main features is highlighting data in a pager such as less.

I've written new language definition files, but I think I could port those over without too many problems. Highlight looks great. I just have a few questions about the Highlight language definition files that I couldn't find a clear answer to.

1) Is it possible to define something equivalent to source-highlight's environment or state in which new regex definitions can be defined within that state? And is it possible to nest these states?

I've essentially used such nesting to select columns of data by chaining on tabs. The GTF language file is probably the cleanest example of this.

2) Is there a less pipe script or something similar to alias less to highlight | less? Something like source-highlight's lesspipe.sh

3) For most High Performance Clusters, most users don't have sudo access. Is there a way to install highlight locally for users?

tajmone commented 6 years ago

Impressive work you've done!

  1. Is it possible to define something equivalent to source-highlight's environment or state in which new regex definitions can be defined within that state?

I don't know how source-highlight works, but Highlight syntax definitions are written in Lua, so you can add custom Lua code by hooking into OnStateChange().

There are also variables representing the internal highlighting states, such as HL_KEYWORD.

And is it possible to nest these states?

In the final output (e.g. HTML), tags can't be nested, if that is what you are referring to. As for internal parsing states, you can create custom variables to simulate nested states on which to base parser decisions.
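As a rough sketch of that idea (not taken from the Highlight sources): the hook below keeps a custom depth counter and only lets keyword states through while inside a `{ ... }` region. It assumes the hook signature `OnStateChange(oldState, newState, token, kwGroupID)` and the `HL_*` state constants described in Highlight's plugin documentation. The numeric fallback values, the `depth` variable and the brace delimiters are hypothetical, and are there only so the snippet also runs in a plain Lua interpreter.

```lua
-- Inside Highlight the HL_* constants are predefined; the fallback
-- values below are hypothetical stand-ins for running this standalone.
HL_STANDARD = HL_STANDARD or 0
HL_KEYWORD  = HL_KEYWORD  or 1
HL_OPERATOR = HL_OPERATOR or 2
HL_REJECT   = HL_REJECT   or 99

-- Custom variable that simulates a nested "environment": the number of
-- "{" tokens seen so far that have not yet been closed by "}".
local depth = 0

function OnStateChange(oldState, newState, token, kwGroupID)
  if newState == HL_OPERATOR then
    if token == "{" then
      depth = depth + 1
    elseif token == "}" and depth > 0 then
      depth = depth - 1
    end
    return newState
  end

  -- Outside of any "{ ... }" region, refuse to treat tokens as keywords.
  if newState == HL_KEYWORD and depth == 0 then
    return HL_REJECT
  end

  return newState
end
```

The same pattern extends to other delimiters (a tab counter reset at each line start, for instance), provided the delimiter actually triggers a state change that reaches the hook.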

You might also consider hooking into the Decorate() function to manipulate the tokens' output.

Furthermore, plugins can be used to handle more complex cases. What I love about Highlight is that it lets you use Lua in langDefs, themes and plugins, which opens up considerable potential.

andre-simon commented 6 years ago

Hi,

Is it possible to define something equivalent to source-highlight's environment or state in which new regex definitions can be defined within that state? And is it possible to nest these states?

It is possible to mimic environments using regular expressions and the hook functions mentioned above. But highlight has no notion of a position within a line of text unless that position can be determined with a pattern.
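One way a pattern can pin down a position is to anchor it at the start of the line and count tab separators up to the column of interest. The fragment below is only a sketch under assumptions: the description, column numbers and Ids are arbitrary, and it relies on the `Group` attribute of a keyword entry to return just the captured column rather than the whole match. Whether anchored prefixes like this behave well on every input would need testing against real files.

```lua
-- Hypothetical fragment of a language definition for tab-separated data.
Description = "Tab-separated columns (sketch)"

Keywords = {
  -- Column 1: everything from the line start up to the first tab.
  { Id = 1,
    Regex = [[^([^\t]+)]], Group = 1
  },
  -- Column 3: skip two tab-terminated fields, then capture the next one.
  { Id = 2,
    Regex = [[^(?:[^\t]*\t){2}([^\t]+)]], Group = 1
  },
}
```

Each further column would get its own entry with a different repeat count, which is roughly the chaining-on-tabs approach described in the question.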

Is there a less pipe script or something similar to alias less to highlight | less? Something like source-highlight's lesspipe.sh

See this gist: https://gist.github.com/textarcana/4611277#gistcomment-1701305 I should finally add this to the docs ;)

For most High Performance Clusters, most users don't have sudo access. Is there a way to install highlight locally for users?

See https://www.saicharan.in/blog/2011/08/14/gitweb-setup-without-root-access/; you can set the destination directory in the makefile.

I put together a small demo for SAM files (without any knowledge about the correct grammar, just extracting visual patterns):

The syntax file:

Description="SAM"

Keywords={

  { Id=1,
    Regex=[[(SRR|SNES)\d+\.\d+]], Group=0
  },
  { Id=2,
    Regex=[[chr\d+]]
  },
  { Id=3,
    Regex=[[\d+[DMS][DM\d]*]]
  },

  { Id=2,
    Regex=[[ [ASXNMDOGYTR]{2}\: ]]
  },

  { Id=5,
    Regex=[[ [ATCG]{64,} ]]
  },

  { Id=4,
    Regex=[[ [\S]{100,} ]]
  }
}

Comments = {
   {
     Block=false,
     Delimiter = { [[^@]] }
   }
}

IgnoreCase=false

Operators=[[\(|\)|\[|\]|\{|\}|\,|\;|\.|\:|\&|<|>|\!|=|\/|\*|\%|\+|\-|\~|\||\^|\?]]

And a plugin to color the "DNA block":


Description="Generate SAM colored DNA? block"

-- optional parameter: syntax description
function syntaxUpdate(desc)

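  function Decorate(token, state)
    -- Only rewrite HTML/XHTML output, and only keyword tokens that are
    -- at least 64 characters long and consist solely of A, T, C and G.
    if ( (HL_OUTPUT == HL_FORMAT_HTML or HL_OUTPUT == HL_FORMAT_XHTML)
        and #token > 63 and state == HL_KEYWORD and not string.match(token,"[^ATCG]") ) then

      -- Wrap every base in its own span so the injected CSS can color it.
      local retVal = ""
      for c in token:gmatch(".") do
        retVal = retVal .. "<span class='elem_".. c .. "'>".. c .. "</span>"
      end
      return retVal
    end
  end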
end

function themeUpdate(desc)
  if (HL_OUTPUT == HL_FORMAT_HTML or HL_OUTPUT == HL_FORMAT_XHTML) then

    Injections[#Injections+1]=[[
span.elem_A {
  background-color: lightblue;
}
span.elem_T {
  background-color: lightyellow;
}
span.elem_C {
  background-color: lightgreen;
}
span.elem_G {
  background-color: lightpink;
}
 ]]
  end
end

Plugins={

  { Type="lang", Chunk=syntaxUpdate },
  { Type="theme", Chunk=themeUpdate }
}

Running

highlight --config-file sam.lang mario.sam -I --plug-in sam_colseq.lua > mario.sam.html

produces mario.sam.html (see mario.sam.html.gz).
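If the per-base loop in the plugin turns out to be slow on long reads, one common Lua idiom is to collect the pieces in a table and join them once with `table.concat`, rather than growing a string with `..` inside the loop. A minimal sketch of that variant follows; `colorize` is a hypothetical helper name, not part of Highlight, and no speedup has been measured here.

```lua
-- Hypothetical helper: wrap each character of seq in a span whose class
-- name encodes the base, e.g. <span class='elem_A'>A</span>.
function colorize(seq)
  local parts = {}
  for c in seq:gmatch(".") do
    parts[#parts + 1] = "<span class='elem_" .. c .. "'>" .. c .. "</span>"
  end
  -- Join once at the end instead of concatenating inside the loop.
  return table.concat(parts)
end

print(colorize("AT"))
-- prints <span class='elem_A'>A</span><span class='elem_T'>T</span>
```

Inside the plugin, Decorate could then return colorize(token) in place of its own loop.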

This might help you to decide if highlight suits your needs.

ababaian commented 6 years ago

Thanks a lot! I think the logistical set-up of highlight is much better. I'll start playing with it to see if I can get this working and if it's implemented fast enough for big-data files.

tajmone commented 6 years ago

@ababaian : I've just added to my Highlight Wiki a page dedicated to creating LangDefs:

https://github.com/tajmone/highlight/wiki/LanDefs

It might come in handy if you're working on new langDefs. I'll be updating the page (and Wiki) frequently in the coming few days.