Closed ababaian closed 6 years ago
This is really cool.
I think it looks pretty robust as it is. Would it work for files with >5 columns, though? We might need to figure out how to encode the 5th column when there is a 6th column. Maybe something along the lines of '(?<=\t[\S]\t)[\S]\t'?
An even simpler version, with an open-ended scope for all columns greater than 5:
%YAML 1.2
---
name: faidx
file_extensions: [fa.fai,fasta.fai]
scope: source.faidx
# Fasta Index Filetype Description
# NAME       Name of this reference sequence
# LENGTH     Total length of this reference sequence, in bases
# OFFSET     Offset within the FASTA file of this sequence's first base
# LINEBASES  The number of bases on each line
# LINEWIDTH  The number of bytes in each line, including the newline
contexts:
  main:
    # COLUMN 1
    - match: '^[\S]*\t'
      scope: coord.Chr.faidx
      push: col2
  col2:
    # COLUMN 2
    - match: '[\S]*'
      scope: coord.Start.faidx
    - match: \t
      push: col3
    - match: $
      pop: true
  col3:
    # COLUMN 3
    - match: '[\S]*'
      scope: constant.numeric.faidx
    - match: \t
      push: col4
    - match: $
      pop: true
  col4:
    # COLUMN 4
    - match: '[\S]*'
      scope: comment.line.faidx
    - match: \t
      push: col5
    - match: $
      pop: true
  col5:
    # COLUMN 5
    - match: '[\S]*'
      scope: comment.line.faidx
    - match: \t
      push: colast
    - match: $
      pop: true
  colast:
    # Any COLUMN >5
    - match: .*
      scope: comment.line.faidx
      pop: true
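One possible variation, offered only as an untested sketch and not something proposed in this thread: each column context could hand over to the next with `set` instead of `push`. In sublime-syntax, `set` pops the current context before pushing the new one, so the stack never grows deeper than one column and a single pop at end of line returns to main. The syntax name `faidx-set-sketch` is hypothetical, and `\S+` is used in place of `[\S]*` so that the column patterns never match an empty string.

```yaml
%YAML 1.2
---
# Hypothetical variation of the file above: `set` instead of nested `push`.
name: faidx-set-sketch
file_extensions: [fa.fai, fasta.fai]
scope: source.faidx
contexts:
  main:
    # COLUMN 1: everything up to and including the first tab
    - match: '^\S*\t'
      scope: coord.Chr.faidx
      push: col2
  col2:
    - match: '\S+'
      scope: coord.Start.faidx
    - match: \t
      set: col3          # replace col2 with col3 on the stack
    - match: $
      pop: true          # short line: return straight to main
  col3:
    - match: '\S+'
      scope: constant.numeric.faidx
    - match: \t
      set: col4
    - match: $
      pop: true
  col4:
    - match: '\S+'
      scope: comment.line.faidx
    - match: \t
      set: col5
    - match: $
      pop: true
  col5:
    - match: '\S+'
      scope: comment.line.faidx
    - match: \t
      set: colast
    - match: $
      pop: true
  colast:
    # Any column beyond the fifth
    - match: .*
      scope: comment.line.faidx
      pop: true
```

Whether this behaves identically to the nested-push version in every editor build would need to be checked against real .fai files.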
brilliant!
I think this same logic could be applied for gedit and Vim syntax as well. There is a Match Start // Match End logic which can be extended in this way. I would say if we figure this out soon we'll simplify our lives greatly.
Maybe read some syntax highlighting files for other complex languages (C, XML, etc.) to learn how other people solved similar problems.
Can we get a screenshot of what it looks like, @ababaian?
Could you please share the colors you used for this color scheme?
I'd say let's not worry 100% about all the color schemes just yet. This was based off of bioMonokai for Sublime, which has a dark background. Gedit is based off of Kate and has a light background, so it might not work. The third column is simply the default 'numeric' color; the fourth and fifth are comment colored.
We're going to have to formalize all the colors and/or settle on one dark and one light theme that is the same across all the different programs. We can worry about this last; for now, the highest priority is getting the syntax files to work reliably in all the different software.
Also faidx-gedit syntax
Check out Fasta Index Language File for an example of the logic. It's the same thing as in sublime / less where nested contexts can be used to select by column. This should make SAM/VCF/BED/GTF files much much easier to deal with.
I was working on the mostly trivial case of fasta-index format (faidx) and I think because it was so simple I found a very nice way to select columns by the order in which they appear. The only requirement right now is that it is in a tab-delimited file.
What it does is match the first column until the first tab, scope it, then push to 'contig.length'. In 'contig.length', every non-whitespace character is selected and scoped. Then when it hits the next tab it pops out. The third column is then selected, scoped and pushed to 'genomic.offset'. The fourth column is selected and then popped at the tab, etc. This push-pop back and forth with tabs can be repeated for N columns, which means that .bed, .bedpe, .gtf, .sam, and possibly some of .vcf can now be 'solved', since we know what type of data is supposed to be in the Nth column.
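As a rough illustration of how the same column-by-column idea might carry over to another format, here is a minimal sketch for the three required BED columns (chrom, chromStart, chromEnd). It is an assumption-laden example rather than a tested syntax: the syntax name `bed3-sketch`, the context names and every scope name are made up for illustration, and it uses the push-on-tab, pop-at-end-of-line arrangement.

```yaml
%YAML 1.2
---
# Hypothetical sketch: names and scopes below are illustrative only.
name: bed3-sketch
file_extensions: [bed]
scope: source.bed
contexts:
  main:
    # COLUMN 1 (chrom): everything up to and including the first tab
    - match: '^\S+\t'
      scope: entity.name.chrom.bed
      push: chromStart
  chromStart:
    # COLUMN 2 (chromStart)
    - match: '\S+'
      scope: constant.numeric.start.bed
    - match: \t
      push: chromEnd
    - match: $
      pop: true
  chromEnd:
    # COLUMN 3 (chromEnd)
    - match: '\S+'
      scope: constant.numeric.end.bed
    - match: \t
      push: rest
    - match: $
      pop: true
  rest:
    # Any optional columns after the third
    - match: .*
      scope: comment.line.bed
      pop: true
```

Because each context corresponds to a column position, the scope assigned depends only on where a field sits in the line, not on what its contents look like.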
Can anyone think of a reason that this won't work or will break at some edge-case?
If not, we'll need to rework those syntaxes, as I think this is a more robust approach than trying to select each column by the range of data that could be there.
faidx.sublime-syntax