chrivers / transwarp

Transwarp compiler - a python3 implementation of a Simple Type Format parser and renderer
GNU General Public License v3.0

Syntax variation "alpha" #2

Open chrivers opened 8 years ago

chrivers commented 8 years ago

(This is an attempt at finalizing the syntax. Going to pull this out of issue #1, so it's easier to follow)

I've made an attempt at cleaning up the syntax for the STF files. Take a look here:

https://github.com/chrivers/isolinear-chips/tree/syntax-varation-alpha

(the compiler doesn't handle it yet.. I don't want to spend time updating it, until we agree somewhat on the syntax)

Here are the changes:

  1. Namespaces are a thing now! Every field in an STF project can now be referenced with a unique name.
  2. File names are used now! Everything from "enums.stf" will be in the "enums::" namespace.
  3. Equals -> Colon. There's only colon now
  4. Any node in the tree can be referenced. To make the parser simpler, only absolute references are allowed for now.
  5. The reference rules are simple: Any identifier that contains "::" is an absolute reference, and will be validated at compile-time. Everything else is a primitive type name.
  6. Types are also fairly simple. A type (the right-hand side of any section), can be either A) a literal value (ints, typically), B) an identifier (reference or not), or C) type with parameter (sizedarray<foo, 8>)
  7. Sections should be able to have named values associated with them. Here's a couple of ideas for syntax:
A:
object Creature(bitmask=3)
    ....
B:
!bitmask=3
object Creature
    ....
C:
object Creature
    !bitmask = 3

What do you think? This, or something else?
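
Rules 5 and 6 above could be sketched roughly like this in Python. The function name classify_token and the three labels are made up for illustration; none of this is in the compiler yet. It only looks at bare tokens, so a parameterised type such as sizedarray<foo, 8> would need to be split into its parts first.

```python
import re

# Decimal or hexadecimal integer literals, e.g. 8 or 0x0351a5ac.
INTEGER = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")


def classify_token(token: str) -> str:
    """Classify a bare token as 'literal', 'reference' or 'primitive'.

    Hypothetical helper: the labels are illustrative only.
    """
    token = token.strip()
    if INTEGER.fullmatch(token):
        return "literal"
    # Rule 5: any identifier containing "::" is an absolute reference.
    if "::" in token:
        return "reference"
    # Rule 5: everything else is a primitive type name.
    return "primitive"
```

With this, "enums::MainScreenView" is classified as a reference that the compiler would have to validate, while "u32" is accepted as a primitive without any lookup.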

Here are some concrete changes for isolinear chips:

  1. "ClientPacket" and "ServerPacket" no longer exist - every packet type is now a struct.
  2. The enum<MainScreenView, u32> syntax is gone! Because we can make references, we now have the simpler form: enums::MainScreenView<u32>. "enums" in this context means the enums file. This can be checked for validity by the compiler.
  3. This change also helps the other types: ships: sizedarray<structs::Ship, 8>. This, of course, refers to the "structs" file.
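
To illustrate how uniform the new type forms are, here is a minimal Python sketch that splits a type like enums::MainScreenView<u32> or sizedarray<structs::Ship, 8> into a name and a list of parameters. The function parse_type and the (name, args) tuple shape are assumptions for this example, not what transwarp actually does.

```python
def parse_type(text: str):
    """Split 'name<arg, arg>' into (name, [args]).

    Hypothetical helper: each argument is parsed recursively, so nested
    parameters come back as nested (name, args) tuples.
    """
    text = text.strip()
    if not text.endswith(">"):
        # No parameter list: a literal, identifier or reference.
        return (text, [])
    name, _, rest = text.partition("<")
    inner = rest[:-1]  # drop the closing ">"
    args, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            # Only split on commas that are not inside a nested <...>.
            args.append(parse_type(inner[start:i]))
            start = i + 1
    args.append(parse_type(inner[start:]))
    return (name.strip(), args)
```

The same code path handles both examples above, which is the point: there is no special case left for enum.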

In general, I think the code is quite a lot easier to read now. It should also be significantly easier to parse and run templates on, since there are many, many fewer special cases. Death to non-generic sections!

ping @NoseyNick @mrfishie - thoughts? ;-)

NoseyNick commented 8 years ago

Sorry, been crazy-busy since about the middle of last week, lucky I had time for a few hours of PLAYING Artemis on the weekend, never mind any protocol-nerdery :-) ... and going to continue to be quite busy until later this week too :-(

Overall, I like it, I feel like it's coming together, turning into something quite neat, and YES it is getting more readable. It occurs to me though:

"enum FrameType", this is "type, space, name". " front_shields: f32", this is "(whitespace*), name, colon, type". Opposite way 'round to the above? If this was a C-like language, it would be "float front_shields" I think?

"object Anomaly(1)", this is "type, space, name, bracket, arg-thing, close-bracket".

"parser SimpleEvent(read=u32)", this is "type, space, name, bracket, name EQUALS value".

Is the whitespace syntactic, BTW, like python, or just for readability, like most other languages?

" valueFloat: 0x0351a5ac" is "type, colon, VALUE"

"CommsOutgoing", this is... just... "name"? It's a struct, right?

"struct AllShipSettings" wasn't just "AllShipSettings" though?

"Everything from "enums.stf" will be in the "enums::" namespace" ... but "FrameType::shipSystemSync" isn't in FrameType.stf. It's from "enum FrameType". Which wasn't in enums.stf. Mmmmm?

"There's only colon now", well, and double-colons :-D

"only absolute references are allow for now", Did that include "FrameType::shipSystemSync" then? Or should it be "parser::FrameType::shipSystemSync"?

7... C? I think? I could probably be talked into any of them. Oh but hang on, you said "Equals -> Colon" :-p

1) I like that ClientPacket and ServerPacket are just structs. I'm still FAIRLY convinced that Objects and Parsers are just a special case of structs with (different rules for) optional bits, but also willing to keep them different if only 'cos we want them in different places in the html doc (?)

Come to think of it, I quite like the difference between your "enum" and "flags", and I'd argue an object is a struct where the bitfield attribute is being used as "flags" to pick 0-n optional attributes that follow, and a parser is a struct where the packet-type attribute is being used as an "enum" to pick 1 optional attribute (or SET of attributes, or sub-struct I guess, or something) that follows.

2) Still trying to get my head around the references syntax, sorry, late, brain fuzzy, hope to get back to you when I'm more awake. I wonder if / why there's still a mix of : vs =, not necessarily being consistent, but also () vs <> also not necessarily being consistent. I'm not saying they even NEED to be consistent, just I'd like to wrap my head around WHY they should OR shouldn't be.

I DO feel like the entire thing is falling FAIRLY neatly into one big compilable parsing-tree though. Definitely heading in a good direction.

G'night NN

chrivers commented 8 years ago

Just a quick note - thanks for the feedback! You're right there's a few things that aren't consistent.. I've been a bit busy too. Keep a look out for the updated version :)

chrivers commented 8 years ago

@NoseyNick

Thanks again for the feedback. Going over it, I certainly didn't make it easy for anyone to follow what I was saying :D

Hope you had time to play Artemis - let's hack some more protocol, when you have time :)

I'll try to address your points here:

"enum FrameType", this is "type, space, name". " front_shields: f32", this is "(whitespace*), name, colon, type". Opposite way 'round to the above? If this was a C-like language, it would be "float front_shields" I think?

True, but C-like type definitions are terrible, like kale worship and null pointers ;-)

"object Anomaly(1)", this is "type, space, name, bracket, arg-thing, close-bracket".

Yeah, that was a mistake. It uses the key=value syntax now:

object Anomaly(maskbytes=1)

I hope this makes sense?

btw, enum FrameType is exactly the same as enum FrameType(). That is, every section without key=value pairs just has the empty list as type arguments.

"parser SimpleEvent(read=u32)", this is "type, space, name, bracket, name EQUALS value".

Think of them as keyword arguments. Like in Python, Ruby, and I think a few other languages?

It will be parsed to approximately this:

("parser", "SimpleEvent", {"read": "u32"})
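
A rough Python sketch of that header parsing, assuming the key=value form described above. The name parse_header and the regular expression are illustrative guesses rather than the compiler's real code, and the naive comma split would break on values that themselves contain commas (sizedarray<foo, 8>).

```python
import re

# typename, whitespace, name, then an optional parenthesised argument list.
HEADER = re.compile(r"^(\w+)\s+(\w+)(?:\((.*)\))?$")


def parse_header(line: str):
    """Parse 'parser SimpleEvent(read=u32)' into (typename, name, args)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a block header: {line!r}")
    typename, name, raw_args = match.groups()
    args = {}
    if raw_args:
        for pair in raw_args.split(","):
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {pair!r}")
            args[key.strip()] = value.strip()
    return (typename, name, args)
```

Note that "enum FrameType" and "enum FrameType()" come out identical here, both with an empty args dict, which matches the rule that a header without key=value pairs just has the empty list as arguments.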

Is the whitespace syntactic, BTW, like python, or just for readability, like most other languages?

Syntactic! Always 4 spaces. Yes, I'll write a proper grammar once we agree on it :)

If we want optional whitespace, we need some more syntactic flour. I'm not a fan of braces, but that would be a very traditional way of delimiting sections.

" valueFloat: 0x0351a5ac" is "type, colon, VALUE"

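Everything is a value of some sort or another. Right now, there are 3 possible value types:

  • identifier ("foo")
  • integer (42 or 0x2A)
  • reference (module.value, foo::bar)
  • blocks (struct Foo, enum Bar, etc)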

That's it. This has been enough for everything so far. There are some obvious potential additions, like strings.

"CommsOutgoing", this is... just... "name"? It's a struct, right?

Oops, yeah. A lot of "struct"s missing. Fixed :)

"struct AllShipSettings" wasn't just "AllShipSettings" though?

Yeah, sorry for the confusion. Should be more consistent now :)

So here's the newest syntax and rules:

  1. A project is a collection of source files, each in the same namespace as the file name (minus .stf)

blocks:

  1. A source file is a list of blocks
  2. A block is a header, plus a list of fields
  3. A header consists of (typename, name, args). Args may be empty. typename and name are required.
  4. No special meaning is attached to any typename. It is up to the project to determine a policy for what blocks mean

identifiers:

  1. /[A-Za-z][A-Za-z0-9-]*/
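
For reference, the identifier rule as written translates directly to Python's re module. The helper name is_identifier is made up for this sketch. One thing that falls out of testing it: the pattern as written accepts hyphens but not underscores, so a field name like front_shields from earlier in this thread would not count as an identifier.

```python
import re

# The identifier rule exactly as written above: a letter, then letters,
# digits or hyphens.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None
```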

block args:

  1. block args are a list of key=value pairs.
  2. the list can be empty
  3. if not empty, parentheses are mandatory (struct Foo(foo=bar))
  4. "name"s are identifiers
  5. "value"s follow standard value rules

fields:

  1. Each field must be indented 4 spaces from the block header
  2. Each field consists of either A) "name", ":", "value", or B) a new section start
  3. The "name" is either an identifier, or a reference

values:

  1. A "value" is either a constant, an identifier or a reference

reference:

  1. A reference is a named pointer to another piece of the code
  2. A reference can either refer to a field in another block, a block in another file, or a field in a block in another file
  3. File references use "." (dot). So in enums.stf, "structs.Details" would refer to "Details" in structs.stf
  4. Block references use "::" (double colon). So FrameType::simpleEvent refers to the simpleEvent field of the FrameType block. (maybe we should accept the potential name-clash ambiguity, and simplify "::" and "." into one of them?)

Okay, that's not quite a formal spec, but it should give an idea :)
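
As a sanity check on the reference rules, here is a minimal Python sketch that splits a reference into its file, block and field parts. The function split_reference and its current_file parameter are assumptions for illustration; a real compiler would also look the result up and fail on dangling references.

```python
def split_reference(ref: str, current_file: str):
    """Return (file, block, field) for a reference string.

    Hypothetical helper: '.' separates file from block, '::' separates
    block from field. A missing file part means the current file, and
    field is None when the reference points at a whole block.
    """
    path, sep, field = ref.partition("::")
    file, dot, block = path.rpartition(".")
    if not dot:
        # No file prefix: the block lives in the file we are parsing.
        file, block = current_file, path
    return (file, block, field if sep else None)
```

So inside enums.stf, "structs.Details" comes out as ("structs", "Details", None), while "FrameType::simpleEvent" stays within the current file.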

I hope it's easier to follow now? Do you think it's more consistent?

mrfishie commented 8 years ago

Here's a few small points - I've been pretty busy but I'll try to do a proper look-through of everything that's changed at some point soon.

Right now, there are 3 possible value types:

  • identifier ("foo")
  • integer (42 or 0x2A)
  • reference (module.value, foo::bar)
  • blocks (struct Foo, enum Bar, etc)

Four value types? ;)

  1. File references use "." (dot). So in enums.stf, "structs.Details" would refer to "Details" in structs.stf
  2. Block references use "::" (double colon). So FrameType::simpleEvent refers to the simpleEvent field of the FrameType block. (maybe we should accept the potential name-clash ambiguity, and simplify "::" and "." into one of them?)

Perhaps we could allow files to be 'imported' into blocks, like this?

import structs as Structures   # or import "structs.stf" as Structures
# now we can do
Structures::Details

# or just
import structs
structs::Details

This would prevent potential name-clashes, however I don't really think that is a particularly big problem.
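
A rough Python sketch of how such imports might be resolved, purely to show the idea. The helpers parse_import and resolve are invented for this example, and the import syntax itself is only the proposal above, not anything the compiler supports.

```python
def parse_import(line: str):
    """Parse 'import structs [as Alias]' into (alias, module)."""
    tokens = line.split("#", 1)[0].split()  # drop trailing comments
    if len(tokens) not in (2, 4) or tokens[0] != "import":
        raise ValueError(f"not an import line: {line!r}")
    if len(tokens) == 4 and tokens[2] != "as":
        raise ValueError(f"expected 'as' in: {line!r}")
    module = tokens[1].strip('"')
    if module.endswith(".stf"):
        module = module[: -len(".stf")]
    # Without 'as', the module is visible under its own name.
    alias = tokens[3] if len(tokens) == 4 else module
    return (alias, module)


def resolve(ref: str, aliases: dict):
    """Turn 'Alias::Name' into (module, name) using the alias table."""
    alias, sep, name = ref.partition("::")
    if not sep or alias not in aliases:
        raise KeyError(f"unknown reference: {ref!r}")
    return (aliases[alias], name)
```

For example, after aliases = dict([parse_import("import structs as Structures")]), resolve("Structures::Details", aliases) points at Details in structs.stf.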

chrivers commented 8 years ago

Here's a few small points - I've been pretty busy but I'll try to do a proper look-through of everything that's changed at some point soon.

Right now, there are 3 possible value types:

  • identifier ("foo")
  • integer (42 or 0x2A)
  • reference (module.value, foo::bar)
  • blocks (struct Foo, enum Bar, etc)

Four value types? ;)

FOUR.. our FOUR chief ... I'll come back

http://cdn.hitfix.com/photos/6133668/monty-python-spanish-inquisition.jpg

:D

  1. File references use "." (dot). So in enums.stf, "structs.Details" would refer to "Details" in structs.stf
  2. Block references use "::" (double colon). So FrameType::simpleEvent refers to the simpleEvent field of the FrameType block. (maybe we should accept the potential name-clash ambiguity, and simplify "::" and "." into one of them?)

Perhaps we could allow files to be 'imported' into blocks, like this?

import structs as Structures   # or import "structs.stf" as Structures
# now we can do
Structures::Details

# or just
import structs
structs::Details

This would prevent potential name-clashes, however I don't really think that is a particularly big problem.

That's certainly a tried-and-true way from Python, so I'm not totally against that. I'm not sure the added complexity is worth it for version 1, however.

chrivers commented 8 years ago

This is embarrassing, but there was a typo in the branch name :P

This is the new link: https://github.com/chrivers/isolinear-chips/tree/syntax-variation-alpha

chrivers commented 8 years ago

As a proof of concept, I implemented an XML parser in version 0.9.4 (just uploaded).

It's a bit rough around the edges though, as none of the syntax improvements we are discussing here are implemented, but it should make it easier to see the internal data structure that the templates use :)

The syntax is "transwarp -D isolinear-chips/protocol -x xml", if anyone is interested.