amazon-ion / ion-schema

The Ion Schema Specification. This specification is licensed under the Apache 2.0 License.
https://amazon-ion.github.io/ion-schema/
Apache License 2.0

Provide a complete EBNF or ANTLR-like grammar #127

Open popematt opened 1 year ago

popematt commented 1 year ago

Hey! Esteemed ion-schema contributors, and especially @popematt, I have just started working on a new project that will use Ion for encoding/decoding, and I need to write codegen for different languages based on .isl files. Would you mind sharing an Ion Schema EBNF or something like a full ANTLR4 grammar? I saw https://amazon-ion.github.io/ion-schema/docs/isl-2-0/bnf-grammar, but it cannot be used for obvious reasons. Basically, I first need to write some kind of parser for schema-language files.

Originally posted by @net-yehor-tretiakov in https://github.com/amazon-ion/ion-schema/issues/82#issuecomment-1695438357

popematt commented 1 year ago

Hi @net-yehor-tretiakov, can you tell us a little more about your use case? What languages are you targeting for code generation? What features are you hoping to have in the generated code?

Is there something that you think is lacking from the existing parser/reader logic and data models in either ion-schema-kotlin or ion-schema-rust that is making you want a full grammar instead?

Also, as a FYI, we are working on ISL-based codegen ourselves, and if you're interested, it would be great if we could combine our efforts or at least learn from what each other is doing.

net-yehor-tretiakov commented 1 year ago

Sure! Ty for your reaction!

I'll try to describe my problem:

Suppose that, for some purpose, we need to write encode and decode code for a bunch of similar structures using Ion. To start with, in Rust and JS (by the way, I have run into some incompatibilities and will create an issue later on). Basically, all the code looks like this:

The code here is not an actual part of my project; it is here just as an example.

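// Traits that provide the reader/writer methods used below
// (assumption: the ion-rs 0.18-era API that this snippet appears to target)
use ion_rs::{IonReader, IonWriter};
// Needed for the `.deref()` call in `encode`
use std::ops::Deref;

// Some ecosystem stuff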
pub trait Encoder {
    fn encode(&self) -> Vec<u8>;
}
pub trait Decoder {
    fn decode(data: &[u8]) -> Self;
}

// Structure to encode/decode
#[derive(Debug, PartialEq, Eq)]
pub struct SomeStruct {
    some_data: Vec<u8>
}

/* Some structure impl code */

// Structure encode/decode
impl Encoder for SomeStruct {
    fn encode(&self) -> Vec<u8> {
        let buffer: Vec<u8> = Vec::new();

        let binary_writer_builder = ion_rs::BinaryWriterBuilder::new();
        let mut writer = binary_writer_builder.build(buffer).unwrap();

        writer.step_in(ion_rs::IonType::Struct).expect("Error while creating an ion struct");

        writer.set_field_name("some_data");
        writer.write_blob(&self.some_data).unwrap();

        writer.step_out().unwrap();
        writer.flush().unwrap();

        writer.output().deref().into()
    }
}
impl Decoder for SomeStruct {
    fn decode(data: &[u8]) -> Self {
        let mut binary_user_reader = ion_rs::ReaderBuilder::new().build(data).unwrap();
        binary_user_reader.next().unwrap();
        binary_user_reader.step_in().unwrap();

        binary_user_reader.next().unwrap();
        let binding = binary_user_reader.read_blob().unwrap();
        // Copy the bytes so the struct owns its data (the field is a Vec<u8>)
        let some_data = binding.as_slice().to_vec();

        SomeStruct {
            some_data
        }
    }
}

Previously I used Cap'n Proto in Rust with the codegen it provides, but I decided to use Ion instead. Now I face a problem: I have no easy way to avoid copy-pasting code, as the encode/decode code is almost the same everywhere. It's the same with JS, just with slightly different syntax. So I want to write a code generator based on the official .isl format! For that, I need a way to parse the .isl files so I can get all the fields for structs / Ion types to fill in some templates later on.

Maybe I am a bit blind, but I cannot find anything in the public API of ion-schema-rust that can help me... I can get type_refs, as ion_schema::types::TypeDefinition, but nothing more, or am I missing something? I have not searched the ion-schema-kotlin repo, though...

net-yehor-tretiakov commented 1 year ago

Also, @popematt, it would be interesting for me to hear about your own codegen!

Would you mind sharing: What technologies are you using? What concepts will you follow (what abstraction level, how you will parse files, the workflow in general)? What ideas went into this? Can you provide a template of what the result will be?

I am only familiar with code generation using Go and Java, though. It would be a cool experience to learn new ways of doing it!

popematt commented 1 year ago

I was messing around and created a very rough proof of concept a few months ago using Ion Schema Kotlin (see here and here).

We're still doing research for this, so don't take any of this as a guarantee. I can make no promises of when—or even if—this will come to fruition. Since we already have libraries that can read ISL, we would use those. Some transformation would be required to interpret the ISL into a more "conventional" (e.g. Java-like) representation of a type model, and then it would be handed off to a template engine or maybe a handwritten generator to produce the actual code. Ideally, the generated code would come with ready-to-use APIs for serialization and deserialization between Ion and the generated data model.
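
As a rough illustration of the last two steps described above (a conventional type model handed to a handwritten generator), here is a minimal sketch in Rust. Everything in it is hypothetical: `IonKind`, `FieldModel`, `StructModel`, and `generate_struct` are names invented for this example and are not part of ion-schema-rust, ion-schema-kotlin, or the ion CLI. It also skips the ISL-reading step entirely and assumes the model has already been built from a definition like the one shown in the leading comment.

```rust
// Hypothetical, simplified type model for illustration only.
// It is meant to mirror an ISL definition such as:
//   type::{ name: SomeStruct, type: struct, fields: { some_data: blob } }

/// The small subset of Ion types this sketch knows how to map to Rust.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IonKind {
    Blob,
    String,
    Int,
    Bool,
}

impl IonKind {
    /// Rust type chosen for each Ion type. The mapping is an arbitrary choice
    /// for this sketch; e.g. Ion ints are arbitrary precision, so `i64` is lossy.
    fn rust_type(self) -> &'static str {
        match self {
            IonKind::Blob => "Vec<u8>",
            IonKind::String => "String",
            IonKind::Int => "i64",
            IonKind::Bool => "bool",
        }
    }
}

/// One field of a struct type: its name and the Ion type it must hold.
pub struct FieldModel {
    pub name: String,
    pub kind: IonKind,
}

/// A "conventional" view of an ISL struct type: a name plus a list of fields.
pub struct StructModel {
    pub name: String,
    pub fields: Vec<FieldModel>,
}

/// Handwritten generator: renders a struct model as Rust source text.
pub fn generate_struct(model: &StructModel) -> String {
    let mut out = String::new();
    out.push_str("#[derive(Debug, PartialEq, Eq)]\n");
    out.push_str(&format!("pub struct {} {{\n", model.name));
    for field in &model.fields {
        out.push_str(&format!(
            "    pub {}: {},\n",
            field.name,
            field.kind.rust_type()
        ));
    }
    out.push_str("}\n");
    out
}
```

Calling `generate_struct` on a model named `SomeStruct` with a single `some_data` field of kind `IonKind::Blob` yields the source of a struct with one `Vec<u8>` field. A real generator would also need to emit the serialization and deserialization code, and a template engine could replace the string building here.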

One thing is, I think, pretty safe to say—we have put a lot of work into some of our core libraries (like ion-java, ion-rust, etc.) and so anything we do for parsing ISL will almost certainly start with those libraries rather than starting from e.g. an ANTLR grammar.

popematt commented 4 months ago

FYI @net-yehor-tretiakov, we have been making progress towards adding code generation to the ion CLI. It is still early in the process, but here is a recently merged PR that includes some test cases and examples of the generated code. You are welcome to take a look. If you have any suggestions or you would like to contribute, please feel free to open an issue in amazon-ion/ion-cli.