cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
24 stars 1 forks source link

Parser #38

Closed susan-garry closed 1 year ago

susan-garry commented 1 year ago

The beginning of a bona-fide parser for pollen! This lays out some basic infrastructure and implements parsing for a few basic features.

What it can do:

What it can't do/could do better:

sampsyo commented 1 year ago

Wahoo!!!! Super nice work getting the parser going, @susan-garry! And figuring out the trickiness with Pest along the way!

There is one small clerical set of things we'll want to clean up before merging: some build targets got added to git here. Let's remove these things and add some of them to a .gitignore somewhere:

And here are some thoughts about strategy for the future, all of which deserve to happen in subsequent PRs instead of this one:

susan-garry commented 1 year ago

Ah, whoops! I have updates the .gitignore file and removed the extraneous files - the 1400 modified files probably should have tipped me off that something was amiss.

About our strategy for the future:

sampsyo commented 1 year ago

python uses type hints in function definitions but not for variable declarations?

It also has type hints on variable declarations! So this is in fact valid Python:

i: int = 1

My general take on this is: pick any keyword you want (let, var, decl, def, const, local, nothing), but the type-after-the-identifier style is the way of the future. C & Java are old and do it the old way; Rust, Go, Swift, TypeScript, typed Python, Scala, and Kotlin are all new and learned from the mistakes of the past and do it the new way. A big advantage is that it lends itself well to adding type inference in the future, i.e., eventually supporting let x = 5 instead of let x: int = 5 if the compiler can deduce that for you without either a disruptive change to the syntax or an annoying non-type type name like C++ auto.

I think function calls, record declarations, record field access, and emit statements

Certainly emit statements! That's important for producing any output. Makes sense.

Not sure about the others though: when you get a chance, maybe you could elaborate somewhere on how function and records arise? I can imagine functions not mattering much for depth, for instance.

susan-garry commented 1 year ago

That makes a lot of sense to me - I will go with i: int = 1; unless someone has strong feelings about using a keyword like let.

Record access definitely needs to be supported because we are representing pangenomic graphs essentially as a bunch of records - if we want to compute node depth, we need to call node.steps. I cannot, off the top of my head, think of a meaningful graph query that doesn't involve this.

Aside from this, we have two basic types of output - mappings of nodes to data, and modified graphs. We will probably wants to support at least one of these in the basic iteration of pollen. To support mappings of nodes to data, we probably need to support either arrays, tuples, or record definition and initialization (though not all three). To support the outputting of new graphs, we need record initialization (though not record definition), since we represent graphs using records.

Since record field access is essential either way, and we can support both types of output through records, that seems like the most logical choice for what to support beyond the emit statement.

EclecticGriffin commented 1 year ago

Ah, whoops! I have updates the .gitignore file and removed the extraneous files - the 1400 modified files probably should have tipped me off that something was amiss.

Random note that gitignore.io can be helpful for initial setup esp with regard to temp files.

Also might be worth noting that git exclude also exists which is basically just gitignore but only local (not committed). Useful if there are some files you want ignored that are specific to your dev flow. For example I have a local_examples folder where I put calyx files I'm trying to debug or mess with and having it in the git exclude saves me from having to side step those files when adding things to a commit.

sampsyo commented 1 year ago

Cool cool; the argument for "dot syntax" to get things like node.steps makes sense to me!!

susan-garry commented 1 year ago

I've verified that make test-slow-odgi and make test-slow-flip work as expected, so I will go ahead and merge what we have so far!