MrLeebo / prisma-ast

Abstract Syntax Tree for parsing schema.prisma files
MIT License
139 stars 18 forks source link

[Feature] Tracking name changes #43

Open nonara opened 3 months ago

nonara commented 3 months ago

First, great library! Thanks for writing this.

Background

I used this to create something which transforms our prisma schema file. For all items which don't have a @@map tag already, we transform the name to camel-case, and we add a @@map property.

You can see a discussion of others talking about doing that exact thing here:

Issue

The one issue that I ran into here is that when I change names of declarations, the references to those declarations do not also change.

So, for example:

Old

model my_model {
  id String
}

model other {
  my_model_id String
  my_model my_model @relation(...)
}

New

model MyModel {
  id String
}

model Other {
  my_model_id String
  my_model my_model @relation(...) // This will throw a diagnostic due to "my_model" no longer existing
}

Workaround

Currently, I'm just walking all declarations after transform and finding fieldName and updating from the old to the new.

(Note: I'm not sure if that will fully cover refences, but I think so)

It would be great if this library could provide an API to change qualified names for declarations and have the references be updates as well!

MrLeebo commented 2 months ago

This sounds like an interesting feature, although I am worried it may be beyond the scope of what prisma-ast aims to accomplish. There are several downstream packages that use prisma-ast as a dependency to implement linters and other checks or operations on your prisma schema, and this functionality feels to me more in line with one of those tools.

I'd need to do some research on how references in prisma schemas work and how they can be expected to appear in the document. e.g. can an attribute contain a reference? Also, how the builder should be expected to respond if you tried to rename a model with a name that already exists, or with the name of a pre-existing type/keyword, e.g.

//                                                   \/ a type defined by prisma
const builderOrResult = builder.model('Foo').rename('Unsupported')
//    ^ would the return value be the builder, or an object describing the operation?

Is it acceptable to just raise an error, or should the builder return the status of the operation in its return value/in a stateful property, as an error might be potentially destructive to other changes that the builder is holding onto.

I wouldn't refuse a PR if you were to build it, but it probably won't be a priority for me to drive this feature forward.

nonara commented 2 months ago

Thanks for the response! 🎉

FWIW, I believe it's fairly common to track names with a symbol table, etc. in parsers, but I definitely understand if that goes beyond the original intention here!

You raised some great points. Since you're not validating or type-checking at all in this library and the purpose of the API would be to simply update the associated strings, I think I'd be inclined toward not throwing from either collisions or reserved keywords.

Beyond that, semantic soundness is generally checked later during parsing and not during node factory API. For example, I don't believe the TypeScript compiler API would complain at the API level if you created identifiers which were reserved, but you'd get diagnostics during later parsing. I believe the same is true of most AST APIs.

At any rate, I'd be happy to do a quick PR!

In thinking over approach, I think it could make sense to add a basic symbols map, which associates nodes with symbol, and during print we can use the symbol's value.

One question would be, how would you want to handle attempting to change the name property of declarations.

Here are a few options, off the top of my head:

  1. name uses getter / setter, which automatically updates the symbol when set (no rename method needed)
    • not technically breaking change, but probably a good idea to bump major version
  2. name is made readonly with JSDoc saying to use rename method
    • breaking change
  3. name behaviour remains as is, but add JSDoc suggestion to use rename method to rename the symbol
    • Note: could also add a console.warn

Personally, I'd be most in favour of 2, as I think it was unclear to me (and may be to others) that changing name of the declaration won't propagate to references.

However, I'm good with whatever you think is best!

MrLeebo commented 2 months ago

I would be inclined to say that there should not be a name property attempting to automatically rename references. That's uncharacteristically surprising from an AST that is otherwise, well, simple. I'm more comfortable with the user explicitly opting into the new behavior by invoking a rename() operation.

Changing the existing behavior is potentially a breaking change for existing scripts.

Plus, it's possible a user might have a script that is simultaneously renaming an object and also splitting some of the references onto another object (e.g. forking an enum)

nonara commented 2 months ago

I would be inclined to say that there should not be a name property attempting to automatically rename references.

Agreed. I don't like that route either.

Can you elaborate a bit on the examples of forking an enum, etc.?

Specifically, I'm wondering if you mean duplicating it by doing something like re-inserting the old references into the statements array (terminology may be incorrect there for this code, but hopefully it translates) in effort to create a copy.

Considering the AST is simple and doesn't always have position tracking info or metadata on whether it's a newly created node or one that was parsed, I'm assuming it might be possible to inject a reference to another node twice and have it print correctly, but changing the name would update both.

Thinking out loud, if anything like that is possible, I think a simple mitigation strategy would be to have a cleanup routine that is called: A) before print, B) before rename, that ensures all declaration nodes are unique objects, cloning where necessary.

However, I'm guessing you're not talking about a duplicated reference and mean more of using the API to create something new using old info. (Feel free to correct me if I'm wrong!)

So narrowing on concrete proposal...

Functionally, we would add rename method to appropriate declaration nodes, which would update both the declaration node's name property and the associated references.

I think instead of tracking in a map while parsing, since it is a simpler AST, it would make more sense to just identify and update references on-the-fly, when rename is called. That would allow for custom created nodes (those created after parsing) to be easily updated as well.

In other words, rename will simply iterate all potential references (right now assuming just type field) and change if the string matches.

What do you think?

MrLeebo commented 2 months ago

In other words, rename will simply iterate all potential references (right now assuming just type field) and change if the string matches.

What do you think?

I agree, that sounds perfect.


Can you elaborate a bit on the examples of forking an enum, etc.?

I don't have a real-world use case, so this example may be a bit contrived, but imagine something like:

model DeliveryRequest {
  product   Product
  recipient Recipient
  status    WorkflowStatus
}

model FabricationRequest {
  product Product
  status  WorkflowStatus
}

enum WorkflowStatus {
  PENDING
  IN_PROGRESS
  COMPLETE
}

And the schema owner needs to define unique workflow steps for the delivery status vs the fabrication status, and wants to write a script using prisma-ast to fork the enum into two enums with distinct statuses.

model DeliveryRequest {
  product   Product
  recipient Recipient
- status    WorkflowStatus
+ status    DeliveryStatus
}

model FabricationRequest {
  product Product
- status  WorkflowStatus
+ status  FabricationStatus
}

-enum WorkflowStatus {
-  PENDING
-  IN_PROGRESS
-  COMPLETE
-}

+enum DeliveryStatus {
+  PENDING
+  IN_PROGRESS
+  COMPLETE
+  LOST
+  DAMAGED
+}

+enum FabricationStatus {
+  PENDING
+  IN_PROGRESS
+  COMPLETE
+  BOTCHED
+  OUT_OF_MATERIALS
+}

So the author of a script made with prisma-ast may be making a modification to the schema where they are specifically breaking a reference that existed before.