TheEvergreenStateCollege / smarty-plants

Plant genome sequencing
Consider ways to model edge string fragments in database schema #28

Open learner-long-life opened 2 weeks ago

learner-long-life commented 2 weeks ago

While modeling this input string and its suffix tree with the visualizer:

image

We notice that edge strings have repeating fragments. It may be easier to extend only the end fragment with new characters (bc in the example above, for all the leaves).

One possible way to model this in the schema:

model Node {
  id          Int    @id @default(autoincrement())
  parentId    Int
  childId     Int
  stringStart Int 
  stringEnd   Int
}

model InputString {
  id          Int    @id @default(autoincrement())
  inputString String
}

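// Alternative: store each edge string as an ordered list of shared fragments
model Node {
  id         Int                         @id @default(autoincrement())
  parentId   Int
  childId    Int
  edgeString NodeToEdgeStringFragments[]
}

// Join table sketch; these field names are assumptions, not settled design
model NodeToEdgeStringFragments {
  nodeId     Int
  fragmentId Int
  position   Int // order of the fragment within the node's edge string
  node       Node               @relation(fields: [nodeId], references: [id])
  fragment   EdgeStringFragment @relation(fields: [fragmentId], references: [id])

  @@id([nodeId, position])
}

model EdgeStringFragment {
  id             Int                         @id @default(autoincrement())
  stringFragment String
  nodes          NodeToEdgeStringFragments[]
}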

model InputString {
  id          Int    @id @default(autoincrement())
  inputString String
}
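
To make the fragment idea concrete, here is a minimal in-memory sketch in TypeScript. It is not the project's implementation. It assumes each edge keeps an ordered list of fragment ids and that fragments are shared rows, so extending one end fragment changes every edge that references it. The names `FragmentStore`, `addFragment`, `extendFragment`, and `edgeString` are hypothetical.

```typescript
// Hypothetical in-memory stand-in for the EdgeStringFragment table.
class FragmentStore {
  private fragments = new Map<number, string>();
  private nextId = 1;

  // Insert a fragment and return its id (mimics an autoincrement key).
  addFragment(text: string): number {
    const id = this.nextId++;
    this.fragments.set(id, text);
    return id;
  }

  // Append characters to an existing fragment in place.
  extendFragment(id: number, suffix: string): void {
    const current = this.fragments.get(id);
    if (current === undefined) {
      throw new Error(`unknown fragment id ${id}`);
    }
    this.fragments.set(id, current + suffix);
  }

  // Rebuild an edge string by concatenating its fragments in order.
  edgeString(fragmentIds: number[]): string {
    return fragmentIds
      .map((id) => {
        const text = this.fragments.get(id);
        if (text === undefined) {
          throw new Error(`unknown fragment id ${id}`);
        }
        return text;
      })
      .join("");
  }
}

// Two leaf edges with different prefixes that share one end fragment.
const store = new FragmentStore();
const prefixA = store.addFragment("a");
const prefixB = store.addFragment("xa");
const sharedEnd = store.addFragment("b");
const leafEdge1 = [prefixA, sharedEnd];
const leafEdge2 = [prefixB, sharedEnd];

// Extending the shared end fragment once updates both leaf edges.
store.extendFragment(sharedEnd, "c");
console.log(store.edgeString(leafEdge1)); // "abc"
console.log(store.edgeString(leafEdge2)); // "xabc"
```

The trade-off in this sketch is that reading an edge string needs a lookup per fragment, while appending new characters touches a single shared row.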

Or store start and end indices into the input string:

model Node {
  id              Int @id @default(autoincrement())
  parentId        Int
  childId         Int
  edgeStringStart Int
  edgeStringEnd   Int
}
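
As a rough illustration of the index-based option, the TypeScript sketch below recovers an edge string by slicing the input string instead of storing a copy. The names `EdgeRef` and `edgeString` are hypothetical. It assumes `edgeStringStart` is inclusive and `edgeStringEnd` is exclusive, which the issue does not specify. It also assumes a `null` end can stand for "up to the current end of the input", so leaf edges grow as characters are appended; the schema above declares the end as a plain `Int`.

```typescript
// Hypothetical shape mirroring the Node fields above; a null end means
// "open": the edge runs to the current end of the input string.
interface EdgeRef {
  edgeStringStart: number; // inclusive index (assumption)
  edgeStringEnd: number | null; // exclusive index, or null for an open leaf edge
}

// Recover the edge string by slicing the input instead of storing a copy.
function edgeString(input: string, edge: EdgeRef): string {
  const end = edge.edgeStringEnd ?? input.length;
  return input.slice(edge.edgeStringStart, end);
}

const internalEdge: EdgeRef = { edgeStringStart: 0, edgeStringEnd: 2 };
const leafEdge: EdgeRef = { edgeStringStart: 2, edgeStringEnd: null };

let inputString = "abca";
console.log(edgeString(inputString, internalEdge)); // "ab"
console.log(edgeString(inputString, leafEdge)); // "ca"

// Appending to the input extends every open leaf edge with no row updates.
inputString += "bc";
console.log(edgeString(inputString, leafEdge)); // "cabc"
```
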
learner-long-life commented 2 weeks ago

Using this visualizer

http://brenden.github.io/ukkonen-animation/