CartwrightLab / dawg

Simulating Sequence Evolution
GNU General Public License v2.0

Migrate from trick file format to YAML #74

Open jgarciamesa opened 2 years ago

jgarciamesa commented 2 years ago

Proposed new input file format:

```yaml
sim:
  reps: 2
  seed: [1, 2, 3]

# output parameters
output:
  block:
    head: text at beginning of output
  file: output_file.fasta
  split: true

# tree parameters
tree:
  tree_name1:
    tree: (A:0.3)D;
    scale: 0.2

# parts - define parts to be associated with region(s)
parts:
  part_name1:
    length: 10
    seq: ACGT
    code: 1
  part_name2:
    seq: CCGTC

# rules - parameters for substitution and indel models
rules:
  rule_name1:
    subst:
      model: GTR
      params: [2.0, 1.0, 3.0, 1.0, 1.0, 1.0]
      freqs: [0.2, 0.3, 0.3, 0.2]
      rate:
        model: GAMMA
      rate.params: 8.0 # alternative syntax to avoid extra indentation
    indel:
      model:
        ins: POWER-LAW
        del: POWER-LAW
      params:
        ins: 1.01, 50
        del: 1.01, 50
      rate:
        ins: 0.01
        del: 0.01
      max:
        ins: 50.0
        del: 50.0

# regions - tie trees, rules, and parts together, creating as many regions as needed
#         - regions implicitly inherit from the previous section
regions:
  region_name1:
    tree: tree_name1
    rule: rule_name1
    part: part_name1
    seg: 1
  region_name2:
    tree: tree_name1
    rule: rule_name1
    part: part_name2
    seg: 2
  region_name3:
    inherits: region_name1
    seg: 3
```

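To illustrate what a loader for this format has to do, here is a minimal Python sketch. It assumes the file has already been parsed into plain dicts (for example with PyYAML's `yaml.safe_load`, left out so the sketch needs only the standard library). The function names and the inheritance semantics are hypothetical: an explicit `inherits:` copies fields from the named region, otherwise a region inherits from the one listed before it, and its own fields override inherited ones. It also shows that `ins: 1.01, 50` is read by YAML as the single string `"1.01, 50"`, not a list, so a loader would need to accept both that form and the `[2.0, 1.0, ...]` list form used for `params`.

```python
from __future__ import annotations

from typing import Any


def parse_number_list(value: Any) -> list[float]:
    """Accept a YAML list ([1.01, 50]), a bare number, or a comma-separated string ("1.01, 50")."""
    if isinstance(value, str):
        # YAML reads `ins: 1.01, 50` as the single string "1.01, 50".
        return [float(x) for x in value.split(",")]
    if isinstance(value, (int, float)):
        return [float(value)]
    return [float(x) for x in value]


def check_gtr(subst: dict[str, Any]) -> None:
    """Sanity-check a GTR `subst` block: 6 exchangeabilities, 4 frequencies summing to 1."""
    params = parse_number_list(subst["params"])
    freqs = parse_number_list(subst["freqs"])
    if len(params) != 6:
        raise ValueError(f"GTR expects 6 params, got {len(params)}")
    if len(freqs) != 4 or abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("GTR expects 4 base frequencies that sum to 1")


def resolve_regions(regions: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Expand explicit (`inherits:`) and implicit (previous-region) inheritance.

    Hypothetical semantics: a region copies every field of its parent, then its own
    fields override them. The parent is the `inherits:` target if present, otherwise
    the region listed just before it. Parents must appear earlier in the file.
    """
    resolved: dict[str, dict[str, Any]] = {}
    previous: str | None = None
    for name, fields in regions.items():
        parent = fields.get("inherits", previous)
        if parent is not None and parent not in resolved:
            raise KeyError(f"region {name!r} inherits from unknown region {parent!r}")
        merged = dict(resolved[parent]) if parent is not None else {}
        merged.update({k: v for k, v in fields.items() if k != "inherits"})
        resolved[name] = merged
        previous = name
    return resolved


if __name__ == "__main__":
    # The `regions` section from the proposal, as a YAML loader would return it.
    regions = {
        "region_name1": {"tree": "tree_name1", "rule": "rule_name1", "part": "part_name1", "seg": 1},
        "region_name2": {"tree": "tree_name1", "rule": "rule_name1", "part": "part_name2", "seg": 2},
        "region_name3": {"inherits": "region_name1", "seg": 3},
    }
    check_gtr({"params": [2.0, 1.0, 3.0, 1.0, 1.0, 1.0], "freqs": [0.2, 0.3, 0.3, 0.2]})
    print(parse_number_list("1.01, 50"))  # [1.01, 50.0]
    # {'tree': 'tree_name1', 'rule': 'rule_name1', 'part': 'part_name1', 'seg': 3}
    print(resolve_regions(regions)["region_name3"])
```

Resolving inheritance once, up front, means the rest of the simulator can treat every region as a complete, flat record.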
reedacartwright commented 1 year ago

scale option: I think we need one scale that is part of a tree and another scale that is part of a region. The first is useful for adjusting a Newick string without editing it. The second allows the scale to vary between regions.
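To make the two-scale idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that the tree-level `scale` multiplies every branch length in the Newick string and that a region-level scale multiplies again on top of it; `scale_newick` is an illustrative name, not part of dawg.

```python
from __future__ import annotations

import re

# A branch length in Newick: ':' followed by a decimal number, optionally in exponent form.
_BRANCH_LENGTH = re.compile(r":((?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)")


def scale_newick(newick: str, tree_scale: float = 1.0, region_scale: float = 1.0) -> str:
    """Return `newick` with every branch length multiplied by tree_scale * region_scale.

    Hypothetical semantics: the two scales simply multiply. Quoted labels that
    contain ':' are not handled.
    """
    factor = tree_scale * region_scale

    def rescale(match: re.Match[str]) -> str:
        # `:g` prints up to 6 significant digits, hiding float noise like 0.30000000000000004.
        return f":{float(match.group(1)) * factor:g}"

    return _BRANCH_LENGTH.sub(rescale, newick)


if __name__ == "__main__":
    tree = "(A:0.3)D;"  # the tree from the proposed YAML, which sets `scale: 0.2`
    print(scale_newick(tree, tree_scale=0.2))                    # (A:0.06)D;
    print(scale_newick(tree, tree_scale=0.2, region_scale=2.0))  # (A:0.12)D;
```

Keeping the two factors separate lets the same Newick string be reused unchanged across regions that evolve at different overall rates.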

reedacartwright commented 1 year ago

I think it would be good to create some automated tests that check that dawg produces the expected output from a YAML input file. Should we do that in this pull request, or in another one after this one is merged?

jgarciamesa commented 1 year ago

Since we are migrating to doctest, I would prefer to do that before adding tests here. I already have work in progress on a doctest branch, which should be ready soon. I propose leaving this PR open until doctest is added and then updating it, so that code and tests are part of the same PR. What do you think?

reedacartwright commented 1 year ago

I think we can merge before doctest is ready.

jgarciamesa commented 1 year ago

> scale option: I think we need one scale that is part of a tree and another scale that is part of a region. The first is useful for adjusting a Newick string without editing it. The second allows the scale to vary between regions.

Added the scale option. If everything looks good, this should be ready to merge. I'll work on doctest next: the framework is done, but I'll add a battery of tests before opening that PR.