New output file for reactions

ASinanSaglam commented 1 year ago

This is an enhancement issue for a new output file that contains all reactions and their operations. The file should contain:

- A record of every reaction that fired during the simulation
- For each reaction, the exact ID of every species involved in it
- A record of the operations in every reaction

This issue will be used to track progress and discuss the exact file format. The current plan is to use JSON.

ASinanSaglam commented 1 year ago

A very rough mock-up of the file format. I'm still deciding on the exact format for the reaction type, and I'll update this as I work through it.

```json
{
  "simulation": {
    "info": {
      "date": "10_20_2023",
      "parameter": 10
    },
    "reactions": [
      {
        "name": "rxn1",
        "type": "unidirectional",
        "reactants": [
          {
            "name": "A"
          },
          {
            "name": "B"
          }
        ],
        "products": [
          {
            "name": "C"
          },
          {
            "name": "D"
          }
        ],
        "rate_cts": 100
      }
    ],
    "firings": [
      {
        "name": "rxn1",
        "global_count": 101,
        "global_time": 12.32,
        "reactant_ids": [
          [
            1,
            67,
            3
          ],
          [
            23,
            46,
            33
          ]
        ],
        "product_ids": [
          [
            1,
            67,
            3
          ],
          [
            23,
            46,
            33
          ]
        ]
      }
    ]
  }
}
```

ASinanSaglam commented 1 year ago

As a side note, a key question here is exactly how we write out the JSON.

My current idea is to generate the JSON manually rather than use a library, since libraries would require the fully realized dictionary in memory before writing it out. That makes memory use grow linearly as reactions fire. Instead, we should first write out the info and reactions sections, then dump firings as the simulation proceeds so that memory use doesn't bloat.

However, how often we dump will affect performance, so I suggest adding a buffer-size argument: output happens every N global events, and only N events are kept in memory at a time. That way the user can decide how to allocate their resources: less memory at the cost of a bigger performance hit, or more memory for a smaller one.
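The buffered approach described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not NFsim code (NFsim itself is C++): `FiringWriter`, `record`, `flush`, `close`, and `buffer_size` are hypothetical names. The outer JSON structure is written by hand, and the standard `json` module only serializes individual records, so at most `buffer_size` firings are held in memory at once.

```python
import io
import json
from typing import Any, TextIO


class FiringWriter:
    """Hypothetical streaming writer for the reaction-firing JSON file.

    The header (info + reactions) is written immediately, firings are held
    in a buffer of at most `buffer_size` entries, and close() writes the
    closing brackets. The whole document is never built in memory.
    """

    def __init__(self, out: TextIO, info: dict, reactions: list,
                 buffer_size: int = 10_000) -> None:
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        self.out = out
        self.buffer_size = buffer_size
        self.buffer: list[dict[str, Any]] = []
        self.written = 0  # number of firings already flushed to `out`
        # Everything known before the simulation starts goes out first.
        out.write('{"simulation": {"info": ' + json.dumps(info))
        out.write(', "reactions": ' + json.dumps(reactions))
        out.write(', "firings": [')

    def record(self, firing: dict[str, Any]) -> None:
        """Buffer one firing; flush once the buffer holds buffer_size events."""
        self.buffer.append(firing)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        """Serialize buffered firings, comma-separated, then empty the buffer."""
        for firing in self.buffer:
            if self.written > 0:
                self.out.write(", ")
            self.out.write(json.dumps(firing))
            self.written += 1
        self.buffer.clear()

    def close(self) -> None:
        """Flush the remainder, then close the firings array and both objects."""
        self.flush()
        self.out.write("]}}")


if __name__ == "__main__":
    sink = io.StringIO()
    writer = FiringWriter(sink, {"testing": 123}, [{"name": "rxn1"}], buffer_size=2)
    for count in range(1, 6):
        writer.record({"id": "initiate", "global_count": count})
    writer.close()
    print(json.loads(sink.getvalue())["simulation"]["firings"][-1])
```

A smaller `buffer_size` means more frequent writes with less memory held; a larger one trades memory for fewer writes, which is the trade-off the buffer argument is meant to expose.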

rasi commented 1 year ago

@ASinanSaglam Your plan above sounds good. Are you thinking this will be the default output format, or will we turn on this detailed format with a CLI parameter?

ASinanSaglam commented 1 year ago

@rasi I think this really needs to be an optional CLI parameter that is off by default.

No matter how we tackle this, this kind of extra output will affect performance one way or another (increased memory use or slower simulation). I'd like to keep it as an entirely separate option so that a regular user doesn't mistake that for the baseline performance.

ASinanSaglam commented 1 year ago

After talking with @jrfaeder about exactly how to record these, here is the updated output:

```json
{
  "simulation": {
    "info": {
      "testing": 123
    },
    "reactions": [
      {
        "name": "rxn1"
      }
    ],
    "firings": [
      {
        "id": "initiate",
        "global_count": 1,
        "global_time": 1.089967,
        "cpu_time": 0.224542,
        "operations": {
          "AddBond": [53,0,0,0],
          "StateChange": [0,0,1]
        }
      },
      {
        "id": "elongate_1",
        "global_count": 2,
        "global_time": 1.674751,
        "cpu_time": 0.224701,
        "operations": {
          "StateChange": [0,0,0],
          "DeleteBond": [53,0],
          "AddBond": [53,0,0,1]
        }
      },
      {
        "id": "initiate",
        "global_count": 3,
        "global_time": 1.977151,
        "cpu_time": 0.224799,
        "operations": {
          "AddBond": [43,0,0,0],
          "StateChange": [0,0,1]
        }
      }
    ]
  }
}
```

I'm focusing purely on recording firings right now; the rest is just placeholders. The same goes for the current operation formatting, though I've just edited in a more recent version and I think it looks pretty good. The operation names now match the names used in the normal BNGXML output.

Each operation points to the specific molecule(s) it acts on, and the reactions block will eventually record each reaction type so that we can reproduce what happened in the simulation one-to-one. Each operation type needs to handle this slightly differently. I'm not yet sure how to handle add/delete molecule operations here; I'm working on that now.

Let me know if you have any comments on the format.
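Reproducing a simulation one-to-one means applying each firing's operations in the order they were recorded. Below is a minimal Python sketch of a reader for the format above; `PairDict`, `replay`, and the handler stubs are hypothetical names, and the handlers only log their arguments, because the meaning of each index in the operation arrays isn't pinned down yet. One format detail it surfaces: JSON objects are formally unordered and most parsers keep only one value per key, so if a reaction can perform the same operation twice (e.g. two `AddBond`s), the reader has to preserve duplicates explicitly, as done here with Python's `object_pairs_hook`. An array of operations would sidestep both issues.

```python
import json
from typing import Any, Callable


class PairDict(dict):
    """A dict that also keeps every (key, value) pair in document order.

    json.loads passes each decoded object's pairs to object_pairs_hook;
    a plain dict keeps only the last value for a repeated key.
    """

    def __init__(self, pairs: list[tuple[str, Any]]) -> None:
        super().__init__(pairs)
        self.pairs = list(pairs)


Handler = Callable[[str, list], None]


def replay(text: str, handlers: dict[str, Handler]) -> None:
    """Dispatch every operation of every firing, in file order."""
    doc = json.loads(text, object_pairs_hook=PairDict)
    for firing in doc["simulation"]["firings"]:
        for op_name, args in firing["operations"].pairs:
            handlers[op_name](firing["id"], args)


if __name__ == "__main__":
    sample = """{"simulation": {"info": {}, "reactions": [], "firings": [
      {"id": "elongate_1", "global_count": 2, "operations": {
        "StateChange": [0,0,0], "DeleteBond": [53,0], "AddBond": [53,0,0,1]}}]}}"""
    log: list[tuple[str, str, list]] = []

    def recorder(name: str) -> Handler:
        # Hypothetical stub: a real replay would apply the change to a model state.
        return lambda rxn_id, args: log.append((rxn_id, name, args))

    replay(sample, {n: recorder(n) for n in ("AddBond", "DeleteBond", "StateChange")})
    print(log)
```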

ASinanSaglam commented 1 year ago

Some quick notes on performance that I want to write down here:

- This addition doesn't seem to noticeably affect the original code when the output is not selected, so the changes cost nothing when the feature is off.
- With a buffer size of around 10k events, it doesn't seem to have a large impact on memory usage.
- An unlimited buffer seems to be a bad idea, and being able to set the buffer size is useful.

Now a couple of images. release_nfsim refers to the binary used in the current version of NFsim, no_log refers to running without the new output, no_lim_buf refers to a buffer without a size limit, and buff_X refers to a buffer size of X events.

First, the average of 10 runs of a model with an identical seed:

[Figure: all_compare]

Next, single runs with the same seed; the x axis is now the number of events:

[Figure: full_mem_comp]

Finally, some wall-clock time numbers:

| Name          | Avg. wall-clock | Std. wall-clock |
|---------------|-----------------|-----------------|
| release_nfsim | 13.0405         | 2.7962          |
| no_log        | 9.8293          | 0.9871          |
| buff_1        | 38.6115         | 7.8643          |
| buff_10       | 30.9340         | 6.3880          |
| buff_100      | 25.6676         | 6.0264          |
| buff_1000     | 29.7825         | 4.9800          |
| buff_10000    | 25.9691         | 1.5210          |
| buff_100000   | 22.1666         | 5.1988          |
| no_lim_buf    | 21.6901         | 2.4141          |
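To read these averages as a relative cost of the new output, each one can be divided by the `no_log` average, since that configuration runs the same code with the output turned off. A small Python sketch; `slowdown` and `AVG_WALLCLOCK` are hypothetical names, and the values are the averages from the table rounded to four decimals. For example, buff_10000 comes out at about 2.64× the no_log time and buff_1 at about 3.93×.

```python
# Average wall-clock times from the table (same units as the table).
AVG_WALLCLOCK = {
    "release_nfsim": 13.0405,
    "no_log": 9.8293,
    "buff_1": 38.6115,
    "buff_10": 30.9340,
    "buff_100": 25.6676,
    "buff_1000": 29.7825,
    "buff_10000": 25.9691,
    "buff_100000": 22.1666,
    "no_lim_buf": 21.6901,
}


def slowdown(times: dict[str, float], baseline: str = "no_log") -> dict[str, float]:
    """Ratio of each configuration's average time to the baseline's average."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


if __name__ == "__main__":
    for name, ratio in slowdown(AVG_WALLCLOCK).items():
        print(f"{name:>14}  {ratio:5.2f}x")
```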

RuleWorld/nfsim issue #35