Open ngreve opened 6 months ago
As discussed today, I think this may be expected behaviour. The timeslice distributor discards timeslices as long as no worker is connected.
To test if this is the issue, I recommend adding a large delay (sleep) to the mstool right after everything has been set up, but before the first data is written to the shared memory. During this time, the rest of the chain should be completely initialized.
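The suspected mechanism can be illustrated with a toy model. This is only a sketch under assumptions and is not Flesnet code: the function `simulate_distributor` and its parameters are hypothetical, and it assumes that every timeslice counts toward the `-n` limit whether or not a worker is connected to receive it.

```python
def simulate_distributor(limit, worker_connects_at):
    """Toy model of a distributor that discards timeslices while no worker
    is connected, but still counts them toward the configured limit.

    limit              -- configured number of timeslices (like -n)
    worker_connects_at -- index of the first timeslice produced after the
                          worker has connected (hypothetical parameter)
    Returns the indices of the timeslices that actually reach the worker.
    """
    delivered = []
    for ts_index in range(limit):
        if ts_index >= worker_connects_at:
            delivered.append(ts_index)  # worker is connected: delivered
        # otherwise the timeslice is dropped, yet the loop counter advances
    return delivered


# Worker connects late: 4 timeslices are lost before it is ready.
late = simulate_distributor(limit=15, worker_connects_at=4)
print(len(late), late[0])  # 11 4

# Producer delayed until the worker is ready: nothing is lost.
ready = simulate_distributor(limit=15, worker_connects_at=0)
print(len(ready))  # 15
```

In this model, delaying the producer (the sleep suggested above) corresponds to `worker_connects_at=0`. If the real chain then delivers all configured timeslices, that would support the explanation.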
Commit: cdc7893481aa10d51070d965ae9fc6eebfe91ac2
Problem Description
I am using the mstool to put a given microslice archive into a specified shared memory region to feed its content to a Flesnet entry node (see Steps to Reproduce). When using the -n flag, Flesnet does not produce the specified number of timeslices and exits early when running the entry and build nodes in a combined process. When running the build and entry nodes in dedicated processes, Flesnet behaves as expected and exits after the configured number of timeslices.
Steps to Reproduce
Producing a microslice archive file and providing it via shared memory using the mstool:
When running build and entry node in a combined process:
Notice the line
[09:51:37] INFO: total timeslices processed: 11
in contrast to the configured -n 15 in the Flesnet command. The number of processed timeslices can vary across multiple runs.
On the other hand, when running the entry and build nodes in dedicated processes (order matters):
First, start the build node:
Second, start the entry node:
Output of the build node:
Notice the line
[09:58:30] INFO: total timeslices processed: 15
The expected 15 timeslices were built.
If you switch the start order and start the entry node first and then the build node, the result is similar to running both in a combined process: fewer timeslices are built than configured.
Expected Behavior
In either case, dedicated processes or not, Flesnet should build the configured number of timeslices. Furthermore, I would expect the start order of the entry and build nodes not to matter.
Additional Information
I am currently extending the mstool to verify produced timeslice archives using the input microslice archives (see draft PR ). Using my first prototype implementations, I have analyzed the contents of the produced timeslice archives. My findings, explained using an example: say you run Flesnet with a timeslice size of 100 and want to create 15 timeslices, but Flesnet stops after 11 as shown in this bug report. You will find that the first microslice in your output timeslice archive has the index 400. This means that, for some reason, 400 microslices (the 4 missing timeslices) are skipped but counted as produced timeslices anyway.
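The arithmetic in this example can be written down as a small check. It is a sketch only: the helper `skipped_timeslices` is hypothetical and not part of the mstool, and it assumes that microslice indices start at 0 and that consecutive timeslices start exactly one timeslice size apart.

```python
def skipped_timeslices(first_microslice_index, timeslice_size):
    """Number of whole timeslices missing before the first archived microslice.

    Assumes microslice indices start at 0 and that timeslice k begins at
    microslice index k * timeslice_size (assumptions for this sketch).
    """
    if timeslice_size <= 0:
        raise ValueError("timeslice_size must be positive")
    if first_microslice_index % timeslice_size != 0:
        raise ValueError("first microslice is not on a timeslice boundary")
    return first_microslice_index // timeslice_size


configured = 15   # value passed via -n
archived = 11     # timeslices found in the output archive
skipped = skipped_timeslices(first_microslice_index=400, timeslice_size=100)

# The skipped timeslices account exactly for the difference.
assert skipped == configured - archived
print(skipped)  # 4
```

If the check holds across several runs with varying counts, it would suggest that the missing timeslices are always dropped at the start of the run, not at the end.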