cbm-fles / flesnet

CBM FLES Timeslice Building

Verification of timeslice archives based on given input microslice archives #162

Open ngreve opened 5 months ago

ngreve commented 5 months ago

The goal of this feature is to check whether the created timeslice archive file "makes sense" for a given Flesnet input. For now, it is intended only for small files during the development process.

Example Usage

Using already existing features:

# Create a microslice archive file
./mstool -n 1000000 -p 0 -o ms_archive.msa
# Provide its contents through shared memory
./mstool --input-archive ./ms_archive.msa --output-shm fles_in_shared_memory

We will use the default timeslice size of 100 and an overlap of 1 to create 15 timeslices.
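As a rough illustration of what these parameters imply, the sketch below computes which microslice indices each timeslice would cover. It assumes that timeslice `i` starts at microslice `i * size` and that the overlap microslices are appended at the end of the core region; the function name `timeslice_ranges` is hypothetical and not part of Flesnet.

```python
def timeslice_ranges(ts_count, ts_size=100, overlap=1):
    """Return (first, last_exclusive) microslice indices for each timeslice.

    Assumed layout: timeslice i starts at i * ts_size and carries
    ts_size core microslices followed by `overlap` extra microslices.
    """
    ranges = []
    for ts_index in range(ts_count):
        first = ts_index * ts_size        # start of the core region
        last = first + ts_size + overlap  # core plus trailing overlap
        ranges.append((first, last))
    return ranges


ranges = timeslice_ranges(15)
microslices_needed = ranges[-1][1]
print(len(ranges), ranges[0], ranges[-1], microslices_needed)
# prints: 15 (0, 101) (1400, 1501) 1501
```

Under these assumptions every timeslice holds 101 microslices, and 15 timeslices consume the first 1501 microslices of the input archive.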

Start a build node:

./flesnet -n 15 -t zeromq -I shm://127.0.0.1/0 -o 0 -O shm:/fles_out_shared_memory/0 --processor-instances 1 -e "./tsclient -i shm:%s -o file:timeslice_archive.tsa"

Start an entry node:

./flesnet -n 15 -t zeromq -i 0 -I shm:/fles_in_shared_memory/0 -O shm:/fles_out_shared_memory/0 --processor-instances 1 -e "./tsclient -i shm:%s -o file:timeslice_archive.tsa"

When Flesnet has created the 15 timeslices, use the mstool for archive verification:

./mstool --input-archives ./ms_archive.msa --output-archives ./timeslice_archive.tsa --timeslice-cnt 15 --timeslice-size 100 --overlap 1 
[15:07:20] INFO: System provides 8 concurrent threads. Will use: 6
[15:07:20] INFO: Checking './timeslice_archive.tsa' against inputs ...
[15:07:20] INFO: Printing info for timeslice archive: ./timeslice_archive.tsa
[15:07:20] INFO: Timeslice cnt.: 15
[15:07:20] INFO: Microslices per timeslice: 101
[15:07:20] INFO: Components per timeslice: 1
[15:07:22] INFO: Checking './ms_archive.msa' against outputs ...
[15:07:24] INFO: Archive valid
[15:07:24] INFO: total microslices processed: 0
[15:07:24] INFO: exiting

Further testing needs to be done when using multiple input msa and multiple output tsa files.

At present, the check only confirms that the tsa file contains the expected microslices from the given input microslice archive files, and vice versa. The current version also uses basic parallelization: it queries the number of threads the system provides and leaves 2 of them unoccupied to avoid blocking the whole system during development.
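The two ideas in this paragraph can be sketched in a few lines. This is not the code from the PR: microslices are modelled as plain Python values in in-memory lists, the names `worker_count` and `verify` are hypothetical, and the timeslice layout (start at `i * ts_size`, overlap appended at the end) is an assumption.

```python
import os


def worker_count(available=None, reserved=2):
    """Number of worker threads when `reserved` threads are left free."""
    if available is None:
        available = os.cpu_count() or 1
    return max(1, available - reserved)  # always keep at least one worker


def verify(input_ms, timeslices, ts_size=100, overlap=1):
    """Return a list of problems; an empty list means the archive looks valid."""
    problems = []
    seen = set()
    # Direction 1: every timeslice holds exactly the expected input microslices.
    for ts_index, ts in enumerate(timeslices):
        first = ts_index * ts_size
        expected = list(input_ms[first:first + ts_size + overlap])
        if list(ts) != expected:
            problems.append(f"timeslice {ts_index} differs from input")
        seen.update(ts)
    # Direction 2: every input microslice that should have been consumed
    # appears in at least one timeslice.
    expected_total = len(timeslices) * ts_size + overlap if timeslices else 0
    for ms_index, ms in enumerate(input_ms[:expected_total]):
        if ms not in seen:
            problems.append(f"input microslice {ms_index} missing from output")
    return problems


microslices = list(range(1501))
timeslices = [microslices[i * 100:i * 100 + 101] for i in range(15)]
print(worker_count(8), verify(microslices, timeslices))
# prints: 6 []
```

With 8 available threads the sketch selects 6 workers, which matches the first log line of the example run above. The real tool would distribute the per-timeslice comparisons across those workers; the sketch keeps them sequential for brevity.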

To-Dos and Possible Discussion Points

I've opened this draft PR as a platform for discussing the needs and necessary capabilities of such a feature. The current state is very likely not feature-complete and not bug-free. I thought it would be a good idea to get some feedback before putting more sophisticated work into it.

Changes made (12.03.24):

ToDos/open for discussion:

cuveland commented 5 months ago

We have just gone through this PR superficially, and I think we should talk about it in the meeting. We have a few questions and comments, and discussing them here would otherwise become very long-winded.