cronologic-de / xhptdc8_babel

Wrappers, Utilities and Examples for using the xHPTDC8 with various programming languages.
Mozilla Public License 2.0

Readout Tool #10

Closed sulimma closed 3 years ago

sulimma commented 3 years ago

Readout Tool

Objective

New users shall have a command line tool with which they can gather data directly from the device, to test it without having to write specific code.

This shall reside in a new subdirectory and be both a tool for end users and a code example for users.

Rust

This project shall be written in Rust. There is a command line parser for Rust called `clap` that should make this very easy. Creating a wrapper interface for Rust seems to be rather simple: https://docs.rust-embedded.org/book/interoperability/c-with-rust.html

Output Format

There shall be two types of output data format selectable by a command line option:

binary -b

Just the content of the TDCHit structure bit by bit. So 96 bits per hit.

text based csv (default)

One line of text per hit, with fields separated by commas: `time, channel, type, bin`
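As an illustration of the two formats, here is a minimal sketch with a `TdcHit` struct standing in for the C `TDCHit` (the field names and the big-endian byte layout are assumptions for this sketch, not the wrapper's actual definitions):

```rust
// Hypothetical mirror of the C TDCHit struct: 64 + 8 + 8 + 16 = 96 bits.
#[derive(Clone, Copy)]
struct TdcHit {
    time: u64,
    channel: u8,
    type_: u8, // `type` is a Rust keyword, hence the underscore
    bin: u16,
}

// CSV mode: one line of text per hit.
fn hit_to_csv(hit: &TdcHit) -> String {
    format!("{}, {}, {}, {}", hit.time, hit.channel, hit.type_, hit.bin)
}

// Binary mode: the raw 96 bits (12 bytes) of the structure, big-endian.
fn hit_to_bytes(hit: &TdcHit) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..8].copy_from_slice(&hit.time.to_be_bytes());
    out[8] = hit.channel;
    out[9] = hit.type_;
    out[10..12].copy_from_slice(&hit.bin.to_be_bytes());
    out
}

fn main() {
    let hit = TdcHit { time: 5000, channel: 1, type_: 1, bin: 0 };
    println!("{}", hit_to_csv(&hit)); // 5000, 1, 1, 0
    println!("{}", hit_to_bytes(&hit).len()); // 12
}
```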

Output File Creation

The output is written to the filename given as the last parameter on the command line. This defaults to output.csv (or output.dat in binary mode).

The number of hits per file shall be taken from the command line parameter -h and default to 10,000.

The number of files to be written shall be taken from the command line parameter -f and default to 1. If this parameter is set to more than 1, an incrementing number is appended to the filename before the last dot.
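That numbering rule could be sketched like this (the function name `numbered_filename` is made up for illustration):

```rust
// Sketch: insert an incrementing number before the last dot of the
// output filename when more than one file is requested (-f > 1).
// "sample.dat" -> "sample_0.dat", "sample_1.dat", ...
fn numbered_filename(name: &str, index: usize, files: usize) -> String {
    if files <= 1 {
        return name.to_string(); // single file: keep the name unchanged
    }
    match name.rfind('.') {
        Some(dot) => format!("{}_{}{}", &name[..dot], index, &name[dot..]),
        None => format!("{}_{}", name, index), // no extension: just append
    }
}

fn main() {
    println!("{}", numbered_filename("sample.dat", 0, 2)); // sample_0.dat
    println!("{}", numbered_filename("output.csv", 0, 1)); // output.csv
}
```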

A list of YAML files for the configuration shall be provided, one for each -h parameter. The files shall be applied to the default config in the order in which they are given.

Screen Output

At program start, the tool shall list the serial numbers of the boards found in the system. There shall be some kind of simple progress bar during acquisition. For example using indicatif.

Priority

Work on this issue shall only be started after the YAML reader has been implemented.

Command line examples

#read 10,000 hits into output.csv
readout 

#read 5,000 hits each into sample_0.dat and sample_1.dat with settings from config.yaml
readout -b -c config.yaml -h 5000 -f 2 sample.dat
Bassem-Ramzy commented 3 years ago

@sulimma can we exchange `-h` with `-n`, for example, as `-h` is the default for help (`--help`)?

Bassem-Ramzy commented 3 years ago

Is that OK?

USAGE:
    xhptdc8_readout.exe [FLAGS] [OPTIONS]

FLAGS:
    -b, --binary     The content of the TDCHit structure bit by bit, so 96 bits per hit. Default is csv.
        --csv        (Default) One line of text per hit, separated by commas, "time, channel, type, bin"
    -f, --filesno    The number of files to be written. Default is 1.
    -n, --hitsno     The number of hits per file. Default is 10,000.
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --config <YAML_FILE>    A list of YAML files for the configuration.
    -o, --output <FILE>         The file to which the output will be written. Default is "output.csv" 
Bassem-Ramzy commented 3 years ago

A list of YAML files for the configuration shall be provided, one for each -h parameter. The files shall be applied to the default config in the order in which they are given.

The reason we have "a list" (and not only one file) is that we have a configuration file per device? Please elaborate.

sulimma commented 3 years ago

'-h/-n' : OK

sulimma commented 3 years ago

reason for multiple YAML files:

We will provide example files for partial configuration. For example there might be one file that configures grouping for a certain application and another file that configures the input voltages for a given signaling standard.

Bassem-Ramzy commented 3 years ago

@sulimma The progress bar will advance only while writing the hits to the files, not during the call to xhptdc8_read_hits, as the C function can't update it. We could assign, say, 20% of the progress to the read function: 0% before calling it, 20% after it returns, then advance while writing data to the output files. Right? Any other ideas?

sulimma commented 3 years ago

xhptdc8_read_hits will always return immediately (more or less), with or without data. There is no need for any update during that function. Have a look at the example in the user guide for a typical use case.

The simplest approach would be to do this after each call to read_hits(): hits_read += number_of_hits_returned_from_read_hits; and then update the progress bar.

But this results in a lot of unnecessary updates to the progress bar (up to 48 million calls per second). Depending on how inefficiently the progress bar is written, that might be a problem.

How about a separate thread that updates the progress bar every 50 ms with the value of hits_read? That is complexity that is not really needed, but Rust is said to have nice properties for multithreading that we could test.
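A sketch of that reporter-thread idea using only the standard library (a plain `eprintln!` stands in for a real progress bar, and a fixed-increment loop stands in for the real read loop):

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Reporter thread wakes every 50 ms and prints the shared counter,
// so the hot read loop never touches the progress bar itself.
fn acquire(total: u64) -> u64 {
    let hits_read = Arc::new(AtomicU64::new(0));
    let done = Arc::new(AtomicBool::new(false));

    let (h, d) = (Arc::clone(&hits_read), Arc::clone(&done));
    let reporter = thread::spawn(move || {
        while !d.load(Ordering::Relaxed) {
            let n = h.load(Ordering::Relaxed);
            eprintln!("progress: {:5.1}%", 100.0 * n as f64 / total as f64);
            thread::sleep(Duration::from_millis(50));
        }
    });

    // Stand-in acquisition loop: the real tool would add the return
    // value of xhptdc8_read_hits() instead of a fixed increment.
    while hits_read.load(Ordering::Relaxed) < total {
        hits_read.fetch_add(500, Ordering::Relaxed);
        thread::sleep(Duration::from_millis(5));
    }

    done.store(true, Ordering::Relaxed);
    reporter.join().unwrap();
    hits_read.load(Ordering::Relaxed)
}

fn main() {
    println!("read {} hits", acquire(10_000));
}
```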

sulimma commented 3 years ago

The number of hits to read will be in the millions in typical applications, and a few thousand with the dummy library (as it creates 1000 hits per second).

Bassem-Ramzy commented 3 years ago

@sulimma

  1. When should the readout tool stop reading from the device (i.e. stop looping on read())? When the returned hit count is less than the buffer size? Or when MAX_TRYS_TO_READ_HITS = 1000 is reached?
  2. What should be the typical frequency of calling _read()? Slightly below 1 kHz to match the dummy driver? How long should we sleep if no hits are available?
  3. How do we know the total number of hits that will be returned across all calls to _read(), to map it to 100% of the progress bar? If it is not known, the progress bar can restart from zero when reaching 100% and serve only as an activity indicator.
sulimma commented 3 years ago

Let me answer first for non-grouping mode.

1) The tool shall read until the total number of hits specified on the command line has been read (number_of_files * hits_per_file). In non-grouping mode the number of hits in a file will always exactly match the number specified on the command line. A maximum number of tries is not suitable because there are applications with very low count rates: in astronomy there are applications in the range of 1 count per minute, and there is one neutrino experiment with 10 counts per year. It is unlikely that they would use our tool, but the point is that we can't make any assumptions about the hit rates other than the chip limit of 48 million hits per second. It would be good if the tool could be interrupted with Ctrl+C and behave nicely in that case (e.g. close opened files).

2) As in the example code: you read at full speed in a loop. Only if 0 hits are returned do you sleep() to allow the PCI buffer to fill up a little. That is not necessary for the correct operation of the tool, but it will increase system responsiveness by freeing up CPU resources when they are not needed.

3) The user specifies that a total of f*n hits shall be read, and the software knows how many hits have been read in total. hits_read/(f*n) is the progress.
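The polling loop from point 2 could be sketched like this, with an invented stub standing in for xhptdc8_read_hits() (the stub and its return pattern are purely illustrative):

```rust
use std::time::Duration;

// Stand-in for xhptdc8_read_hits(): returns the number of hits placed
// into the caller's buffer, or 0 if none are available yet. The return
// pattern here is invented for illustration only.
fn read_hits_stub(calls: &mut u32) -> usize {
    *calls += 1;
    if *calls % 3 == 0 { 0 } else { 250 }
}

// Poll at full speed; sleep only when 0 hits were returned.
fn acquire(target: usize) -> usize {
    let mut hits_read = 0;
    let mut calls = 0;
    while hits_read < target {
        let n = read_hits_stub(&mut calls);
        if n == 0 {
            // No data yet: a short sleep lets the buffer fill up a
            // little and frees CPU resources for other work.
            std::thread::sleep(Duration::from_millis(1));
            continue;
        }
        hits_read += n;
        // ... write the n hits to the current output file here ...
    }
    hits_read
}

fn main() {
    println!("{} hits read", acquire(1_000));
}
```

A real tool would additionally install a Ctrl+C handler so open files are closed cleanly on interruption.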

sulimma commented 3 years ago

In grouping mode there is a little complication because you might have read 9,999 hits so far and the next group contains 10 hits. The file shall always contain complete groups. So in this case you write a file with 10,009 hits instead of 10,000.

The difference will be very small in typical cases so I would just ignore this effect for progress calculations. It is OK if the program completes at 100.01%.

Bassem-Ramzy commented 3 years ago

Beta code is pushed, with the following readout output for the command:

target\debug\xhptdc8_readout.exe -c C:\Temp\yaml2 C:\Temp\yaml1 -f=1

@sulimma, apart from the 99/100 issue, please let me know your comments

sulimma commented 3 years ago

Please provide examples for output files in both modes.

The screenshot is looking good. Minor improvements:

Bassem-Ramzy commented 3 years ago

Output file in no-group mode

target\debug\xhptdc8_readout.exe -c C:\Temp\yaml2 C:\Temp\yaml1

output.zip

Bassem-Ramzy commented 3 years ago

@sulimma Output files in no-group mode

target\debug\xhptdc8_readout.exe -c C:\Temp\yaml2 C:\Temp\yaml1 -f=3

readout.zip

Bassem-Ramzy commented 3 years ago

If no YAML file is provided, default configuration will be applied, right?

Bassem-Ramzy commented 3 years ago

Sample binary file for command:

target\debug\xhptdc8_readout.exe -c C:\Temp\yaml2 -b

output.zip

sulimma commented 3 years ago

BTW: This is a great tool if you want to look at binary files: https://www.sweetscape.com/010editor/

You can do binary regex searches or define patterns to format the display of binary data.

sulimma commented 3 years ago

If no YAML file is provided, default configuration will be applied, right?

Yes.

sulimma commented 3 years ago

@sulimma Output files in no-group mode

target\debug\xhptdc8_readout.exe -c C:\Temp\yaml2 C:\Temp\yaml1 -f=3

readout.zip

In the second and third file the timestamps do not increment. In the first file they do (the first one is correct).

Bassem-Ramzy commented 3 years ago

For binary mode, we got the following:

  1. Using format
    -c C:\Temp\yaml2 C:\Temp\yaml1 -f=10 -b -o BB -n 10000

    Time: Elapsed: Ok(45.8944742s), Slept: 27450 Code:

            output_value_to_write = format!("{:064b}{:08b}{:08b}{:016b}\n", 
                hits_buffer[hit_index].time,
                hits_buffer[hit_index].channel,
                hits_buffer[hit_index].type_,
                hits_buffer[hit_index].bin) ;
            match file.write_all(output_value_to_write.as_bytes()) {
                Err(why) => panic!("Couldn't write to file: {}", why),
                Ok(file) => file,
            }
  2. Using to_be_bytes
    -c C:\Temp\yaml2 C:\Temp\yaml1 -f=10 -b -o BStr -n 10000

    Elapsed: Ok(45.0601612s), Slept: 26810 Code:

            let time_as_bytes = hits_buffer[hit_index].time.to_be_bytes();
            match file.write_all(&time_as_bytes) {
                Err(why) => panic!("Couldn't write to file: {}", why),
                Ok(file) => file,
            }
            let channel_as_bytes = hits_buffer[hit_index].channel.to_be_bytes();
            match file.write_all(&channel_as_bytes) {
                Err(why) => panic!("Couldn't write to file: {}", why),
                Ok(file) => file,
            }
            let type_as_bytes = hits_buffer[hit_index].type_.to_be_bytes();
            match file.write_all(&type_as_bytes) {
                Err(why) => panic!("Couldn't write to file: {}", why),
                Ok(file) => file,
            }
            let bin_as_bytes = hits_buffer[hit_index].bin.to_be_bytes();
            match file.write_all(&bin_as_bytes) {
                Err(why) => panic!("Couldn't write to file: {}", why),
                Ok(file) => file,
            }
            match file.write_all(newline_as_bytes) {
                Err(why) => panic!("Couldn't write to file: {}", why),
                Ok(file) => file,
            }

    Conclusion:

  1. With 100K hits, it's almost the same time. With millions of hits, it should make a greater difference.
  2. Using format doesn't write binary data but a string representation of it, nicely formatted and delimited. Using to_be_bytes, on the other hand, writes real binary data, but we need to investigate how to write leading zeros to get a fixed-width 96-bit entry per hit, or find delimiter(s) (field & hit).


Bassem-Ramzy commented 3 years ago

BTW: This is a great tool if you want to look at binary files: https://www.sweetscape.com/010editor/

You can do binary regex searches or define patterns to format the display of binary data.

Very good one, I'm using mh-nexus as I needed only a hexadecimal viewer/editor.

sulimma commented 3 years ago

You are creating 1000 hits per channel with the dummy library, so it will always take about 50 s to acquire 100,000 hits. That it takes less than 50 s indicates that the timing in the dummy library is not accurate.

The problem with format is that it parses this string with a regular expression parser on each call: "{:064b}{:08b}{:08b}{:016b}\n". This takes at least 27 load and compare instructions. Several thousand clock cycles at least.

to_be_bytes() on the other hand does nothing. It just tells the compiler to treat the data as a different type.

We need real binary data.

to_be_bytes() should always produce the same length, e.g. if x is 0 and of type i32, x.to_be_bytes() should produce four bytes.
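This can be checked directly: `to_be_bytes()` always yields as many bytes as the type occupies in memory, including leading zeros.

```rust
fn main() {
    // to_be_bytes() returns a fixed-size array determined by the type,
    // regardless of the value; leading zero bytes are preserved.
    let x: i32 = 0;
    assert_eq!(x.to_be_bytes(), [0, 0, 0, 0]); // four bytes for an i32
    assert_eq!(0u64.to_be_bytes().len(), 8); // eight bytes for a u64
    assert_eq!(5000u16.to_be_bytes(), [0x13, 0x88]); // 5000 = 0x1388
    println!("ok");
}
```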

Bassem-Ramzy commented 3 years ago

to_be_bytes() should always produce the same length, e.g. if x is 0 and of type i32, x.to_be_bytes() should produce four bytes.

Right, it should do that, but for one reason or another it doesn't. I'm checking it anyway.

sulimma commented 3 years ago

from the specification of to_be_bytes(): "Return the memory representation of this integer as a byte array in big-endian (network) byte order." And the memory representation of integral types always is the same size.

Also: Do not write a new line. We want the structure to be the same size on disc as it is in memory (96 bits)

Bassem-Ramzy commented 3 years ago

Also: Do not write a new line. We want the structure to be the same size on disc as it is in memory (96 bits)

OK

Bassem-Ramzy commented 3 years ago

I added a command line flag -l to enable logging. It just makes use of the messages (println!) I used while debugging.

sulimma commented 3 years ago

We do not need to catch errors on every call. Try this:


    let hit = hits_buffer[index];
    let mut pos = 0;
    pos += file.write(&hit.time.to_be_bytes())?;
    pos += file.write(&hit.channel.to_be_bytes())?;
    pos += file.write(&hit.type_.to_be_bytes())?;
    pos += file.write(&hit.bin.to_be_bytes())?;
and then check whether pos is 12.
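A self-contained version of that sketch, writing into an in-memory `std::io::Cursor` and checking for the expected 12 bytes (the hit struct's field names are assumed):

```rust
use std::io::{Cursor, Write};

// Assumed mirror of the C TDCHit struct (field names hypothetical).
struct TdcHit {
    time: u64,
    channel: u8,
    type_: u8,
    bin: u16,
}

// One error check per hit: accumulate the bytes written and let the
// caller verify that pos == 12 (8 + 1 + 1 + 2).
fn write_hit<W: Write>(file: &mut W, hit: &TdcHit) -> std::io::Result<usize> {
    let mut pos = 0;
    pos += file.write(&hit.time.to_be_bytes())?;
    pos += file.write(&hit.channel.to_be_bytes())?;
    pos += file.write(&hit.type_.to_be_bytes())?;
    pos += file.write(&hit.bin.to_be_bytes())?;
    Ok(pos)
}

fn main() -> std::io::Result<()> {
    let mut buf = Cursor::new(Vec::new());
    let hit = TdcHit { time: 5000, channel: 1, type_: 1, bin: 0 };
    let pos = write_hit(&mut buf, &hit)?;
    assert_eq!(pos, 12); // structure is the same size on disc as in memory
    println!("wrote {} bytes", pos);
    Ok(())
}
```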
sulimma commented 3 years ago

Unfortunately Rust aligns the structure to 128 bits. Otherwise you could do:

file.write(hits_buffer[lower..upper].to_be_bytes())

and be done.

sulimma commented 3 years ago

You should note in the readme file that binary data is stored in big endian format (network byte order).
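For reference, reading such a record back is the mirror image, e.g. for a small verification tool (the field order is assumed to match the TDCHit struct):

```rust
use std::convert::TryInto;

// Parse one 12-byte big-endian record back into its fields:
// 8 bytes time, 1 byte channel, 1 byte type, 2 bytes bin.
fn parse_hit(rec: &[u8; 12]) -> (u64, u8, u8, u16) {
    let time = u64::from_be_bytes(rec[0..8].try_into().unwrap());
    let channel = rec[8];
    let type_ = rec[9];
    let bin = u16::from_be_bytes(rec[10..12].try_into().unwrap());
    (time, channel, type_, bin)
}

fn main() {
    // 5000 = 0x1388 in the time field, then channel 1, type 1, bin 0.
    let rec: [u8; 12] = [0, 0, 0, 0, 0, 0, 0x13, 0x88, 1, 1, 0, 0];
    assert_eq!(parse_hit(&rec), (5000, 1, 1, 0));
    println!("ok");
}
```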

Bassem-Ramzy commented 3 years ago

We do not need to catch errors on every call. Try this:

    let hit = hits_buffer[index];
    let mut pos = 0;
    pos += file.write(&hit.time.to_be_bytes())?;
    pos += file.write(&hit.channel.to_be_bytes())?;
    pos += file.write(&hit.type_.to_be_bytes())?;
    pos += file.write(&hit.bin.to_be_bytes())?;
and then check whether pos is 12.

For code:

        file.write_all(&(hits_buffer[hit_index].time.to_be_bytes())) ;

Getting a compile-time warning regarding the Result returned by the function:

unused `std::result::Result` that must be used

note: `#[warn(unused_must_use)]` on by default
note: this `Result` may be an `Err` variant, which should be handled

Still investigating...

Bassem-Ramzy commented 3 years ago

B1.zip: got a binary file that looks correct, for the command:

xhptdc8_readout.exe -c C:\Temp\yaml2 -o B1 -f 1 -n 10000 -b

10,000 hits * 12 bytes = 120 KB file size. No delimiter, no newline. The size is OK.

sulimma commented 3 years ago

Yes, it is correct. I had a look at it with this 010-editor template:

BigEndian();
while (1==1) {
struct HIT {
  uint64 time;
  uchar channel;
  uchar type;
  ushort bin;
} hit;
}

And it parses correctly.

Bassem-Ramzy commented 3 years ago

In grouping mode there is a little complication because you might have read 9,999 hits so far and the next group contains 10 hits. The file shall always contain complete groups. So in this case you write a file with 10,009 hits instead of 10,000.

The difference will be very small in typical cases so I would just ignore this effect for progress calculations. It is OK if the program completes at 100.01%.

Grouping is the only remaining feature, then it's just building x86, github action, and any tuning needed.

  1. I'm thinking about setting a threshold for this next-group case, like 10% of the file size. So if a group, when written to the file, would increase the file size by more than 10% of its target size, then this group is moved to the next file if there is one, or dropped if not, with a proper info message displayed. What do you think?
  2. If one file has more than one group, how do we mark the start & end of every group? A delimiter? Or is that not needed in the first place?
sulimma commented 3 years ago

There is no threshold needed. More later.

sulimma commented 3 years ago
  1. The group size is limited. In many cases it will be 2 or 8. Technically it could be 16 kB (the size of the FPGA FIFO), but that is an unrealistic scenario and also would not be a problem. The file sizes will typically be hundreds of megabytes. So we can just ignore it.

sulimma commented 3 years ago

2.

I am not sure if this has made it into the dummy library yet: the first word of each group has channel number 255 and an absolute timestamp. So the data of a group will look like this:

1000000000, 255, 1, 0  #absolute timestamp of start hit
0, 0, 1, 0 # relative timestamp of start hit
5000, 1, 1, 0 # relative timestamp of stop hit
2000000000, 255, 1, 0  #absolute timestamp of start hit
0, 0, 1, 0 # relative timestamp of start hit
5000, 1, 1, 0 # relative timestamp of stop hit

The second hit is redundant; there might be an option in the future to omit it.
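Given that channel-255 marker, splitting a hit stream into groups could be sketched like this (plain tuples stand in for TDCHit):

```rust
// (time, channel, type, bin) tuples stand in for the TDCHit struct.
type Hit = (u64, u8, u8, u16);

// Channel 255 carries the absolute timestamp and marks a group start.
fn split_groups(hits: &[Hit]) -> Vec<Vec<Hit>> {
    let mut groups: Vec<Vec<Hit>> = Vec::new();
    for &hit in hits {
        if hit.1 == 255 || groups.is_empty() {
            groups.push(Vec::new()); // start a new group at each marker
        }
        groups.last_mut().unwrap().push(hit);
    }
    groups
}

fn main() {
    // The two example groups from the comment above.
    let hits = [
        (1_000_000_000, 255, 1, 0), (0, 0, 1, 0), (5000, 1, 1, 0),
        (2_000_000_000, 255, 1, 0), (0, 0, 1, 0), (5000, 1, 1, 0),
    ];
    println!("{} groups", split_groups(&hits).len()); // 2 groups
}
```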

Bassem-Ramzy commented 3 years ago

Yes, channel number 255 with an absolute timestamp was recently implemented in the dummy library, so it's enough as a delimiter. No threshold or extra delimiter is needed then.