NR-Scope now uses multiple workers (threads) to process each slot asynchronously, increasing throughput for high-bandwidth processing. The changes are summarized as follows:
After synchronization with the cell (with or without resampling), the task_scheduler copies the slot data (1 TTI) and its global state (SIB found? RACH found? known DCI list, etc.) to one of the workers, so each decoding cycle only incurs the cost of copying the buffer and the state values. If all workers are busy (which normally only happens while the workers are initializing the SIB/RACH/DCI decoders), the task_scheduler inserts an empty result into the queue to avoid blocking.
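The hand-off above can be sketched roughly as follows. All names (`GlobalState`, `Worker`, `dispatch_slot`) are illustrative, not the actual NR-Scope types; the point is that the scheduler only pays for the two copies before moving on to the next slot:

```cpp
#include <complex>
#include <cstdint>
#include <vector>

struct GlobalState {                  // snapshot of the scheduler's decoding state
  bool sib_found = false;
  bool rach_found = false;
  std::vector<uint16_t> known_rntis;  // stands in for the known DCI list
};

struct Worker {
  bool busy = false;
  std::vector<std::complex<float>> slot_buffer;  // 1 TTI of IQ samples
  GlobalState state_snapshot;
};

// Returns true if the slot was handed to an idle worker. On failure the
// caller pushes an empty result into the queue instead of blocking.
bool dispatch_slot(std::vector<Worker>& workers,
                   const std::vector<std::complex<float>>& slot,
                   const GlobalState& state) {
  for (auto& w : workers) {
    if (!w.busy) {
      w.slot_buffer = slot;      // copy the 1-TTI buffer
      w.state_snapshot = state;  // copy the scheduler's state values
      w.busy = true;             // the worker thread takes over from here
      return true;
    }
  }
  return false;  // all workers busy
}
```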
Each worker waits for new data and processes it in its own thread, then pushes the result to a global result queue. The task scheduler has its own thread that grabs results from the queue and updates its global state accordingly (SIBs found? new RACH found? etc.). The task scheduler, the workers, and the queue each have their own mutex, so no single lock serializes all mutex operations.
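A minimal sketch of such a shared result queue, with its own mutex and condition variable so that producers (workers) and the consumer (the scheduler thread) only contend on this one lock. `SlotResult` and `ResultQueue` are illustrative names, not the actual NR-Scope API:

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

struct SlotResult {
  uint32_t slot_idx = 0;
  bool empty = false;  // placeholder pushed when all workers were busy
};

class ResultQueue {
 public:
  // Called by worker threads when a slot has been decoded.
  void push(SlotResult r) {
    {
      std::lock_guard<std::mutex> lk(mtx_);
      q_.push_back(std::move(r));
    }
    cv_.notify_one();
  }

  // Called by the scheduler's result thread; blocks until a result arrives.
  SlotResult pop() {
    std::unique_lock<std::mutex> lk(mtx_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    SlotResult r = std::move(q_.front());
    q_.pop_front();
    return r;
  }

 private:
  std::mutex mtx_;  // the queue's own lock, separate from scheduler/worker locks
  std::condition_variable cv_;
  std::deque<SlotResult> q_;
};
```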
A CPU affinity option can be set in config.yaml. If enabled, the code checks whether the setting is feasible by comparing the number of cores required with the number of cores the machine has. The code puts the worker thread, SIB thread, RACH thread, and DCI threads on different cores, so the total number of required cores is worker_num * (1 (worker) + 1 (SIB) + 1 (RACH) + nof_dci_threads); e.g., 4 workers with 2 DCI threads require 4 * (3 + 2) = 20 cores. On orion, for a 40 MHz cell, 1, 2, or 4 workers work smoothly, but 5 or more workers cause overflow. My guess is that consuming too many cores makes the resampling threads virtual threads, so exposing the number of resampling workers in config.yaml and pinning the resampling threads to cores could be the next mini step.
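The feasibility check and the per-thread pinning might look like the sketch below (function names are hypothetical; the pinning uses the standard Linux `pthread_setaffinity_np` call on a thread's native handle):

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>

// Each worker owns one worker thread, one SIB thread, one RACH thread,
// and nof_dci_threads DCI threads.
int required_cores(int worker_num, int nof_dci_threads) {
  return worker_num * (3 + nof_dci_threads);
}

// Feasible only if the machine has at least that many hardware threads.
bool affinity_feasible(int worker_num, int nof_dci_threads) {
  int have = static_cast<int>(std::thread::hardware_concurrency());
  return required_cores(worker_num, nof_dci_threads) <= have;
}

// Pin a thread to a single core so it is not migrated by the scheduler.
bool pin_to_core(std::thread& t, int core_id) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(core_id, &set);
  return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
}
```

With 4 workers and 2 DCI threads, `required_cores(4, 2)` gives the 20 cores mentioned above.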
Code style improvement. For better readability and ease of maintenance, I changed the code style to roughly follow the Google C++ Style Guide. Since srsRAN's original code doesn't follow this guide, I think we should at least follow it within the ./nrscope/ directory. I have already done that, so hopefully it's easier to keep following the style from now on.
TODO:
A re-order buffer for the asynchronous results. Currently the code just grabs new results from the queue and writes them into the csv file, so the data can be re-ordered offline. I tried holding new results until the correct next slot's result arrived, but that seems to slow down the state update progress. I will come back to this in the near future.
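One possible shape for that re-order buffer, sketched here with hypothetical names: hold out-of-order results in a map keyed by slot index, and release every result that is contiguous with the last slot written to the csv. The `uint32_t` payload stands in for the full decoded result:

```cpp
#include <cstdint>
#include <map>
#include <vector>

class ReorderBuffer {
 public:
  explicit ReorderBuffer(uint32_t first_slot) : next_(first_slot) {}

  // Insert a result for slot_idx; returns all results that are now
  // contiguous and can be written out in order.
  std::vector<uint32_t> push(uint32_t slot_idx) {
    pending_[slot_idx] = slot_idx;  // payload == slot index in this sketch
    std::vector<uint32_t> ready;
    while (!pending_.empty() && pending_.begin()->first == next_) {
      ready.push_back(pending_.begin()->second);
      pending_.erase(pending_.begin());
      ++next_;
    }
    return ready;
  }

 private:
  uint32_t next_;  // next slot index expected by the csv writer
  std::map<uint32_t, uint32_t> pending_;  // held out-of-order results
};
```

The trade-off mentioned above remains: if state updates are tied to the re-ordered output, a single slow slot stalls everything behind it, which is presumably why holding results slowed the state update progress.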