Writes NeXus files from experiment data streamed through Apache Kafka. Part of the ESS data streaming pipeline.
-h,--help                     Print this help message and exit
--version                     Print application version and exit
--brokers TEXT                Comma-separated list of Kafka brokers.
--command-status-topic TEXT REQUIRED
                              Kafka topic on which to listen for commands and
                              to which status updates are pushed.
--job-pool-topic TEXT REQUIRED
                              Kafka topic on which to listen for jobs.
--graylog-logger-address URI  <host:port> Log to Graylog via the graylog_logger
                              library.
--grafana-carbon-address URI  <host:port> Address of the Grafana (Carbon)
                              metrics service.
-v,--verbosity                Set log message level. Use 0-5 or one of `Debug`,
                              `Info`, `Warning`, `Error` or `Critical`.
                              Ex: "-v Debug". Default: `Error`.
--hdf-output-prefix TEXT      <absolute/or/relative/directory> Directory which
                              gets prepended to the HDF output filenames in the
                              file write commands.
--log-file TEXT               Specify file to log to.
--service-name [kafka-to-nexus:CI0021385-pid:18632-940b]
                              Used to generate the service identifier and as an
                              extra metrics ID string. Makes the metrics names
                              take the form:
                              "kafka-to-nexus.[host-name].[service-name].*"
--list_modules                List the registered reader and writer parts of the
                              file-writing modules, then exit.
--status-master-interval      Interval between status updates. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--time-before-start           Pre-consume messages for this amount of time.
                              Ex: "10s". Accepts "h", "m", "s" and "ms".
--time-after-stop             Allow this much leeway after the stop time before
                              stopping message consumption. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--kafka-metadata-max-timeout  Maximum timeout for Kafka metadata calls. Note:
                              metadata calls block the application. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--kafka-error-timeout         Amount of time to wait for recovery from a Kafka
                              error before abandoning the stream. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--kafka-poll-timeout          Amount of time to wait for a new Kafka message.
                              *WARNING* Should generally not be changed from
                              the default; increase "--kafka-error-timeout"
                              instead. Ex: "10s". Accepts "h", "m", "s" and
                              "ms".
--data-flush-interval         Maximum amount of time between flushes of data to
                              file. Ex: "10s". Accepts "h", "m", "s" and "ms".
--max-queued-writes           Maximum number of messages buffered for writing.
                              Directly affects the memory usage of the
                              application. The maximum is not enforced, only
                              used as a guideline to throttle Kafka
                              consumption. Note that total memory usage also
                              depends on the size of the actual messages
                              consumed from Kafka.
-X,--kafka-config KEY VALUE ...
                              librdkafka options.
-c,--config-file              Read configuration from an ini file.
The file-writer can be configured from a file via `--config-file <ini>`; the file mirrors the command-line options. For example:
brokers=localhost:9092
command-status-topic=command-topic
job-pool-topic=job-pool-topic
hdf-output-prefix=./absolute/or/relative/path/to/hdf/output/directory
service-name=this_is_filewriter_instance_HOST_PID_EXAMPLENAME
streamer-ms-before-start=123456
kafka-config=consumer.timeout.ms 501 fetch.message.max.bytes 1234 api.version.request true
Note: the Kafka options are key-value pairs, and the file-writer can be given multiple such pairs by appending each key-value pair to the end of the command-line option.
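The same configuration can also be passed entirely on the command line. A sketch of such an invocation is below; the broker address, topic names, and output directory are placeholders, not values mandated by the file-writer:

```shell
# Sketch only: broker address, topic names and output path are placeholders.
# Each --kafka-config takes one KEY VALUE pair; repeat it for multiple options.
kafka-to-nexus \
  --brokers localhost:9092 \
  --command-status-topic command-topic \
  --job-pool-topic job-pool-topic \
  --hdf-output-prefix ./output \
  --kafka-config consumer.timeout.ms 501 \
  --kafka-config fetch.message.max.bytes 1234
```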
Beyond the configuration options given at start-up, the file-writer can be sent commands via Kafka to control the actual file writing.
See commands for more information.
The supported method for installation is via Conan.
The following minimum software is required to get started:
Conan will install all the other required packages.
Follow the README here.
If you are not building on CentOS 7, set compiler.libcxx to libstdc++11 in the settings section of your Conan profile:
# Assuming profile is named "default"
conan profile update settings.compiler.libcxx=libstdc++11 default
From within the file-writer's top directory:
mkdir _build
cd _build
conan install .. --build=missing
cmake ..
make
There are additional CMake flags for adjusting the build:
-DRUN_DOXYGEN=ON
    Generate Doxygen documentation; also requires `make docs` to be run
    afterwards.
-DHTML_COVERAGE_REPORT=ON
    Generate an HTML unit test coverage report, output to
    <BUILD_DIR>/coverage/index.html
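Putting the above together, enabling both optional targets might look like this (a sketch, run from the build directory created earlier):

```shell
# Sketch: configure with the optional targets enabled, then build the docs.
cmake -DRUN_DOXYGEN=ON -DHTML_COVERAGE_REPORT=ON ..
make
make docs   # builds the Doxygen documentation
```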
From the build directory:
make UnitTests
./bin/UnitTests
When using Conan on OSX, due to the way paths to dependencies are handled, the activate_run.sh file may need to be sourced before running the application. The deactivate_run.sh script can be sourced to undo the changes afterwards.
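The sequence might look like the following sketch; the config file path is a placeholder:

```shell
# Sketch: source the Conan-generated run environment on macOS before running.
cd _build
source activate_run.sh        # sets dependency library paths
./bin/kafka-to-nexus --config-file config.ini   # placeholder config path
source deactivate_run.sh      # restores the previous environment
```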
These tests run the core functionality of the program but without Kafka. This means the tests are more stable than the integration tests as they do not require Docker and Kafka to be running.
The integration tests consist of a series of automated tests for this repository that test it in ways similar to how it would be used in production.
See Integration Tests page for more information.
See the documentation directory.
Please read CONTRIBUTING.md for information on submitting pull requests to this project.
This project is licensed under the BSD 2-Clause "Simplified" License - see the LICENSE.md file for details.