Hi! Thanks for reporting the issue. Could you please specify which version of Julia you are using?
hi @dfdx, thanks for the reply! I am using the following:
julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
julia>
I also observed this on Julia 1.6.0 on Windows, before noticing that the package is not expected to work there.
I have a number of hypotheses, but the thing that strikes me most is holding the network object at module scope:
module PubSub2
using RDKafka
import RDKafka.produce
emitter = RDKafka.KafkaProducer("localhost:9092") # <-- this
emit(key, payload) = produce(emitter, "sometopic", key, payload)
end # module
Julia will try to precompile your module, setting emitter to whatever is available at the moment of compilation. Obviously, many things like pointers or open sockets become invalid once the module is loaded in another Julia session.
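To make the timing issue concrete, here is a minimal sketch with a hypothetical module (no RDKafka involved, and assuming it is installed as a package that actually gets precompiled rather than defined at the REPL): a value computed at module scope is frozen at precompilation time and reused in later sessions, which is exactly what happens to the producer handle.
module PrecompileDemo
# evaluated once, during precompilation, and cached to disk
const loaded_at = time()
end # module
# In a fresh Julia session, PrecompileDemo.loaded_at is the old, cached
# timestamp -- just like a stale pointer or socket inside KafkaProducer.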
The closest thing that comes to mind is to wrap the initialization code into an __init__() method, which is called automatically when a module is loaded, not when it's compiled:
module PubSub2
using RDKafka
import RDKafka.produce
const emitter_ref = Ref{Any}()
function __init__()
    emitter = RDKafka.KafkaProducer("localhost:9092")
    emitter_ref[] = emitter
end
emit(key, payload) = produce(emitter_ref[], "sometopic", key, payload)
end # module
I didn't check that the message is actually delivered, but at least this code doesn't segfault on my machine.
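For reference, a quick way to exercise the __init__-based version from the REPL (a sketch; it assumes the broker at localhost:9092 is reachable and the topic from the snippet above exists):
using PubSub2                              # __init__() runs now, in this session
PubSub2.emit("some-key", "some-payload")   # uses the freshly created producer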
Thanks for looking into this at such short notice. I believe you are right - I didn't think about when the emitter is initialized. I will try this tomorrow morning in a bit more detail. Meanwhile, I managed another workaround, which also resides in a module:
function emit(producer, topic, key, message)
    partition = -1
    RDKafka.produce(producer, topic, partition, key, message)
end

function kafkaproduce()
    @info "Loading all data"
    producer = RDKafka.KafkaProducer("localhost:9092")
    for filename in tqdm(readdir(rawdatadir))
        @show filename
        df = loadrawfile(filename)
        for row in Tables.namedtupleiterator(df)
            emit(producer, "quickstart-events3", string(row.time_msc), json(row))
        end
    end
end
This works without issues, but ideally I would like to have the producer at module scope so that I can use it from different functions within the module. As I said, I will try this tomorrow morning and update with details.
This reminds me of a singleton pattern, something like this:
const PRODUCER = Ref{Union{KafkaProducer, Nothing}}(nothing)

function get_or_create_producer()
    if PRODUCER[] === nothing
        PRODUCER[] = KafkaProducer("localhost:9092")
    end
    return PRODUCER[]
end

function emit(topic, key, message)
    producer = get_or_create_producer()
    partition = -1
    RDKafka.produce(producer, topic, partition, key, message)
end
This way you create the producer only when it's needed and cache it for further use. Note that a long-lived connection to Kafka may break due to network errors, in which case you will still need to re-initialize it, but that's a different story.
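If the connection does drop, the same singleton can be reset explicitly, e.g. (a sketch; detecting the failure is left to the caller):
function reset_producer!()
    PRODUCER[] = nothing             # drop the stale handle
    return get_or_create_producer()  # the next call opens a fresh connection
end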
I tried this and it worked brilliantly. Thank you very much for your help! Should we close the issue?
Yeah, I don't think we can do much more about it. Please feel free to open new issues if you have any other questions.
I have created a simple package with the aim of holding a Kafka handle at module scope and emitting messages via an
emit()
method that uses that handle. I can share my
Project.toml
and Manifest.toml
if necessary. When I run this, I get a segfault.
The interesting bit is that if I define the package within my Julia session, instead of having it installed for development, it works as expected.
I am running Kafka using the Docker image
wurstmeister/kafka
. Happy to provide further details. Your help is highly appreciated!