martijnroest opened 6 years ago
I am also having a similar issue of delayed data ingestion. What is more problematic is that I only receive the first 5-6 minutes of NSG log events for each hour. The number of events over time forms a saw-blade pattern => |\ |\ |\ |\ |\ |\ __. I am currently running ELK (6.1.3) with the latest azure blob plugin. Am I limited by some Azure-side throttling? Any insight or help on this, along with the delayed data ingestion, would be very much appreciated.
Hi Guys,
I'm facing the same issue. When I look at the debug output, it seems to be ingesting information from an Azure blob, but it doesn't parse all the logs. It also feels like random behaviour: sometimes it spouts logs to Elasticsearch and sometimes not.
Ok, so I have looked a little at the code (I'm new to Ruby, so this might not be accurate) and there seem to be two issues: 1) Only one blob is processed per interval (which is a problem if you have 100k+ blobs, for example)
```ruby
def run(queue)
  # we can abort the loop if stop? becomes true
  while !stop?
    process(queue)
    @logger.debug("Hitting interval of #{@interval}s . . .")
    Stud.stoppable_sleep(@interval) { stop? }
  end # loop
end # def run

# Start processing the next item.
def process(queue)
```
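To illustrate the point, here is a hypothetical rework of that loop (my own sketch, not code from the plugin): process reports how many blobs it handled, and run keeps draining until a sweep comes back empty, so a single interval can cover many blobs. Stud and the Azure calls are stubbed out.

```ruby
# Hypothetical sketch: drain all pending blobs before sleeping,
# instead of handling exactly one blob per interval.
class DrainingInput
  def initialize(blobs)
    @pending = blobs.dup   # stand-in for unprocessed blobs in the container
  end

  def run(queue)
    # one "interval": keep processing while there is work left
    loop { break if process(queue).zero? }
    # the real code would now call Stud.stoppable_sleep(@interval) { stop? }
  end

  # Handles one blob per call, mirroring the original design,
  # and returns the number of blobs processed (0 or 1).
  def process(queue)
    blob = @pending.shift
    return 0 if blob.nil?
    queue << blob          # stand-in for decoding and enqueueing events
    1
  end
end
```

With this shape, 100k backlogged blobs cost one sweep rather than 100k intervals.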
2) Every time one blob has been processed, the registry is checked/rebuilt (also a problem if you have a large number of files in your container)
```ruby
# from process, getting the next blob
blob, start_index, gen = register_for_read

def register_for_read
  begin
    all_blobs = list_all_blobs
    registry = all_blobs.find { |item| item.name.downcase == @registry_path }
    candidate_blobs = all_blobs.select { |item| (item.name.downcase != @registry_path) }
```
Could someone with more Ruby experience confirm this? Is this by design (allowing multiple Logstash instances to work together on one container)? I would expect it to process all previously unknown blobs and then pause for the interval before checking for new blobs. Building a list of all blobs and using it only once, to decide which single blob to process, seems like a huge waste.
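That idea could look roughly like this (a sketch under my own assumptions: blob names are plain strings here, whereas the real plugin gets blob objects from azure-storage):

```ruby
require 'set'

# Hypothetical sweep: list the container once, split out the registry
# blob, then work through every unseen blob from that single listing.
registry_path = 'data/registry'
all_blobs = ['data/registry', 'nsg-log-000001.json', 'nsg-log-000002.json']  # one listing call per sweep

registry_blob = all_blobs.find   { |name| name.downcase == registry_path }
candidates    = all_blobs.reject { |name| name.downcase == registry_path }

processed = Set.new
candidates.each do |name|
  next if processed.include?(name)
  processed.add(name)  # the real code would read the blob and persist this to the registry
end
```

One listing and one registry pass per sweep, instead of one full relist per blob.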
Hi @MattiasFindwise,
Your observation is correct. The intent is that this needs to work when multiple readers, each using the azureblob input plugin, read from the same container. The current approach is not the only one; there are other possible ways to handle the concurrency. If you have a better idea, feel free to suggest it.
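One possible direction (a hypothetical sketch, not the plugin's actual mechanism): have each reader claim a blob before processing it, so two readers never process the same blob twice. In real Azure Storage this could be built on blob leases; here the claim table is just simulated in memory.

```ruby
require 'set'

# Hypothetical claim-based coordination between concurrent readers,
# simulated in memory. A real implementation could use blob leases.
class ClaimTable
  def initialize
    @claimed = Set.new
    @mutex = Mutex.new
  end

  # Returns true only for the first reader that claims this blob;
  # later claimants get false and skip the blob.
  def claim(blob_name)
    @mutex.synchronize do
      return false if @claimed.include?(blob_name)
      @claimed.add(blob_name)
      true
    end
  end
end
```

Each reader would then only read blobs it successfully claimed, without a shared registry being rebuilt per blob.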
logstash-input-azureblob served me well for a small number of files, but on large deployments I've seen all the error messages and symptoms described here, mainly reading one blob per 5 minutes (the ASCII drawing by hongtaejeon!). Thanks to the readable code and some Ruby examples I was able to rewrite the plugin, but it's not backwards compatible. For now I reused the logstash name with version 0.9.14, but given the incompatibilities that is probably a bad, bad idea; I'm not sure if I should just split it off completely. I used a newer version of azure-storage-ruby and use Marshal for the registry. There are also parts of the original code I don't understand, as I am very new to Ruby.
I see how supporting multiple readers is complicated, but maybe using separate file prefixes, with one registry per instance, would eliminate the locking?
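A sketch of that per-instance-registry idea; the hostname/pid naming scheme is my own assumption, not something either plugin does:

```ruby
require 'socket'

# Hypothetical: derive a registry blob name that is unique per
# Logstash instance, so concurrent readers never contend for (or
# lock) the same registry file in the container.
def per_instance_registry_path(prefix = 'data/registry')
  "#{prefix}-#{Socket.gethostname}-#{Process.pid}"
end
```

Each instance would then read and write only its own registry blob; the open question is how the instances divide the input blobs between themselves.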
The code includes parsing of JSON nsgflowlogs and I intend to make it also work for IIS and generic codecs.
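For reference, NSG flow logs nest their flow tuples several levels deep. A minimal parsing sketch with a hand-made sample record (the exact schema varies by flow-log version, so treat the field list as an assumption):

```ruby
require 'json'

# Hand-made sample resembling one NSG flow log record; real files
# contain many records and many rules per record.
sample = '{"records":[{"time":"2019-01-01T00:00:00Z","properties":{"flows":[{"rule":"Allow","flows":[{"mac":"000D3AF80A2C","flowTuples":["1546300800,10.0.0.4,10.0.0.5,44331,443,T,O,A"]}]}]}}]}'

records = JSON.parse(sample)['records']

# Flatten records -> rule groups -> flows -> comma-separated tuples.
tuples = records
  .flat_map { |r| r['properties']['flows'] }
  .flat_map { |g| g['flows'] }
  .flat_map { |f| f['flowTuples'] }

# Assumed tuple layout for version-1 flow tuples.
fields = %w[timestamp src_ip dst_ip src_port dst_port protocol direction decision]
events = tuples.map { |t| fields.zip(t.split(',')).to_h }
```

Each event hash can then be handed to the Logstash queue as a structured document instead of a raw line.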
The code is here: https://github.com/janmg/azure-diagnostics-tools/blob/master/Logstash/logstash-input-azureblob/lib/logstash/inputs/azureblob.rb. I use the build script to build the gem and test it with Logstash.
I would appreciate feedback, opinions and comments in this thread or by mail, jan at janmg dot com. That should help me rewrite this into something meaningful for the community.
I've polished my code and because the config and registry are incompatible I decided to continue with my plugin not as a fork, but as a separate plugin in the same style as the azure_event_hubs.
https://github.com/janmg/logstash-input-azure_blob_storage
I published the 0.10.0 gem on RubyGems, so it can be installed through
logstash-plugin install logstash-input-azure_blob_storage
Hi @janmg, I have tried many times to install or update the plugin, but the version still does not change (0.9.6). How can I select a specific version, like 0.10.0 or 0.12.7, to install or update?
Any suggestions?
Many thanks.
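For what it's worth, the logstash-plugin CLI supports pinning a release with --version on install. A rough sketch (removing the old plugin first, in case a plain update refuses to move past 0.9.6):

```shell
# Run from the Logstash home directory; remove the stale version first,
# then install the pinned release.
bin/logstash-plugin remove logstash-input-azure_blob_storage
bin/logstash-plugin install --version 0.12.7 logstash-input-azure_blob_storage
```

Afterwards, `bin/logstash-plugin list --verbose` should show the installed version.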
The environment is running Logstash 6.2.4 with the azureblob plugin. The connection seems to work fine and data is coming in; however, it takes several hours before the data is picked up by Logstash. What could be the cause of this delay?