Open Lhorus6 opened 1 month ago
I've started a feed to reproduce; I'll let you know about the output.
We compute the hash on the full file, so we can't really do much in terms of data control. To prevent too much work, I'm currently trying not to create any job if there is already something in the queue.
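For what it's worth, here is a minimal sketch (not OpenCTI's actual code) of the behaviour described above: hash the raw CSV body only, and skip creating a new job when the hash is unchanged or a job for the same feed is still pending in the queue. The in-memory maps stand in for whatever state the platform really keeps.

```ts
// Hypothetical sketch, assuming Node 18+ (global fetch) and in-memory bookkeeping.
import { createHash } from 'node:crypto';

const lastHashByFeed = new Map<string, string>(); // last body hash seen per feed
const pendingJobs = new Map<string, number>();    // jobs currently queued per feed

async function maybeIngest(feedUrl: string): Promise<void> {
  const res = await fetch(feedUrl);
  const body = await res.text(); // file content only, no headers or metadata
  const hash = createHash('sha256').update(body).digest('hex');

  if (lastHashByFeed.get(feedUrl) === hash) return;  // file unchanged since last poll
  if ((pendingJobs.get(feedUrl) ?? 0) > 0) return;   // previous job not drained yet

  lastHashByFeed.set(feedUrl, hash);
  pendingJobs.set(feedUrl, (pendingJobs.get(feedUrl) ?? 0) + 1);
  // ...split the body into bundles and push them to the ingestion queue here...
}
```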
I reproduced your issue @Lhorus6. So based on your comment @richard-julien, can I consider it a "won't fix"?
Maybe it's not the hash calculation we have to play with, but something needs to be done in any case, IMO. Right now we're blowing up the ingestion queues.
Julien said "To prevent too much work, I'm currently trying not to create any job if there is already something in the queue", so I guess he is testing possibilities for improvement.
Yes. A testing PR has been opened here: https://github.com/OpenCTI-Platform/opencti/pull/8617
Note: I think this issue is resolved by Julien's work. We tested a feed that was problematic and it's all good now.
Description
CSV feed import seems buggy, or at least not optimized.
In my case, I have an import from the Blocklist.de source, which contains around 30K IPs. For example, at this moment the source contains 27K entries.
However, for this small source alone, I currently find myself with 2.36M bundles in the queue and a huge number of works.
Environment
OCTI 6.3.4
Reproducible Steps
Steps to create the smallest reproducible scenario:
Additional information
It seems to me that data is only imported when the hash changes. Does this source really update its file every 30 minutes? (I see a new work every 30 minutes.)
That seems unlikely; perhaps we have a bug where the hash generation takes metadata as input? Just a guess.
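To illustrate that guess: if the hash were computed over the whole HTTP response (metadata included) rather than the body alone, something as trivial as the changing `Date` header would produce a new hash on every poll even when the file content is identical. A hypothetical comparison, not OpenCTI's actual hashing code:

```ts
// Hypothetical sketch: body-only hashing is stable, hashing body + response
// metadata (e.g. the Date header) changes on every fetch.
import { createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

async function hashFeed(feedUrl: string) {
  const res = await fetch(feedUrl);
  const body = await res.text();

  const bodyOnly = sha256(body);                                   // changes only when the file changes
  const withMetadata = sha256(body + (res.headers.get('date') ?? '')); // changes on every poll

  return { bodyOnly, withMetadata };
}
```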
If the file really does change continuously, maybe we shouldn't retrieve it every time, but only twice a day?