Closed: silverdaz closed this PR 6 years ago.
@blankdots : When you are back from holidays and read this, could you please give me a hand with the tests? (My local tests worked fine.)
The tests fail on the Eureka part, but I have touched neither the Eureka code nor its tests. So either it failed before, or there is some magic happening in the mock things.
@silverdaz Could you please elaborate on why not to request the stable ID in `verify.py`, right after successful checksum validation? What is the necessity of introducing a new microservice?
In my mind, the logic of `verify.py` could be:
Will it work like that or am I missing something?
Ah, yes, sure, I had not thought about putting that code inside `verify.py`.
But this is asynchronous: We do not "query" CentralEGA and wait for the response synchronously.
We instead send them a message and they respond when they find the time.
That's why there is another service that listens for their answer, and why `verify` does not handle the stable ID.
Are we making our system more complicated just to make CEGA's life easier? :)
To be honest, I would even include the stable-ID querying in `ingest.py`, because I don't care much about wasting some IDs: a UUID is a 128-bit identifier, so you can imagine what a nearly endless space of IDs that gives us.
And I also don't think that ID generation is such a heavy, CPU-consuming operation, or that it's hard to implement: it's basically a call to `UUID.randomUUID()`.
So, concluding, I would definitely not overload LocalEGA's setup with yet another microservice just because CentralEGA cannot, or doesn't want to, generate UUIDs for us upon request: neither they nor we should be concerned about running out of IDs or about the heaviness of a synchronous approach.
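For what it's worth, the "cheap local ID" point above is a one-liner in Python as well. This is only an illustration of the UUID argument, not of the accession scheme CentralEGA actually uses:

```python
import uuid

# uuid4() draws a random 128-bit identifier; the chance of a collision
# is negligible even across independent, uncoordinated generators.
accession = uuid.uuid4()
print(accession)                    # a random UUID, different each run

assert accession.version == 4
assert accession.int.bit_length() <= 128
assert len(str(accession)) == 36    # canonical 8-4-4-4-12 hex form
```
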
Can you discuss this with Oscar and Sabela?
@dtitov we are talking about stable IDs, not UUIDs. Stable IDs have a different life cycle that includes tracking them through creation, updates (with versions), and eventual deprecation or obsolescence. Thus, we prefer not to "waste" them, if possible.
I'm not sure what we are discussing here: async vs sync communication, and CEGA IDs vs LEGA IDs (=UUIDs)? In my opinion, I don't see any benefit of using synchronous communication here. If CEGA is down, what would you do? Retry until it is up again? What if there is a communication error? Retry again? How many times? And what happens if CEGA never answers? Retry? Or, the other way around, what if LEGA requests an ID but something goes wrong when answering back: what should CEGA do? Retry? We already have RabbitMQ in place, so let's use it :) It will guarantee that messages are not lost, and neither CEGA nor LEGA needs this information immediately.

About using UUIDs instead of CEGA IDs: I guess it's another solution, but probably with lots of consequences which must be considered.
@jrambla Thank you for the clarification, I was not aware of StableIDs lifecycle - that, of course, makes things a bit more difficult.
At the moment, after your comment, I see two possible ways:
In other words, one of the two sides has to become more complicated. And, obviously, as a representative of the LocalEGA team, I would prefer not to add any extra complexity to LocalEGA's architecture if it can be avoided.
@sdelatorrep I see your point and it makes perfect sense to me, but any communication channel has its own benefits and downsides. What if we send you a message, you receive it and after that, during ID generation, CentralEGA goes down for some reason (e.g. power outage)? Then we will never receive the message back because, after the restart, your side will simply "forget" about the fact that it actually needs to respond (no messages are queued - no state is preserved). In this rare, but still possible case, the archived file will never get its ID. And with HTTP request we at least can be aware of the fact that ID retrieval failed so that we can fall back to some backup mechanism.
Look, I propose another variant not only out of personal preference but mostly for practical reasons related to our local infrastructural limitations. But, I guess, we had better discuss it tomorrow in a meeting. Anyway, thanks for the explanations.
> What if we send you a message, you receive it and after that, during ID generation, CentralEGA goes down for some reason (e.g. power outage)? Then we will never receive the message back because, after the restart, your side will simply "forget" about the fact that it actually needs to respond (no messages are queued - no state is preserved).
This is what RabbitMQ handles. The message is not forgotten, and it will be (re)processed.
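A sketch of what "not forgotten" means on the consumer side, assuming the `pika` client and hypothetical queue and handler names (running `main()` requires a live broker, so it is defined but not called here):

```python
def handle_stable_id(body):
    """Hypothetical handler: record the stable ID in the local database."""
    print('received:', body)

def on_stable_id(channel, method, properties, body):
    handle_stable_id(body)
    # Ack only AFTER processing succeeds: if the consumer crashes before
    # this line, RabbitMQ keeps the message and redelivers it later.
    channel.basic_ack(delivery_tag=method.delivery_tag)

def main():
    import pika  # assumed AMQP client library
    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = conn.channel()
    # A durable queue survives broker restarts (e.g. a power outage).
    ch.queue_declare(queue='stableIDs', durable=True)
    ch.basic_consume(queue='stableIDs', on_message_callback=on_stable_id)
    ch.start_consuming()

# Call main() against a running RabbitMQ broker to consume for real.
```

With manual acknowledgments plus durable queues and persistent messages, a crash on either side only delays delivery instead of losing it.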
This PR is a solution to issue #260 . There is always the possibility to make another PR/issue if you, @dtitov, feel that this solution is not suitable. In the Agile spirit.
@silverdaz I have read more about acknowledgments in RabbitMQ and yes, the case above can be handled.
@silverdaz regarding this https://github.com/NBISweden/LocalEGA/pull/341#issuecomment-418144459, you kinda touched eureka in the config (`try` and `try_interval`)
you can set them as environment variables in the tests using https://docs.python.org/3.6/library/test.html#test.support.EnvironmentVarGuard
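For reference, a minimal sketch of that approach. The variable names are hypothetical, and note that `EnvironmentVarGuard` moved to `test.support.os_helper` in Python 3.10:

```python
import os

# The class lived in test.support on Python 3.6-3.9 and moved to
# test.support.os_helper from Python 3.10 onward.
try:
    from test.support.os_helper import EnvironmentVarGuard
except ImportError:
    from test.support import EnvironmentVarGuard

env = EnvironmentVarGuard()
env.set('LEGA_TRY', '5')            # hypothetical names for the eureka
env.set('LEGA_TRY_INTERVAL', '1')   # retry settings under test
with env:
    # Code under test sees the patched environment here.
    assert os.environ['LEGA_TRY'] == '5'
# On exit, the original environment is restored automatically.
```
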
According to Oscar, the stable IDs generation for CentralEGA is handled by a process at EBI and it depends on the content of the original file.
We use the generated stable ID after a successful ingestion, which allows us to flag the file as `completed`, but not yet `ready` for data-out. Once the stable ID comes in, we flag the file as `ready` for download.
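To illustrate the two-step flagging described above (purely a sketch: the dictionary, statuses, and function names are invented stand-ins, not LocalEGA's actual schema):

```python
# In-memory stand-in for the files table (names are hypothetical).
files = {}

def finish_ingestion(file_id):
    # Verification succeeded: the file is archived,
    # but not yet visible to data-out.
    files[file_id] = {'status': 'COMPLETED', 'stable_id': None}

def attach_stable_id(file_id, stable_id):
    # The stable ID arrived asynchronously from CentralEGA:
    # only now does the file become ready for download.
    entry = files[file_id]
    assert entry['status'] == 'COMPLETED'
    entry.update(status='READY', stable_id=stable_id)

finish_ingestion(42)
assert files[42]['status'] == 'COMPLETED'
attach_stable_id(42, 'EGAF00000000001')
assert files[42]['status'] == 'READY'
```
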