Closed: silverdaz closed this PR 6 years ago.
@blankdots : When you are back from holidays and read this, could you please give me a hand with the tests? (My local tests worked fine.)
The tests fail on the Eureka part, but I have touched neither the Eureka code nor its tests. So either it failed before, or there is some magic happening in the mock things.
@silverdaz Could you please elaborate on why not to request the stable ID in `verify.py`, right after successful checksum validation? What is the necessity of introducing a new microservice?
In my mind, the logic of `verify.py` could be:
Will it work like that or am I missing something?
Ah, yes, sure, I had not thought about putting that code inside `verify.py`.
But this is asynchronous: We do not "query" CentralEGA and wait for the response synchronously.
We instead send them a message and they respond when they find the time.
That's why there is another service that listens for their answer, and why `verify` does not handle the stable ID.
Are we making our system more complicated just to make CEGA's life easier? :)
To be honest, I would even include the stable-ID querying in `ingest.py`, because I don't care much about wasting some IDs: a UUID is a 128-bit identifier, so you can imagine what a nearly endless space of IDs that gives us.
And I also don't think that ID generation is such a heavy, CPU-consuming operation, or that it's hard to implement: it's basically a call to `UUID.randomUUID()`.
So, concluding, I would definitely not overload LocalEGA's setup with yet another microservice just because CentralEGA cannot, or doesn't want to, generate UUIDs for us upon request: neither they nor we should be concerned about running out of IDs or about the heaviness of a synchronous approach.
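For what it's worth, the "cheap local ID" point above is a one-liner in Python as well. This is only an illustration of the UUID argument, not of the accession scheme CentralEGA actually uses:

```python
import uuid

# uuid4() draws a random 128-bit identifier; the chance of a collision
# is negligible even across independent, uncoordinated generators.
accession = uuid.uuid4()
print(accession)                    # a random UUID, different each run

assert accession.version == 4
assert accession.int.bit_length() <= 128
assert len(str(accession)) == 36    # canonical 8-4-4-4-12 hex form
```
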
Can you discuss this with Oscar and Sabela?
@dtitov we are talking about stable IDs, not UUIDs. Stable IDs have a different life cycle that includes tracking them through creation, updates (with versions), and eventual deprecation or obsolescence. Thus, we prefer not to "waste" them, if possible.
I'm not sure what we are discussing here: async vs sync communication, and CEGA IDs vs LEGA IDs (=UUIDs)? In my opinion, I don't see any benefit of using synchronous communication here. If CEGA is down, what would you do? Retry until it is up again? What if there is a communication error? Retry again? How many times? And what happens if CEGA never answers? Retry? Or, the other way around, what if LEGA requests an ID but something goes wrong when answering back: what should CEGA do? Retry? We already have RabbitMQ in place, so let's use it :) It will guarantee that messages are not lost, and neither CEGA nor LEGA needs this information immediately.

About using UUIDs instead of CEGA IDs: I guess it's another solution, but probably with lots of consequences which must be considered.
@jrambla Thank you for the clarification, I was not aware of StableIDs lifecycle - that, of course, makes things a bit more difficult.
At the moment, after your comment, I see two possible ways:
In other words, one of the two sides has to become more complicated. And, obviously, as a representative of the LocalEGA team, I would prefer not to add any extra complexity to LocalEGA's architecture if it can be avoided.
@sdelatorrep I see your point and it makes perfect sense to me, but any communication channel has its own benefits and downsides. What if we send you a message, you receive it and after that, during ID generation, CentralEGA goes down for some reason (e.g. power outage)? Then we will never receive the message back because, after the restart, your side will simply "forget" about the fact that it actually needs to respond (no messages are queued - no state is preserved). In this rare, but still possible case, the archived file will never get its ID. And with HTTP request we at least can be aware of the fact that ID retrieval failed so that we can fall back to some backup mechanism.
Look, I propose another variant not only out of personal preference but mostly for practical reasons related to our local infrastructural limitations. But, I guess, we had better discuss it tomorrow in a meeting. Anyway, thanks for the explanations.
> What if we send you a message, you receive it and after that, during ID generation, CentralEGA goes down for some reason (e.g. power outage)? Then we will never receive the message back because, after the restart, your side will simply "forget" about the fact that it actually needs to respond (no messages are queued - no state is preserved).
This is what RabbitMQ handles. The message is not forgotten, and it will be (re)processed.
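A sketch of what "not forgotten" means on the consumer side, assuming the `pika` client and hypothetical queue and handler names (running `main()` requires a live broker, so it is defined but not called here):

```python
def handle_stable_id(body):
    """Hypothetical handler: record the stable ID in the local database."""
    print('received:', body)

def on_stable_id(channel, method, properties, body):
    handle_stable_id(body)
    # Ack only AFTER processing succeeds: if the consumer crashes before
    # this line, RabbitMQ keeps the message and redelivers it later.
    channel.basic_ack(delivery_tag=method.delivery_tag)

def main():
    import pika  # assumed AMQP client library
    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = conn.channel()
    # A durable queue survives broker restarts (e.g. a power outage).
    ch.queue_declare(queue='stableIDs', durable=True)
    ch.basic_consume(queue='stableIDs', on_message_callback=on_stable_id)
    ch.start_consuming()

# Call main() against a running RabbitMQ broker to consume for real.
```

With manual acknowledgments plus durable queues and persistent messages, a crash on either side only delays delivery instead of losing it.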
This PR is a solution to issue #260 . There is always the possibility to make another PR/issue if you, @dtitov, feel that this solution is not suitable. In the Agile spirit.
@silverdaz I have read more about acknowledgments in RabbitMQ and yes, the case above can be handled.
@silverdaz regarding this https://github.com/NBISweden/LocalEGA/pull/341#issuecomment-418144459, you kinda touched eureka in the config (`try` and `try_interval`)
you can set them as environment variables in the tests using https://docs.python.org/3.6/library/test.html#test.support.EnvironmentVarGuard
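For reference, a minimal sketch of that approach. The variable names are hypothetical, and note that `EnvironmentVarGuard` moved to `test.support.os_helper` in Python 3.10:

```python
import os

# The class lived in test.support on Python 3.6-3.9 and moved to
# test.support.os_helper from Python 3.10 onward.
try:
    from test.support.os_helper import EnvironmentVarGuard
except ImportError:
    from test.support import EnvironmentVarGuard

env = EnvironmentVarGuard()
env.set('LEGA_TRY', '5')            # hypothetical names for the eureka
env.set('LEGA_TRY_INTERVAL', '1')   # retry settings under test
with env:
    # Code under test sees the patched environment here.
    assert os.environ['LEGA_TRY'] == '5'
# On exit, the original environment is restored automatically.
```
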
According to Oscar, the stable IDs generation for CentralEGA is handled by a process at EBI and it depends on the content of the original file.
We use the generated stable ID after a successful ingestion, which allows us to flag the file as `completed`, but not yet `ready` for data-out. Once the stable ID comes in, we flag the file as `ready` for download.
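To illustrate the two-step flagging described above (purely a sketch: the dictionary, statuses, and function names are invented stand-ins, not LocalEGA's actual schema):

```python
# In-memory stand-in for the files table (names are hypothetical).
files = {}

def finish_ingestion(file_id):
    # Verification succeeded: the file is archived,
    # but not yet visible to data-out.
    files[file_id] = {'status': 'COMPLETED', 'stable_id': None}

def attach_stable_id(file_id, stable_id):
    # The stable ID arrived asynchronously from CentralEGA:
    # only now does the file become ready for download.
    entry = files[file_id]
    assert entry['status'] == 'COMPLETED'
    entry.update(status='READY', stable_id=stable_id)

finish_ingestion(42)
assert files[42]['status'] == 'COMPLETED'
attach_stable_id(42, 'EGAF00000000001')
assert files[42]['status'] == 'READY'
```
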