Open erickpeirson opened 5 years ago
@poad42 Want to take a look at this one? It will be informed by #1, but we can start thinking about what storage model we should use, and start wiring it up. For SQL-backed apps we have been using Flask-SQLAlchemy to work with MySQL/MariaDB, but I'm open to ideas here.
To implement #4 with good performance, I think it's better to set up a registry separate from the canonical Relation table. Each entry contains an arXiv ID and version, and only the ID of an "alive" relation. This registry is mutable, and the application may delete an entry when its relation is superseded or suppressed.
Here I'd like to ask a question: suppose a relation A is newly created, B is then created to suppress A, and after that C is created to suppress B. In this case, does A become active again?
For the implementation of #4 (and #3), I am thinking of adding another table with two columns: the relation ID and a boolean value that indicates whether the corresponding relation is active. This record is updated whenever a new relation record is created. The retrieval method only needs to check this table to determine whether a relation is active. Is that OK?
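A minimal sketch of the two-column status table described above, to make the proposal concrete. It uses the standard-library `sqlite3` module only so that it runs on its own; it is not the Flask-SQLAlchemy/MySQL setup mentioned earlier. The table names, column names, and the helpers `create_relation` and `is_active` are all hypothetical, and flagging a suppression record itself as inactive is an assumption made for this sketch.

```python
import sqlite3

# Hypothetical schema: an append-only relation table plus a mutable status table.
SCHEMA = """
CREATE TABLE relation (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,                 -- 'add', 'replace', or 'suppress'
    target_id INTEGER REFERENCES relation(id)
);
CREATE TABLE relation_status (
    relation_id INTEGER PRIMARY KEY REFERENCES relation(id),
    active INTEGER NOT NULL             -- 1 = active, 0 = inactive
);
"""


def create_relation(conn, kind, target_id=None):
    """Insert a relation and update the status table in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO relation (kind, target_id) VALUES (?, ?)",
            (kind, target_id),
        )
        new_id = cur.lastrowid
        # Assumption: a suppression is not itself a visible link.
        conn.execute(
            "INSERT INTO relation_status (relation_id, active) VALUES (?, ?)",
            (new_id, 0 if kind == "suppress" else 1),
        )
        if target_id is not None:
            # The replaced or suppressed relation is no longer active.
            conn.execute(
                "UPDATE relation_status SET active = 0 WHERE relation_id = ?",
                (target_id,),
            )
    return new_id


def is_active(conn, relation_id):
    """Retrieval only consults the status table."""
    row = conn.execute(
        "SELECT active FROM relation_status WHERE relation_id = ?",
        (relation_id,),
    ).fetchone()
    return bool(row[0]) if row else False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
a = create_relation(conn, "add")
b = create_relation(conn, "suppress", target_id=a)
```

With this layout the relation rows stay immutable and only the status rows change, at the cost of keeping the two tables in sync on every write.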
> Suppose a relation A was newly created. B was created to suppress A after that, and then C was created to suppress B. In this case, does A become active again?
That's a great question. I'll think out loud a bit on this.
One of the relevant drivers here is that we want an unambiguous representation of the provenance/history of relation information. The idea with modeling the relations themselves as immutable operations (add, replace, suppress) is that we don't have to keep two separate models (the relations, and their history) in sync. The downside is that we can end up with some odd scenarios like the one you've posed.
Maybe the easiest way to deal with this is just to make some simple rules. I'm tempted to say that a relation that suppresses another relation cannot itself be suppressed or replaced.
We will need to implement some validation logic for relations. That might be a logical place to implement a rule of this kind.
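As a rough sketch of what such a rule could look like in validation code: the `Relation` dataclass, its field names, `InvalidRelation`, and `validate` below are hypothetical and chosen only for illustration; they are not taken from the project.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical immutable relation record."""
    id: int
    kind: str                        # 'add', 'replace', or 'suppress'
    target_id: Optional[int] = None  # relation being replaced or suppressed


class InvalidRelation(ValueError):
    """Raised when a proposed relation breaks a validation rule."""


def validate(new: Relation, existing: Dict[int, Relation]) -> None:
    """Check a proposed relation against the relations already stored."""
    if new.kind == "add":
        if new.target_id is not None:
            raise InvalidRelation("an 'add' relation must not have a target")
        return
    if new.kind not in ("replace", "suppress"):
        raise InvalidRelation(f"unknown relation kind: {new.kind!r}")
    target = existing.get(new.target_id)
    if target is None:
        raise InvalidRelation(f"relation {new.target_id} does not exist")
    # The rule proposed above: a suppression is final.
    if target.kind == "suppress":
        raise InvalidRelation(
            "a relation that suppresses another cannot be suppressed or replaced"
        )


# The A/B/C scenario: C tries to suppress B, which itself suppresses A.
a = Relation(1, "add")
b = Relation(2, "suppress", target_id=1)
c = Relation(3, "suppress", target_id=2)
existing = {a.id: a, b.id: b}
try:
    validate(c, existing)
except InvalidRelation as exc:
    print(f"rejected: {exc}")
```

Under this rule C is rejected outright, so the question of whether A becomes active again never arises.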
> For the implementation of #4 (and #3), I am thinking of adding another table that contains two columns: relation IDs and boolean values that indicate whether the corresponding relation is active or not. This record is updated when a record creation occurs. The retrieval method only checks the table and determines if the relation is active. Is it OK?
It's a good idea, but I'm tempted to hold off on optimizations until we're a little further along.
OK. I implemented #18 using that idea, but I will try to remove it and use a straightforward implementation instead.
We need to implement a module for storing and retrieving external links for e-prints. It should go here: https://github.com/arXiv/arxiv-external-links/tree/develop/relations/services
We can use a SQL database for this, or something else. Let's discuss what to use before we get too far into the implementation.
We'll want to focus on a storage model that works well for #3 and #4.