arXiv / zzzArchived_arxiv-external-links

Clearinghouse for relations between arXiv e-prints and external resources
MIT License

Implement a storage service for external links #8

Open erickpeirson opened 5 years ago

erickpeirson commented 5 years ago

We need to implement a module for storing and retrieving external links for e-prints. It should go here: https://github.com/arXiv/arxiv-external-links/tree/develop/relations/services

We can use a SQL database for this, or something else. Let's discuss what to use before we get too far into the implementation.

We'll want to focus on a storage model that works well for #3 and #4.

erickpeirson commented 5 years ago

@poad42 Want to take a look at this one? The design will be informed by #1, but we can start thinking about which storage model to use and start wiring it up. For SQL-backed apps we have been using Flask-SQLAlchemy to work with MySQL/MariaDB, but I'm open to ideas here.
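To make the discussion concrete, here is a minimal sketch of what a store/retrieve module could look like. It uses the standard-library sqlite3 module as a stand-in so that it runs on its own; it is not Flask-SQLAlchemy and not the project's actual schema. The table name, column names, and the functions init_db, store, and retrieve are all assumptions made for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: one append-only row per relation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS relation (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    arxiv_id  TEXT    NOT NULL,
    version   INTEGER NOT NULL,
    kind      TEXT    NOT NULL,   -- 'add', 'replace', or 'suppress'
    resource  TEXT,               -- external link; NULL for a suppression
    target    INTEGER REFERENCES relation (id)
)
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the relation table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, arxiv_id: str, version: int, kind: str,
          resource: Optional[str] = None, target: Optional[int] = None) -> int:
    """Append one relation row and return its ID; rows are never updated."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO relation (arxiv_id, version, kind, resource, target)"
            " VALUES (?, ?, ?, ?, ?)",
            (arxiv_id, version, kind, resource, target),
        )
    return cur.lastrowid


def retrieve(conn: sqlite3.Connection, arxiv_id: str, version: int) -> List[Tuple]:
    """Return every relation recorded for one e-print version, oldest first."""
    return conn.execute(
        "SELECT id, kind, resource, target FROM relation"
        " WHERE arxiv_id = ? AND version = ? ORDER BY id",
        (arxiv_id, version),
    ).fetchall()
```

Because rows are only ever appended, the history of a link stays in the same table as the link itself; working out which links are currently active is left to the caller in this sketch.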

bonotake commented 5 years ago

To implement #4 with good performance, I think it would be better to set up a registry separate from the canonical Relation table. Each entry contains an arXiv ID, the version, and only the ID of an "alive" relation. This registry is mutable, and the application may delete an entry when its relation is superseded or suppressed.
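A rough sketch of such a registry, kept in memory for simplicity, might look like the following. The class name AliveRegistry and its methods are hypothetical, and a real version would presumably be backed by a database table rather than a dict.

```python
from typing import Dict, Set, Tuple

EPrint = Tuple[str, int]   # (arXiv ID, version)


class AliveRegistry:
    """Mutable index of relation IDs that are currently alive (hypothetical)."""

    def __init__(self) -> None:
        self._alive: Dict[EPrint, Set[int]] = {}

    def add(self, arxiv_id: str, version: int, relation_id: int) -> None:
        """Register a newly created relation as alive."""
        self._alive.setdefault((arxiv_id, version), set()).add(relation_id)

    def remove(self, arxiv_id: str, version: int, relation_id: int) -> None:
        """Drop an entry once its relation is superseded or suppressed."""
        ids = self._alive.get((arxiv_id, version))
        if ids is None:
            return
        ids.discard(relation_id)
        if not ids:
            del self._alive[(arxiv_id, version)]

    def alive(self, arxiv_id: str, version: int) -> Set[int]:
        """Return a copy of the alive relation IDs for one e-print version."""
        return set(self._alive.get((arxiv_id, version), set()))
```

Lookups then never touch the canonical table: replacing relation 1 with relation 2 would be a remove of 1 followed by an add of 2.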

bonotake commented 5 years ago

I'd like to ask a question here: suppose a relation A was newly created. B was created to suppress A after that, and then C was created to suppress B. In this case, does A become active again?

bonotake commented 5 years ago

For the implementation of #4 (and #3), I am thinking of adding another table with two columns: a relation ID and a boolean value that indicates whether the corresponding relation is active. This table is updated whenever a new relation is created. The retrieval method only checks this table to determine whether a relation is active. Is that OK?
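As an illustration only, the two-column table could be sketched like this with the standard-library sqlite3 module. The table name relation_activity and the functions record_created and is_active are made up for the example. The sketch also assumes that a suppression is itself stored as inactive and that any targeted relation becomes inactive, neither of which has been decided in this thread.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation_activity ("
    " relation_id INTEGER PRIMARY KEY,"
    " active INTEGER NOT NULL)"      # SQLite stores booleans as 0/1
)


def record_created(relation_id: int, kind: str, target: Optional[int]) -> None:
    """Update the activity table whenever a relation is created."""
    with conn:  # commits on success, rolls back on error
        # Assumption: a suppression never counts as an active link itself.
        active = 0 if kind == "suppress" else 1
        conn.execute(
            "INSERT INTO relation_activity (relation_id, active) VALUES (?, ?)",
            (relation_id, active),
        )
        if target is not None:
            # Assumption: the replaced or suppressed relation becomes inactive.
            conn.execute(
                "UPDATE relation_activity SET active = 0 WHERE relation_id = ?",
                (target,),
            )


def is_active(relation_id: int) -> bool:
    """Check the activity table only; unknown IDs count as inactive."""
    row = conn.execute(
        "SELECT active FROM relation_activity WHERE relation_id = ?",
        (relation_id,),
    ).fetchone()
    return bool(row and row[0])
```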

erickpeirson commented 5 years ago

> suppose a relation A was newly created. B was created to suppress A after that, and then C was created to suppress B. In this case, does A become active again?

That's a great question. I'll think out loud a bit on this.

One of the relevant drivers here is that we want an unambiguous representation of the provenance/history of relation information. The idea with modeling the relations themselves as immutable operations (add, replace, suppress) is that we don't have to keep two separate models (the relations, and their history) in sync. The downside is that we can end up with some odd scenarios like the one you've posed.
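To illustrate the single-model idea, here is a small sketch in which the active links are derived from the log of immutable operations alone. The Relation fields and the active_relations function are hypothetical names. The sketch treats any relation that has been targeted as inactive, which sidesteps the A/B/C scenario rather than answering it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Relation:
    """One immutable entry in the relation log (all field names hypothetical)."""
    relation_id: int
    kind: str                      # "add", "replace", or "suppress"
    arxiv_id: str
    version: int
    resource: Optional[str]        # external link; None for a suppression
    target: Optional[int] = None   # relation this one replaces or suppresses


def active_relations(log: List[Relation]) -> List[Relation]:
    """Derive the currently active links from the log, with no second model."""
    targeted: Set[int] = {r.target for r in log if r.target is not None}
    return [r for r in log
            if r.kind in ("add", "replace") and r.relation_id not in targeted]


log = [
    Relation(1, "add", "1901.00001", 1, "https://example.org/code"),
    Relation(2, "replace", "1901.00001", 1, "https://example.org/code-v2", target=1),
    Relation(3, "add", "1901.00001", 1, "https://example.org/data"),
    Relation(4, "suppress", "1901.00001", 1, None, target=3),
]
```

With this log, only relation 2 is active: relation 1 was replaced, relation 3 was suppressed, and relation 4 is a suppression rather than a link.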

Maybe the easiest way to deal with this is just to make some simple rules. I'm tempted to say that a relation that suppresses another relation cannot itself be suppressed or replaced.

We will need to implement some validation logic for relations. That might be a logical place to implement a rule of this kind.
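A rule like that could be checked when a relation is created. The sketch below is one possible shape for it, assuming relations carry a kind and an optional target ID; InvalidRelation and validate_new_relation are hypothetical names, not existing code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical view of the relations already stored: id -> (kind, target id).
Existing = Dict[int, Tuple[str, Optional[int]]]


class InvalidRelation(ValueError):
    """Raised when a new relation would break a validation rule."""


def validate_new_relation(kind: str, target: Optional[int],
                          existing: Existing) -> None:
    """Raise InvalidRelation if the proposed relation is not allowed."""
    if kind == "add":
        if target is not None:
            raise InvalidRelation("an 'add' relation must not have a target")
        return
    if kind not in ("replace", "suppress"):
        raise InvalidRelation(f"unknown relation kind: {kind!r}")
    if target is None or target not in existing:
        raise InvalidRelation("replace/suppress must target an existing relation")
    target_kind, _ = existing[target]
    if target_kind == "suppress":
        # The rule proposed above: suppressions are terminal.
        raise InvalidRelation("a suppression cannot itself be suppressed or replaced")
```

Under this rule the C relation from the A/B/C scenario would be rejected at creation time, so the question of whether A becomes active again never arises.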

erickpeirson commented 5 years ago

> For the implementation of #4 (and #3), I am thinking of adding another table with two columns: a relation ID and a boolean value that indicates whether the corresponding relation is active. This table is updated whenever a new relation is created. The retrieval method only checks this table to determine whether a relation is active. Is that OK?

It's a good idea, but I'm tempted to hold off on optimizations until we're a little further along.

bonotake commented 5 years ago

OK. I implemented #18 using that idea, but I will try to remove it and use a straightforward implementation instead.