OneZoom / OZtree

OneZoom Tree of Life Explorer
Other
88 stars 18 forks source link

Tour DB CRUD #562

Closed lentinj closed 1 year ago

lentinj commented 2 years ago

We need a web2py controller for CRUD access to the Tours DB

lentinj commented 2 years ago

A first stab at this is above, which you can use to get a working tour.

Some points though:

hyanwong commented 2 years ago
  • It insists on being authenticated as a web2py user for now when uploading, presumably when it comes to making an editor we'd have to do something more complex here.

Good point. For the moment we aren't soliciting user-created tours. I think that's another project entirely. But when we do, I wonder if we want to use a different database and a different frontend, to keep it isolated from the main OneZoom DB. I don't know if that's possible or not, though.

  • Tourstops are currently being removed and recreated in bulk when we update. We have no unique identifier for a tourstop so doing an upsert of tourstops will get thorny rapidly. This is made worse by the many-many; this means we should have an identifier that's independent of the tour.

Hmm, a good point. I assume that the standard auto incremented id field created by web2py isn't sufficient?

  • We don't support tourstop.template_file, and I'm not entirely convinced it has a use case versus a the tourstop's "class" option, which is part of template_data. I'm tempted to ditch it until proven otherwise.

Was this so that we could have radically different HTML templates. E.g. a tour with only video, or a tour with different sorts of buttons, instead of forward / backwards, etc, or a full screen tour, etc.? As long as we maintain flexibility in the possible ways we could lay out a tour, I'm happy to ditch the particular scheme we previously used.

  • custom isn't a great name---the tutorial tour, which has it's own template, is surely much more of a custom tour than a generically-templated DB one is. However, we need a name for the controller.

Hmm, I'm not sure. I think of the tutorial as a "built-in" tour, whereas the templated ones are custom in the sense that different OneZoom instances might have different tours. But @jrosindell suggested tours/data/XXX, which I would be happy with too.

jrosindell commented 2 years ago
  • It insists on being authenticated as a web2py user for now when uploading, presumably when it comes to making an editor we'd have to do something more complex here.

Good point. For the moment we aren't soliciting user-created tours. I think that's another project entirely. But when we do, I wonder if we want to use a different database and a different frontend, to keep it isolated from the main OneZoom DB. I don't know if that's possible or not, though.

Regardless I think that the tour creator is something that a user would have to log in for anyway. I know logins are annoying, and I'm pleased we've managed to avoid needing them to sponsor and renew, but I think the functionality of a web based tour creator would be far too constrained without having the creator log in - they would obviously be a new 'type' of user and its nice because I think in the first instance we'd probably make it that people had to 'apply' to be a tour creator.

jrosindell commented 2 years ago
  • Tourstops are currently being removed and recreated in bulk when we update. We have no unique identifier for a tourstop so doing an upsert of tourstops will get thorny rapidly. This is made worse by the many-many; this means we should have an identifier that's independent of the tour.

Hmm, a good point. I assume that the standard auto incremented id field created by web2py isn't sufficient?

Agreed that this is very much thorny. I'd like to hear what @lentinj thinks of using the id, but my fear is that it's not stable enough to future rebuilds, edits and deletes of tours and of tour stops. I have a few thoughts.....

I wonder if some hash of the tourstop data could be used to check if we have that tourstop already stored and not create a duplicate. Even that feels messy though because if someone corrects a spelling mistake in a tourstop then that would change the hash and make other uploaded tours using that stop think that it's not there. Maybe that's something we accept and the effects of it could be mitigated by letting the original hash of a tour stop stick around, or by storing pairs of hashes somewhere else that represent an old and new version of the same tour stop.

lentinj commented 2 years ago
  • It insists on being authenticated as a web2py user for now when uploading, presumably when it comes to making an editor we'd have to do something more complex here.

Good point. For the moment we aren't soliciting user-created tours. I think that's another project entirely. But when we do, I wonder if we want to use a different database and a different frontend, to keep it isolated from the main OneZoom DB. I don't know if that's possible or not, though.

Regardless I think that the tour creator is something that a user would have to log in for anyway. I know logins are annoying, and I'm pleased we've managed to avoid needing them to sponsor and renew, but I think the functionality of a web based tour creator would be far too constrained without having the creator log in - they would obviously be a new 'type' of user and its nice because I think in the first instance we'd probably make it that people had to 'apply' to be a tour creator.

Yeah, this seems like the direction we're going in one way or another, e.g. community translations.

jrosindell commented 2 years ago
  • We don't support tourstop.template_file, and I'm not entirely convinced it has a use case versus a the tourstop's "class" option, which is part of template_data. I'm tempted to ditch it until proven otherwise.

Was this so that we could have radically different HTML templates. E.g. a tour with only video, or a tour with different sorts of buttons, instead of forward / backwards, etc, or a full screen tour, etc.? As long as we maintain flexibility in the possible ways we could lay out a tour, I'm happy to ditch the particular scheme we previously used.

It was for having radically different HTML templates yes, and I'm happy to ditch it too as I guess that we'll still be able to provide a fair range of layout options with the new scheme

  • custom isn't a great name---the tutorial tour, which has it's own template, is surely much more of a custom tour than a generically-templated DB one is. However, we need a name for the controller.

Hmm, I'm not sure. I think of the tutorial as a "built-in" tour, whereas the templated ones are custom in the sense that different OneZoom instances might have different tours. But @jrosindell suggested tours/data/XXX, which I would be happy with too.

Hmmm maybe I wasn't clear on what the tours/data/XXX is a home for - I thought it was for all the main tours but if its for the built in one like tutorial then I do prefer 'built-in' to 'data'

lentinj commented 2 years ago
  • Tourstops are currently being removed and recreated in bulk when we update. We have no unique identifier for a tourstop so doing an upsert of tourstops will get thorny rapidly. This is made worse by the many-many; this means we should have an identifier that's independent of the tour.

Hmm, a good point. I assume that the standard auto incremented id field created by web2py isn't sufficient?

Agreed that this is very much thorny. I'd like to hear what @lentinj thinks of using the id, but my fear is that it's not stable enough to future rebuilds, edits and deletes of tours and of tour stops. I have a few thoughts.....

Exactly, it's not going to be stable across databases. So we couldn't include ids in the current edge species JSON, e.g., or reliably dump it and re-upload into a separate instance.

I wonder if some hash of the tourstop data could be used to check if we have that tourstop already stored and not create a duplicate. Even that feels messy though because if someone corrects a spelling mistake in a tourstop then that would change the hash and make other uploaded tours using that stop think that it's not there. Maybe that's something we accept and the effects of it could be mitigated by letting the original hash of a tour stop stick around, or by storing pairs of hashes somewhere else that represent an old and new version of the same tour stop.

Yeah, the only way you'd make this work is if tourstops had externally-assigned UUIDs---i.e. you'd have to add them for each tourstop in the JSON. Which would make writing the documents a bit uglier, but not massively raising the bar.

The other option I'm thinking of is something like:

This I think is modelling what we actually want to do (at least IMO). The tourstop_name is basically the same as the UUID admittedly, but as it's only unique within a tour, you can make up a value easily enough.

lentinj commented 2 years ago
  • We don't support tourstop.template_file, and I'm not entirely convinced it has a use case versus a the tourstop's "class" option, which is part of template_data. I'm tempted to ditch it until proven otherwise.

Was this so that we could have radically different HTML templates. E.g. a tour with only video, or a tour with different sorts of buttons, instead of forward / backwards, etc, or a full screen tour, etc.? As long as we maintain flexibility in the possible ways we could lay out a tour, I'm happy to ditch the particular scheme we previously used.

It was for having radically different HTML templates yes, and I'm happy to ditch it too as I guess that we'll still be able to provide a fair range of layout options with the new scheme

Yeah, I'm worried that we're going to end up in situations where features need to be (or forgotten to be) backported into all the custom templates, and it's going to be more maintainable long-term if everything uses the same one.

  • custom isn't a great name---the tutorial tour, which has it's own template, is surely much more of a custom tour than a generically-templated DB one is. However, we need a name for the controller.

Hmm, I'm not sure. I think of the tutorial as a "built-in" tour, whereas the templated ones are custom in the sense that different OneZoom instances might have different tours. But @jrosindell suggested tours/data/XXX, which I would be happy with too.

Hmmm maybe I wasn't clear on what the tours/data/XXX is a home for - I thought it was for all the main tours but if its for the built in one like tutorial then I do prefer 'built-in' to 'data'

We currently have the following URLs for tours generated with their own controller/template, and not in the DB:-

...and for tours in the database:-

The structure isn't really negotiable (i.e. nesting the built-in tours further), but the word "custom" in the above is.

jrosindell commented 2 years ago
  • Tourstops are currently being removed and recreated in bulk when we update. We have no unique identifier for a tourstop so doing an upsert of tourstops will get thorny rapidly. This is made worse by the many-many; this means we should have an identifier that's independent of the tour.

Hmm, a good point. I assume that the standard auto incremented id field created by web2py isn't sufficient?

Agreed that this is very much thorny. I'd like to hear what @lentinj thinks of using the id, but my fear is that it's not stable enough to future rebuilds, edits and deletes of tours and of tour stops. I have a few thoughts.....

Exactly, it's not going to be stable across databases. So we couldn't include ids in the current edge species JSON, e.g., or reliably dump it and re-upload into a separate instance.

I wonder if some hash of the tourstop data could be used to check if we have that tourstop already stored and not create a duplicate. Even that feels messy though because if someone corrects a spelling mistake in a tourstop then that would change the hash and make other uploaded tours using that stop think that it's not there. Maybe that's something we accept and the effects of it could be mitigated by letting the original hash of a tour stop stick around, or by storing pairs of hashes somewhere else that represent an old and new version of the same tour stop.

Yeah, the only way you'd make this work is if tourstops had externally-assigned UUIDs---i.e. you'd have to add them for each tourstop in the JSON. Which would make writing the documents a bit uglier, but not massively raising the bar.

The other option I'm thinking of is something like:

  • Tours <-> Tourstops is 1:many instead of many:many, i.e. the 2 tables link directly without the tourorder indirection, and a Tourstop will have a "home" tour.
  • Tourstops have a tourstop_name
  • We have a "tourstop_symlink" column, that internally is a tourstop_id, but in JSON documents is represented as (tour)/(tourstop_name).
  • When loading tours, we populate symlink tourstops with the contents of their parent.

This I think is modelling what we actually want to do (at least IMO). The tourstop_name is basically the same as the UUID admittedly, but as it's only unique within a tour, you can make up a value easily enough.

I like this - it means when you make a tour and reuse an existing tour stop you place yourself at the mercy of possible changes to the tour stop you're reusing , just as you do with placing a URL to your images (as we've agreed will be the case)

One question, where is the order of the stops stored then - in the tour itself or in an additional table? There would still need to be a place that the stops are ordered within a tour and the 1:many idea only gets rid of the problem of who owns a stop for the case of uploading and downloading tours as JSON.

jrosindell commented 2 years ago

Yeah, I'm worried that we're going to end up in situations where features need to be (or forgotten to be) backported into all the custom templates, and it's going to be more maintainable long-term if everything uses the same one.

I see what you mean - maybe we do just accept that tutorial is not a custom layout and just a tour like any other. I can see that this would give us an additional test case and be more elegant in the longer term.

jrosindell commented 2 years ago

The structure isn't really negotiable (i.e. nesting the built-in tours further), but the word "custom" in the above is.

I prefer 'data' to 'custom' then

lentinj commented 2 years ago

Yeah, I'm worried that we're going to end up in situations where features need to be (or forgotten to be) backported into all the custom templates, and it's going to be more maintainable long-term if everything uses the same one.

I see what you mean - maybe we do just accept that tutorial is not a custom layout and just a tour like any other. I can see that this would give us an additional test case and be more elegant in the longer term.

I don't think this would be particularly hard. Some more seams would need to be loosened, in particular:

..however, I don't think it's worth doing beyond the mental exercise. We'd need to store the images somewhere else (we don't want to allow HTML in the database), so would end up storing those in the repo separately, but the tutorial would be in the database (or a JSON lump somewhere to import, probably alongside the images). It'd be much harder to maintain than it is now.

jrosindell commented 2 years ago

@lentinj one question.... where is the order of the stops stored under your new proposal? In the tour itself or in an additional table? There would still need to be a place that the stops are ordered within a tour .... the 1:many idea only gets rid of the problem of who owns a stop for the case of uploading and downloading tours as JSON.

lentinj commented 2 years ago

I think (further from above), that tourstop & tourstop_symlink are separate tables. They both have tour/ord columns, and we merge+sort the contents together when reconstituting the tour.

Ord & identifier are very much separate though; ord is implicit in the tourstop ordering in the document, and identifier gives you a hint it's an old one that's moved, versus a new tourstop.

The edge species tour also has a comment before each stop anyway, the contents of which are perfect for the identifier.

On 27 May 2022 16:20:02 BST, James Rosindell @.> wrote: @. one question.... where is the order of the stops stored under your new proposal? In the tour itself or in an additional table? There would still need to be a place that the stops are ordered within a tour .... the 1:many idea only gets rid of the problem of who owns a stop for the case of uploading and downloading tours as JSON.

-- Reply to this email directly or view it on GitHub: https://github.com/OneZoom/OZtree/issues/562#issuecomment-1139717444 You are receiving this because you were mentioned.

Message ID: @.***>

lentinj commented 2 years ago

The above might be a bit rough, and doesn't implement the tour symlinks, but makes the required DB changes, and can hopefully see where it's going.

edge_species has also been updated with identifiers, which give a convenient place to store what was comments about the OTT.

lentinj commented 1 year ago

What should happen when a symlink's source tourstop is deleted? As PyDAL defaults to CASCADE, currently we'd remove the symlink entry as well as the source tourstop. This might be reasonable enough for now.

We could "orphan" tourstops if they still have references to them (i.e. NULL their associated tour and identifier), but I think the associated mess to garbage collect here isn't worth it.

lentinj commented 1 year ago

Anyway, I've got something that looks like it works locally for symlink tourstops, filling in the TODOs from the above commit. Adding some unit testing around the DB endpoint before declaring this done.

lentinj commented 1 year ago

The above adds symlink support and tests (and fixes) the CRUD API endpoint. The main debatable part is whether CASCADEing deletes with symlinks is the right thing to do, or whether we shouldn't be able to delete tourstops that are used elsewhere.

jrosindell commented 1 year ago

@lentinj this all looks good. I think I follow now.

Regarding the symlinks if I understand correctly each tourstop has a home tour but can also be referenced from other tours that might be a 'best of' concept or part of a collection with different angles on the same content.

I think it's fine that if we delete a whole tour (together with all tour stops that it is home to) then the symlinks from other tours disappear too. Hence those other tours would still work but they would simply lose the deleted stops. I guess that what matters is the ordering of the ord field and so having 1,2,3,5,6 (because 4 is a deleted symlink) will make no difference other than we'll have one fewer stop.

Another reason why I think this is OK is that if we REALLY want to avoid losing an important stop that's currently only a symlink then we could clone it in the DB let the copy have a different home tour.

One outstanding question is where the richer functions e.g. multiple choice question on a tour stop, or transition that requires a treasure hunt, would be handled.

lentinj commented 1 year ago

I guess that what matters is the ordering of the ord field and so having 1,2,3,5,6 (because 4 is a deleted symlink) will make no difference other than we'll have one fewer stop.

Yes, this is why we're naming stops, so there's no need to expose the ord, so it's of no consequence if the ord ends up with a gap.

One outstanding question is where the richer functions e.g. multiple choice question on a tour stop, or transition that requires a treasure hunt, would be handled.

Won't really be a problem. The only difference here is instead of going to the "next" tourstop we go to a named one in the tour. Obviously if the named tourstop has since been deleted that's bad, but there's nothing else to do in this situation than error anyway.