Living-with-machines / lwmdb

A Django-based library for managing the Living with Machines newspapers metadata database schema.
https://living-with-machines.github.io/lwmdb/
MIT License

Server deploy #71

Closed griff-rees closed 1 year ago

griff-rees commented 1 year ago

Timeline

🏁 The aim is to finish this production deploy by Feb 17.

Elements

kallewesterling commented 1 year ago

@griff-rees I added a few things above; please check that I got them right.

I also wanted to add some more thoughts on the points above.

Next steps for me: what do you need from me? The first dataset? I used this bespoke tool to generate the "fixture files" that can then be imported into the DB through Django, but we might want to go a different way here... We might also need to wait for the egress of the finalised metadata, which @thobson88 is handling in the safe haven. Is there any indication of how long that might take, @thobson88?
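For context, the fixture files are just Django's standard serialised model instances (JSON by default), loaded with `python manage.py loaddata`. A minimal sketch of the shape; the model and field labels below are made up, not our actual schema:

```json
[
  {
    "model": "newspapers.newspaper",
    "pk": 1,
    "fields": {"title": "The Example Gazette"}
  },
  {
    "model": "newspapers.issue",
    "pk": 1,
    "fields": {"newspaper": 1, "issue_date": "1855-01-06"}
  }
]
```

So "first dataset" could just mean a file like that, scaled up, if we stick with the Django import route.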

Other opinions (take them or leave them):

The issue with the article title lengths we might need to spend some more time on. I know that @thobson88 was of the opinion that we shouldn't snip them off at any given length. That makes sense to me, as (a) we might snip off things that are actually article titles, and (b) the errors here aren't coming from us but from the layout parsing back in the day, which means the text was classified as metadata at some point... So it really becomes a grey area, legally (what is metadata? what do we have rights to? etc.).

I suggest that we don't implement fulltext for now, but push the metadata project through first, and then treat fulltext as a separate thing which we can connect to later, in a separate table or whatever. (We might even want to think about an ElasticSearch implementation with Django or something? That is obvious roadmap material for later, if we have time/other devs/etc.)
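To illustrate the "separate table" idea, something roughly like this on the Django side (model names here are hypothetical, not the real lwmdb schema):

```python
from django.db import models


class Item(models.Model):
    """Stand-in for an existing metadata model (name is hypothetical)."""

    title = models.CharField(max_length=255)


class Fulltext(models.Model):
    """Optional fulltext, joined onto the metadata only if/when we get to it."""

    item = models.OneToOneField(
        Item, on_delete=models.CASCADE, related_name="fulltext"
    )
    text = models.TextField()
```

Deferring or dropping the `Fulltext` table then costs the metadata schema nothing.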

I will most definitely leave the server config stuff to you, if you're comfortable with that? I am not at all sure how to do it, but to my mind it'd be easier to go with option 3, which I have managed to get working on my local machine, and then perhaps just tack a password onto the Jupyter notebook so folks can log in to the Azure instance with it? I'm not sure that's enough security for us, but since the metadata isn't a highly protected resource in the project, I think it'd be enough for an MVP.
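For the password idea, something like this in `jupyter_notebook_config.py` might do for an MVP (a sketch only; the exact config keys vary between classic notebook and newer Jupyter Server versions):

```python
# jupyter_notebook_config.py -- sketch for the classic notebook server;
# newer Jupyter Server uses c.ServerApp.* instead of c.NotebookApp.*.
from notebook.auth import passwd

# Store a hash rather than the plain passphrase; in practice running
# `jupyter notebook password` once and pasting the saved hash here is
# tidier than calling passwd() on every startup.
c.NotebookApp.password = passwd("change-me")  # noqa: F821 -- `c` is injected by Jupyter
c.NotebookApp.ip = "0.0.0.0"  # listen beyond localhost so the Azure VM is reachable
```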

griff-rees commented 1 year ago

Thanks @kallewesterling, and sorry for my slow reply. I get the sense that a lot of this still needs to be discussed (topics for Monday; and @thobson88, if you've got a chance to summarise the safe haven situation, that'd be really useful in prep for that chat).

But I think the easiest thing to respond to is "what I need":

And if it's helpful: I think of a database fixture as something that is minimally needed for the database to function. So in my tests I generate minimal fixtures to run a query, and avoid any more than that.
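A sketch of the pattern I mean, with made-up app and model names rather than the real lwmdb ones:

```python
# Hypothetical app path and model names, purely to illustrate the pattern.
from django.test import TestCase

from newspapers.models import Issue, Newspaper


class IssueQueryTest(TestCase):
    def test_issue_lookup(self):
        # Create only the rows this one query needs -- nothing more.
        paper = Newspaper.objects.create(title="The Example Gazette")
        Issue.objects.create(newspaper=paper, issue_date="1855-01-06")
        self.assertEqual(paper.issue_set.count(), 1)
```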

kallewesterling commented 1 year ago

> I think of a database fixture as something that is minimally needed for the database to function. So in my tests I generate minimal fixtures to run a query, and avoid any more than that.

The question remains what "minimally needed for the database to function" means here... To test the code, we can run with a very small sample set of data: essentially just one issue of any given newspaper from our collections, with its concomitant articles, and that's it.

But in parallel we also need to think about the solution with a fully populated backend over the next few weeks. I don't know enough about Postgres to know how we can deliver a fully populated DB via Docker to, say, an Azure VM and get it up and running... Perhaps you can point me towards good docs for that, @griff-rees, if I should look into it?
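My naive picture is that we'd dump the populated database once and restore it into the containerised server on the VM; something like this (untested, and the database/file names are placeholders, so correct me if this is the wrong direction):

```python
# Untested sketch: dump a populated local Postgres DB and restore it into
# the containerised server on the VM. Names are placeholders.
import subprocess


def dump_db(dbname: str = "lwmdb", out: str = "lwmdb.dump") -> None:
    # pg_dump's custom format (-Fc) is compressed and pg_restore-compatible.
    subprocess.run(["pg_dump", "-Fc", "-d", dbname, "-f", out], check=True)


def restore_db(dbname: str = "lwmdb", dump: str = "lwmdb.dump") -> None:
    # --no-owner sidesteps role mismatches between source and target servers.
    subprocess.run(["pg_restore", "--no-owner", "-d", dbname, dump], check=True)
```

(Or maybe just mount the dump into the container and run `pg_restore` there; pointers to docs welcome.)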

kallewesterling commented 1 year ago

I suppose, in relation to that last comment, there is also:

> First dataset to deploy ;)

I take it that you need that from me, but in what format?

griff-rees commented 1 year ago

Local SSH access, with shared Jupyter folder/permission access, was deployed, with specific IP addresses allowed via Azure for security.