NSSAC / PEpiTA

Phase-based Epidemic Time series Analyzer.
MIT License
4 stars 2 forks source link

Use request.session storage instead of (most of) the current views.py global variables #8

Open bdklahn opened 10 months ago

bdklahn commented 10 months ago

I think the global variables are causing the issues of dataset pollution between sessions (e.g. even an additional private browsing tab, on the same computer).

https://github.com/NSSAC/PEpiTA/blob/36a2966b8a0d947c4bea6235a14ef5dcca12bd79/webdesign/polls/views.py#L22-L33

https://github.com/NSSAC/PEpiTA/blob/36a2966b8a0d947c4bea6235a14ef5dcca12bd79/webdesign/polls/views.py#L53

Global variables are generally a bad idea in any software engineering. It is almost always better/safer to explicitly pass things in, via function parameters, and utilize return values.

"Global variables allow you to share data across multiple functions, which can be useful in some situations. However, you should use this type of variable carefully and sparingly to avoid writing code that’s difficult to understand, debug, test, and maintain." https://realpython.com/python-use-global-variable-in-function/

I believe the current situation makes it possible for an uploadbutton branch . . . https://github.com/NSSAC/PEpiTA/blob/36a2966b8a0d947c4bea6235a14ef5dcca12bd79/webdesign/polls/views.py#L90 . . . in one session, to clobber the input_ts (and csvname, etc.) variable data of a subsequent evocation of a runbutton branch . . . https://github.com/NSSAC/PEpiTA/blob/36a2966b8a0d947c4bea6235a14ef5dcca12bd79/webdesign/polls/views.py#L136 . . . in another session.

Session data is already set up to utilize server tmp storage. Django is also already configured to have session variables available to html templates. e.g.

In views.py index(request) function, we could do things like . . .

sess = request.session
sess.setdefault('csvname', '')

(etc.)

In index.html template: { request.session.csvname }

So, we might not even need to construct a "context" dictionary, and pass it to the render function. Maybe just add anything useful to the session dictionary.

Apparently, though, a session storage update is only automatically triggered by key value changes at the top level. If the content of a nested container (e.g. request.session['some_dict']['val1'] = 'new_val') is changed, that will require a manual triggering of an update, by . . . request.session.modified = True So it might be easiest/safest to just try to put things as basic data types, at the top level (e.g. not make a nested context dictionary one of the session key values).

Any utility functions called from index() could be passed the session dictionary, and get and set things from that. e.g. here: https://github.com/NSSAC/PEpiTA/blob/36a2966b8a0d947c4bea6235a14ef5dcca12bd79/webdesign/polls/views.py#L321-L322

Does that make sense, @ajs997 (@srinivvenkat, @bryanleroylewis ) ?

It is probably even problematic to use a globally-accessable "figures" directory, under the "media" dir.

Note: Check the end of https://github.com/NSSAC/PEpiTA/issues/9#issue-1961993720 for maybe a better way of storing the input_ts data.

bdklahn commented 10 months ago

It occurs to me . . . It's probably best to have the code utilize any "global" state, as little as possible. -even if it is per session. I.e. ideally try and reduce the number of those original variables to 3 or less. Pass things into functions and utilize return values, whenever possible. It would be best to have things put in functions, with clear return values, rather than updating session variables as side-effects. We could still assign the result from calling functions, using returned tuples if necessary for multiple changes. But, probably, the only place even a session variable should be used is as a session dictionary being passed in to a function. Helper functions probably shouldn't update the session dictionary internally. The/a top level function should do that from the helper function's return values.

Again, we might use functools cache if we want/need some stuff loaded into memory. I would assume that if a DataFrame is passed into a function, it would only use the cached data for output (e.g. what's in the return value), if the DataFrame function parameter was referencing the exact same df object, in memory.