Closed AlexHalb closed 1 year ago
Haven't looked at this yet, but a few notes:
`eval`, which is terrible security-wise. While ripping that out is a huge project (not only is there a bunch of work required to make it work on the site and in EternaJS, but we'd also have the fallout of a ton of broken scripts to deal with), Guy did start the process of making some improvements to the existing architecture that should be included. These included warning users about the potential risks when they are about to run a script they haven't run before, a verified author system to skip these warnings for certain members, and a favoriting system which allows for bookmarking and a way to choose which boosters show in the game instead of the default of the latest 5 boosters you own. This involved changes in the legacy frontend, backend, and EternaJS. I should probably get those changes merged so that you can incorporate them into your implementation.
I've got the interface all but done now, I just need to tweak a few things. Before I start working on the actual script evaluation, I've got a few questions. Here's my understanding of the types of scripts, what they are for, what code they have access to, etc.
document.getElementById('applet')
As far as RNA Scoring and Puzzle Solving: This functionality was done server-side (er, you could test it on the page itself by providing a puzzle ID or whatnot, but the intent was for our servers to run/benchmark them against all player puzzles, at least for the puzzle solving type), but it has been defunct for years. Should we keep those scripts around at all? Archived? Hidden? Something else? Maybe one (or more) of @Omei @ElNando888 @rhiju has thoughts.
As far as the other types of scripts - yes, boosters are those used in the game, and etc scripts are those evaluated from within the interface (TBH, it might be nice to have additional categories to make it easier to find stuff, but we'd need to figure out what those categories would be). Now, as for the libraries/APIs they have access to, there are actually a few different components:
`document.getElementById('maingame')`, only available to boosters (since boosters are run on a page where the game is loaded, and other scripts are not)
`Lib.fold` and `Lib.energy_of_struct`. When on a page with the game present, they will use the EternaJS API via `document.getElementById('maingame')`. However, when on the regular script page, the old implementation embedded a SWF called viennalib, which was most likely built from the code of the old Flash applet, specifically to make those two APIs available for Vienna 1. These methods have actually been more or less broken for a while because of the gradual phase-out of Flash by browsers (in addition to it being only Vienna 1 and no other models, and only those two methods). To support similar functionality, we need a custom build of EternaJS. I might even go so far as to say we might just want a headless version of EternaJS, but the APIs are broken enough and the code tightly coupled enough that it might not be worth it. At any rate, the key thing is that we need access to the folding code.
TBH, the libraries could really bear getting redesigned, but that's a whole project in itself that will require some significant thought and coordination - and I don't really want to go down that route until we can do all the breaking changes to scripts at once, which will include:
Is there a reason we can't embed a random puzzle in an `<iframe>` and use that? I've tried it, but I can't tell if it works because Chrome and Safari are mad at me for accessing the contents of the `<iframe>` when it's cross-origin (localhost to eternadev). I could then set `Lib` equal to the `maingame` element of the `<iframe>` and allow more functionality for etc scripts. This would also potentially allow boosters to be tested on the script page itself, without having to go to another puzzle page. It would also mean the script library is in one place, in EternaJS. Are there any drawbacks to this approach?
EDIT: I was able to access the `maingame` element by using `<object>` instead of `<iframe>` - we'd probably want to change back to `<iframe>` for release, but `<object>` must not have restrictions on cross-origin like `<iframe>` does, so I can use it to test.
`Lib` is being used for a whole additional bunch of stuff, so I'd prefer having the applet stuff separately namespaced
`Lib.fold(sequence, {engine: 'vienna2'});` than `Lib.set_folder(...); Lib.set_sequence_string(...); Lib.get_sequence_string(...);` (and in fact, you still need to account for the fact that you want to fold an arbitrary sequence, not just a sequence that "fits" the given puzzle). EternaJS may already have APIs for this, but I can't remember
In general, I think the option to show the full game with a puzzle of your choice on the script page for testing boosters is great, but I don't think that's the optimal method to provide the libraries. Also - at some point we'll likely move to having EternaJS as a dependency of the new frontend, and building it alongside, but there are a number of things I need to look into about that (both from a technical side and a licensing side)
So I'll probably want to leave it up for boosters, but figure out an actual solution for etc scripts. I imagine I'll want to download the various folding engines and use them for the folding and structural energy computations. I had been thinking of something similar for functions such as `Lib.fold()`, with an optional second parameter for the folding engine (the default would be the default folding engine - Vienna at the moment, I believe).
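A minimal sketch of what a `fold` with an optional engine parameter might look like. The engine registry, the `DEFAULT_ENGINE` name, and the placeholder "vienna" implementation are all assumptions made for illustration; a real version would call into the compiled folding engines.

```javascript
// Hypothetical engine registry: each entry maps a sequence to a
// dot-bracket structure. Real entries would wrap the compiled engines.
const engines = new Map([
  // Placeholder only: returns a fully unpaired structure.
  ['vienna', (sequence) => '.'.repeat(sequence.length)],
]);

// Assumption: Vienna is the default engine, as suggested above.
const DEFAULT_ENGINE = 'vienna';

function fold(sequence, engine = DEFAULT_ENGINE) {
  const impl = engines.get(engine);
  if (!impl) {
    throw new Error(`Unknown folding engine: ${engine}`);
  }
  return impl(sequence);
}

console.log(fold('GGGAAACCC'));           // falls back to the default engine
console.log(fold('GGGAAACCC', 'vienna')); // explicit engine
```

Keeping the engine as a plain trailing parameter means existing one-argument calls keep working unchanged.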
I've somehow gotten Vienna to work (I can fold a sequence programmatically), using the instructions from EternaJS. I should be able to do this for the other engines, too. From there, it shouldn't be too hard to write the API.
Other folding engines left to implement:
EternaFold and LinearFoldE aren't working because a file is missing - not sure why, as I got it (EternaFold) directly from the software downloads page on eternagame.org.
Library functions to add:
fold(sequence, engine)
energyOfStruct(sequence, structure, engine)
replace(sequence, index, to)
random(from, to)
randomSequenceWithBases(size, bases)
map(fn, sequence)
filter(fn, sequence)
splitDefault(structure)
join(array)
set(fn, structure)
distance(source, destination)
distanceCustom(fn, source, destination)
getStructure(nid, callback, error, workbranch)
EternaScript(id)
EternaScriptSource(id)
getPuzzles(params) (Not sure why we need this and getUsers())
getUsers(skip, size)
RNAElement (all)
RNA (all)
I've got this set up now as a `Script` object. To run a script, you would use `(new Script).evaluate(code, inputs)`. It returns an object containing the time it took to run, the result of the script, and a string representing the console. I currently have timeout set up as an input on the scripting interface on the new site. The only issue is finding an `eval` alternative. I believe I've found a way to accomplish this through ~this npm package~ this one. I can use `postMessage` to inject the Library, RNA, RNAElement, and RNAException classes as well as the pervasives and other local variables. Once I've finished that, I'll need to clean up the code (add typings, etc) and then I'll publish it as a GitHub repository that you can fork. From there, I still need to
EDIT: I've used vm2 to allow safe execution of scripts. The script runs in a different context and only has access to the start time, library, and classes. The user code isn't run directly - it is wrapped in a function (which is then called) and passed through `eval` inside another function, which contains the pervasives and console, so that the result can be returned. This outer function is run in the new context and returns the result (of the user code), the console, and the time taken to run the script. This is wrapped in a `Promise`, which is then resolved with the result of the function from the new context. The outer function, `Script.evaluate(code, input)`, returns this `Promise`. This should be quite secure.
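For illustration, the flow described above could be sketched as follows. This uses Node's built-in `vm` module as a stand-in rather than vm2, and `vm` is explicitly not a security boundary, so treat it as a picture of the wrapping and return shape only. The `evaluate` signature, the `timeoutMs` parameter, and the reduced set of pervasives are assumptions.

```javascript
const vm = require('node:vm');

// Sketch only: Node's `vm` module is NOT a security boundary.
function evaluate(code, input = {}, timeoutMs = 1000) {
  return new Promise((resolve, reject) => {
    const lines = [];
    // Only these names are visible to the script.
    const context = vm.createContext({
      input,
      out: (text) => lines.push(String(text)),
      outln: (text) => lines.push(String(text) + '\n'),
    });
    const start = Date.now();
    try {
      // Wrap the user code in a function so `return` yields the result.
      const wrapped = `(function () {\n${code}\n})()`;
      const result = vm.runInContext(wrapped, context, { timeout: timeoutMs });
      resolve({ result, console: lines.join(''), time: Date.now() - start });
    } catch (err) {
      // Syntax errors, runtime errors, and timeouts all reject.
      reject(err);
    }
  });
}

evaluate('outln("length check"); return input.sequence.length;', { sequence: 'GGGAAACCC' })
  .then((r) => console.log(r.result, JSON.stringify(r.console)));
```

Note that the `timeout` option only bounds synchronous execution; it does not stop timers or promises a script schedules.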
I've uploaded the script evaluation code to a GitHub repository here. I think I got everything set up correctly (with the Emscripten/CMake setup and all), but I might have kept or removed a few extra files.
Couple of big issues here:
1) Code duplication. If we don’t build this as a separate entry point from EternaJS, the folding-related code should be in its own separate repo entirely. I want to do this at some point anyway, but it will require some careful thought on its APIs and how it should interface with EternaJS as well as any external tools that might use it.
2) As much as I applaud the sandboxing, this won’t work as-is without breaking scripts (and without providing alternatives). Namely, it can’t interact with our web API while being authenticated as the user, and it can’t interact with the DOM (in addition to boosters accessing the EternaJS API, some actually add content to the page, such as the mutation/submission booster and the codon constraint - some “etc” scripts also do this, though I’m less concerned about that).
I agree with you on code duplication - I already ran into issues with EternaFold and LinearFoldE that would have been resolved with a single source of truth.
With regard to the sandboxing, it might not be necessary at all. Users can only run scripts they write or they willingly copy from another user. And although it's certainly not good if someone accidentally runs a malicious script, it's also their fault. We definitely need to provide some sort of verification before you can copy a script to make this clear. Also, there isn't anything someone could do with the scripting interface that they couldn't already do with the JavaScript console. With this in mind, do you think we need to keep the sandboxing at all? It would be easier to work with `eval` (especially when passing in variables), and it solves the DOM access problem.
That's actually essentially what Guy had previously built out, which I really need to get merged in. I do think it would ultimately be best to properly sandbox, both because it's not always easy to determine if something is malicious or not (and many users aren't familiar with scripting at all), and it's just not something users should have to worry about. The issue is that there's just too many moving parts to get it resolved relatively quickly.
> I've got this set up now as a `Script` object. To run a script, you would use `(new Script).evaluate(code, inputs)`. It returns an object containing the time it took to run, the result of the script, and a string representing the console...
Would that output string automatically update in asynchronous scripts (e.g. `setInterval`)? Is an event raised to let the page know that it has updated (so it can update the visual console)?
@guyguy2001 I hadn't thought about that - I'll need to take a look.
@luxaritas Perhaps sandboxing should be handled by the server, so it would just remove any 'bad' code when a script is updated? It's going to be difficult to allow scripts access to the DOM while sandboxing them (although perhaps there are a few workarounds I haven't thought of yet). If this was handled by the server, it would theoretically be safe to use `eval`, as any potentially harmful code would have been removed. Obviously the biggest issue is finding and removing bad code, which will take some work. What do you think?
There's no way to reliably do that - and at any rate, the things that we're trying to block by sandboxing have both legitimate and illegitimate use cases, e.g. API access while authenticated, and modifying the DOM. So even if it was viable to detect usage of these features, there's no programmatic way to determine if it's malicious or not. It needs to be fully sandboxed - my original plan was actually to put the user code in an iframe. We would then provide APIs to create UI extensions, authenticated API requests, EternaJS API requests, etc. (in the case of using an iframe, it would be communicated via `postMessage`)
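As a rough sketch of the host side of such a bridge: the sandboxed script sends a request message and the page answers only for methods it has explicitly exposed. The message shape (`id`, `method`, `args`), the `createHostBridge` helper, and the `getSequence` handler are all hypothetical, not an existing Eterna API.

```javascript
// Hypothetical host-side bridge: the sandboxed script sends
// { id, method, args } and receives { id, ok, result | error }.
function createHostBridge(api) {
  // Copy into a Map so only explicitly listed methods are callable.
  const allowed = new Map(Object.entries(api));
  return async function handleMessage(message) {
    const { id, method, args = [] } = message;
    const fn = allowed.get(method);
    if (typeof fn !== 'function') {
      return { id, ok: false, error: `Unknown method: ${method}` };
    }
    try {
      return { id, ok: true, result: await fn(...args) };
    } catch (err) {
      return { id, ok: false, error: String(err && err.message ? err.message : err) };
    }
  };
}

// Placeholder API; real handlers would call EternaJS or the web API.
const handle = createHostBridge({
  getSequence: () => 'GGGAAACCC',
});

// In a browser, the host page would forward iframe messages, e.g.:
//   window.addEventListener('message', async (event) => {
//     if (event.source !== frame.contentWindow) return;
//     frame.contentWindow.postMessage(await handle(event.data), '*');
//   });
handle({ id: 1, method: 'getSequence' }).then((reply) => console.log(reply));
```

Because the script never touches the page directly, each exposed method is a place where permissions and backwards compatibility can be decided one at a time.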
As I mentioned before, this will still create issues of backwards compatibility and requires a fair amount of design consideration, so we probably just need to punt on that for now
So just do an `eval` for now, give a warning message, and get everything working. Then we can focus on getting it working properly.
It would be really nice if we could make some sort of import syntax, so users could use other scripts without having to copy them (and risk them being outdated eventually) or make an API request to get the source code every time they run that piece of code. Probably would be something to bundle with the APIs you mentioned above.
There already is something like that - Lib.EternaScriptSource
Yes, but it retrieves the source code every time the script is run, rather than getting it once and storing it for future reference. The solution is probably just to modify `Lib.EternaScriptSource` to cache scripts and check if it already has them stored before getting the code again.
I mean, if you use EternaScriptSource once at the beginning of your code, and just use the result of that, it won't cause it to get reevaluated. Regardless though, it uses the legacy frontend code for retrieving scripts, which does cache them :)
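For reference, the kind of caching being discussed amounts to memoizing the lookup by script ID. This is a generic sketch, not the legacy frontend's code; `withCache` and `fakeFetch` are hypothetical names, and the fetcher is a stand-in for the real script retrieval.

```javascript
// Wrap any "get source by id" function so each id is fetched once.
function withCache(fetchSource) {
  const cache = new Map();
  return function cachedSource(id) {
    if (!cache.has(id)) {
      cache.set(id, fetchSource(id));
    }
    return cache.get(id);
  };
}

// Stand-in fetcher that counts how often it is actually called.
let fetchCount = 0;
const fakeFetch = (id) => {
  fetchCount += 1;
  return `/* source of script ${id} */`;
};

const getSource = withCache(fakeFetch);
getSource(123);
getSource(123); // served from the cache
console.log(fetchCount); // 1
```

A cache like this never notices when a script is edited, so a real version would need some invalidation rule.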
I've reworked how the console is handled. Now, instead of returning a console string, the `evaluate` function accepts a callback that is called whenever something is printed to the console. On the website, this would be added to a variable and reactivity would handle the rest. In-game, this would probably go to `console.log`
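A small sketch of the callback approach, assuming a hypothetical `evaluateWithConsole(code, onPrint)` shape and only the `out`/`outln` pervasives. It uses `new Function` purely to keep the example short, so nothing here is sandboxed.

```javascript
// Hypothetical shape: every printed chunk is handed to `onPrint`
// as soon as it is produced, instead of being returned at the end.
function evaluateWithConsole(code, onPrint) {
  const out = (text) => onPrint(String(text));
  const outln = (text) => onPrint(String(text) + '\n');
  // Not sandboxed: `new Function` is used only for brevity.
  const run = new Function('out', 'outln', code);
  return run(out, outln);
}

// On the website the callback could append to a reactive variable;
// here it appends to a plain string.
let consoleText = '';
const result = evaluateWithConsole(
  'out("a"); outln("b"); return 3;',
  (chunk) => { consoleText += chunk; }
);
console.log(result, JSON.stringify(consoleText));
```

Since the callback fires per print, output from asynchronous code such as `setInterval` would reach the page as it happens, and a booster could simply pass `console.log` as the callback.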
Also, technically, all scripts are now asynchronous - they don't run until the library has loaded, which is async. Not sure if this was the same on the legacy site too.
Lastly, I've redone how folding engines are accessed. Instead of having to run the Emscripten/CMake commands to build everything, I just have it set up so the resulting JS files are dragged in. I am aware of the licensing restrictions on the folding engines - the files aren't pushed to Git. The idea is you either have a copy of the EternaJS repository and have already built the engines or you can go over there, follow the instructions, and build the engines - exactly what you would have to do anyway, just in another repository.
I'm not sure what you're referring to WRT console. In general though, I imagine the functionality should be identical to the current scripting interface?
By console, I mean the results of calling pervasives (`out`, `outln`, `clear`, etc).
For the most part, it is exactly the same when tied in with the interface - currently, the results are prepended to a `div` (for etc scripts; nothing happens for `out` with boosters). This approach is slightly more flexible - it would be effectively the same except displaying/using the results of the console is handled by whatever would actually display the console. It would allow for `out` to map to `console.log` for boosters, which would be nice (especially when converting etc/puzzle solving scripts to boosters). It wouldn't cause any issues with backward compatibility.
With some slight modifications, I've successfully set up script evaluation on the website. Once I get the interface cleaned up, I'll move on to sandboxing the scripts.
I'm going to be taking this over in a fresh branch
Scripting interface moved to the new site. Almost completely functional except I'm still working on allowing scripts to be evaluated.
URLs:
`/scripts` - A list of scripts
`/script/nid` - Views a specific script
`/create/script/nid` - Creates a new script if no `nid` is given; if one is specified, it copies the script.