Open wischi-chr opened 3 years ago
Thanks for reporting 🙂
Yes. The test runners are run offline so do not have access to the internet.
cc @exercism/rust - It's probably worth changing the examples.
That's a tricky one: the point of the gigasecond
exercise is to teach how to use external crates. It may be possible to solve that exercise using the standard library (I haven't tried), but it will be both harder and less idiomatic.
Will there be a wide variety in crates used? If not, you could pre-install the most used packages in the test runner.
Given access to a database of student solutions, I could try to answer that question analytically and give a real answer. As things stand, I can only go by personal experience, which provides anecdotes, not data.
Given that caveat, I'd say that it's almost certainly possible to cover the majority of use cases with a careful selection of crates to include. There's precedent: the rust playground makes available the 100 most-downloaded crates.
That said, this is a bandaid for the online environment, not a real solution. There will always be students doing interesting and cool things which require crates outside the top N for any reasonable N that we include. I suggest that the website should present a notification in the case of this kind of error that the student attempt their solution on their local machine.
@coriolinus For the moment, you could have the test runner output a special message if a crate is missing and give instructions on what to do. That would make it clear to the student what they need to do.
I like both the idea of bunding commonly used crates into the test runner, and also the extra message. We could probably grep the existing solutions to see what crates people commonly use?
Given a large corpus of Cargo.toml
files, it's easy to get the list of the top 100 dependencies used:
rg 'dependencies' --glob Cargo.toml --files-with-matches |
xargs -L1 toml2json |
jq -r '.dependencies | keys | .[]' |
sort |
uniq -c |
sort -rn |
head -100
A variant of that command is likely to work well given the database of student solutions. However, the tricky part is that students may or may not upload their Cargo.toml
file to exercism.
There are additional greppable markers; extern crate foo;
used to be a reliable signal that the foo
crate was in use. However, that usage was phased out in the 2018 edition. These days, without Cargo.toml
, it's pretty hard to identify external crates using only CLI tooling.
@coriolinus https://github.com/exercism/groovy-test-runner/issues/3 got me thinking that we could host a mirror of crates.io with the top ~200 crates in it. Any reason that's a stupid idea?
The most straightforward response is that hosting a partial copy of crates.io makes life easier for us if the docker image is allowed to reach it, but doesn't do anything for us if the docker image isn't. If we're thinking of allowing the docker images firewalled access to a short list of external websites, then whitelisting crates.io and github.com will probably give us the same properties without requiring the effort of actually setting up a mirror.
I wouldn't want to give internet access as it opens us up to being used as an attack vector for other sites (ie we're used to DDOS them across test runs) and also it adds a financial risk as someone could run up a large AWS bill for data transfer outside of AWS (internal transfers are free/cheap - external transfers rack up). I also just don't want to give access to the internet on these machines as it's quite possible (probable?) that someone will break out of any containers we have any be able to change and restrictive rules we put in place.
But if hosting a partial copy would work, then that's a possibility we can look into (almost certainly post launch).
I guess I don't understand your intent. There's lots of ways bad actors could abuse an unlimited internet connection; I was thinking in terms of crypto mining, but I guess it depends on the attacker's needs. Point is, we agree that an unlimited connection is a bad idea.
It sounded to me like your plan goes like this:
If that were the plan, then my point was that whitelisting both crates.io and github.com would have the same risk profile, without needing us to go to the effort of setting up a crates.io clone. If that's not the plan, then I don't understand what the plan was supposed to be.
We can allow access from one AWS machine to another without allowing any internet access at all. So the only connections available to the machine (at the AWS level) is to a specific other machine.
This happened on the gigasecond exercise in rust: https://exercism.lol/tracks/rust/exercises/gigasecond/edit
Steps to reproduce