exercism / v3-beta

12 stars 2 forks source link

[Page: Editor] [Rust] Exercise doesn't build (when external packages are required?) #26

Open wischi-chr opened 3 years ago

wischi-chr commented 3 years ago

This happened on the gigasecond exercise in rust: https://exercism.lol/tracks/rust/exercises/gigasecond/edit

Steps to reproduce

  1. Copy-Paste the .meta/example.rs into the editor
  2. Run the tests
We received the following error when we ran your code:
WARNING: student did not upload Cargo.lock. This may cause build errors.
error: no matching package named `chrono` found
location searched: registry `https://github.com/rust-lang/crates.io-index`
required by package `gigasecond v2.0.0 (/mnt/exercism-iteration)`
As a reminder, you're using offline mode (--offline) which can sometimes cause surprising resolution failures,
if this error is too confusing you may wish to retry without the offline flag.
iHiD commented 3 years ago

Thanks for reporting 🙂

Yes. The test runners are run offline so do not have access to the internet.

cc @exercism/rust - It's probably worth changing the examples.

coriolinus commented 3 years ago

That's a tricky one: the point of the gigasecond exercise is to teach how to use external crates. It may be possible to solve that exercise using the standard library (I haven't tried), but it will be both harder and less idiomatic.

ErikSchierboom commented 3 years ago

Will there be a wide variety in crates used? If not, you could pre-install the most used packages in the test runner.

coriolinus commented 3 years ago

Given access to a database of student solutions, I could try to answer that question analytically and give a real answer. As things stand, I can only go by personal experience, which provides anecdotes, not data.

Given that caveat, I'd say that it's almost certainly possible to cover the majority of use cases with a careful selection of crates to include. There's precedent: the rust playground makes available the 100 most-downloaded crates.

That said, this is a bandaid for the online environment, not a real solution. There will always be students doing interesting and cool things which require crates outside the top N for any reasonable N that we include. I suggest that the website should present a notification in the case of this kind of error that the student attempt their solution on their local machine.

ErikSchierboom commented 3 years ago

@coriolinus For the moment, you could have the test runner output a special message if a crate is missing and give instructions on what to do. That would make it clear to the student what they need to do.

iHiD commented 3 years ago

I like both the idea of bunding commonly used crates into the test runner, and also the extra message. We could probably grep the existing solutions to see what crates people commonly use?

coriolinus commented 3 years ago

Given a large corpus of Cargo.toml files, it's easy to get the list of the top 100 dependencies used:

rg 'dependencies' --glob Cargo.toml --files-with-matches | 
xargs -L1 toml2json | 
jq -r '.dependencies | keys  | .[]' | 
sort | 
uniq -c | 
sort -rn | 
head -100

A variant of that command is likely to work well given the database of student solutions. However, the tricky part is that students may or may not upload their Cargo.toml file to exercism.

There are additional greppable markers; extern crate foo; used to be a reliable signal that the foo crate was in use. However, that usage was phased out in the 2018 edition. These days, without Cargo.toml, it's pretty hard to identify external crates using only CLI tooling.

iHiD commented 3 years ago

@coriolinus https://github.com/exercism/groovy-test-runner/issues/3 got me thinking that we could host a mirror of crates.io with the top ~200 crates in it. Any reason that's a stupid idea?

coriolinus commented 3 years ago

The most straightforward response is that hosting a partial copy of crates.io makes life easier for us if the docker image is allowed to reach it, but doesn't do anything for us if the docker image isn't. If we're thinking of allowing the docker images firewalled access to a short list of external websites, then whitelisting crates.io and github.com will probably give us the same properties without requiring the effort of actually setting up a mirror.

iHiD commented 3 years ago

I wouldn't want to give internet access as it opens us up to being used as an attack vector for other sites (ie we're used to DDOS them across test runs) and also it adds a financial risk as someone could run up a large AWS bill for data transfer outside of AWS (internal transfers are free/cheap - external transfers rack up). I also just don't want to give access to the internet on these machines as it's quite possible (probable?) that someone will break out of any containers we have any be able to change and restrictive rules we put in place.

But if hosting a partial copy would work, then that's a possibility we can look into (almost certainly post launch).

coriolinus commented 3 years ago

I guess I don't understand your intent. There's lots of ways bad actors could abuse an unlimited internet connection; I was thinking in terms of crypto mining, but I guess it depends on the attacker's needs. Point is, we agree that an unlimited connection is a bad idea.

It sounded to me like your plan goes like this:

If that were the plan, then my point was that whitelisting both crates.io and github.com would have the same risk profile, without needing us to go to the effort of setting up a crates.io clone. If that's not the plan, then I don't understand what the plan was supposed to be.

iHiD commented 3 years ago

We can allow access from one AWS machine to another without allowing any internet access at all. So the only connections available to the machine (at the AWS level) is to a specific other machine.