Enhance staff only option - New contest set-up

ssardina commented 3 years ago

We want to make the staff-only option more powerful to support a new marking scheme:

[x] Allow staff teams to be in folders, not just zip files.
[x] Option to make staff teams play among them or not (default NO).
[x] Annotate each staff team with the score it gives, so that different teams award different numbers of points?
[x] Build a pack of staff teams at different levels

Student teams will play only against, say, 30 staff teams, and will collect points only against them.

ssardina commented 3 years ago

@AndrewPaulChester, could we start working on this feature? We discussed doing it last year but in the end didn't manage to, and I don't want to leave it to the last minute.

As we discussed with you and @nirlipo , we want to address 2 issues:

- students being marked against other students (which is not really correct)
- "unstable rankings" over the weeks (as other teams get better)

So, the idea is to run all student teams against a set of staff teams only, and award points based on those games only. We agreed to implement a tournament in which students play against a set of other teams (that remain stable across the weeks) and do not play among themselves.

Now, I think this is already done, because we have a --staff-only option (or similar) that @nirlipo added last year to play only against staff teams. Wouldn't that be exactly this?

One thing is: can staff teams be given as folders? I think not, and that should be addressed, but it is a much simpler change.

Otherwise we jump to the other issues.

Views?

AndrewPaulChester commented 3 years ago

Yes, I agree that this is largely already implemented in staff-only. A few thoughts:

Currently you can set the staff-teams-dir, and it will grab all appropriately named zip files in that folder. This seems like good behaviour to me, why would we want to change it? I suppose we could just grab all subfolders instead, I don't see much of a difference.
You mention above a grading scheme where each staff team is worth a different number of points - this will be the largest code change but is probably not too hard. I think it's worth doing as it gives us a lot of flexibility instead of having to come up with exactly the right numbers of agents of different difficulties.
Does this ticket also cover actually designing the portfolio of agents? I think it would make sense to have only a handful (5?) really basic ones worth 10 marks each, to get people to a pass. We don't really care too much about fine distinctions at that level. Then perhaps another 25 worth ~2 points each to go from 50-100% so there is a lot more granularity.
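
As a rough illustration of the weighted scheme described above, the sketch below sums the points of the staff teams that a student team beats. The team names, the `marks_for` helper, and the rule that points are earned by beating a staff team are all assumptions made for illustration; none of them come from the existing code.

```python
# Hypothetical portfolio: 5 basic agents worth 10 marks each,
# plus 25 harder agents worth 2 marks each (all names are made up).
STAFF_POINTS = {f"staff_basic_{i}": 10 for i in range(1, 6)}
STAFF_POINTS.update({f"staff_hard_{i}": 2 for i in range(1, 26)})


def marks_for(beaten_teams, staff_points=STAFF_POINTS):
    """Sum the points of every staff team the student team beat.

    Assumption: a staff team's points are awarded only when it is beaten.
    Duplicates in beaten_teams are counted once.
    """
    return sum(staff_points[name] for name in set(beaten_teams))


# Beating all basic agents gives a pass (50); beating everything gives 100.
all_basic = [name for name in STAFF_POINTS if name.startswith("staff_basic_")]
print(marks_for(all_basic))                                     # 50
print(marks_for(STAFF_POINTS))                                  # 100
print(marks_for(all_basic + ["staff_hard_1", "staff_hard_2"]))  # 54
```

With this layout the basic agents decide pass or fail, while the 2-point agents provide the finer granularity between 50 and 100.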

ssardina commented 3 years ago

Yes, we want to grab folders (as we do with student teams) for staff teams; it is too cumbersome to unpack and repack whenever we want to change anything in a staff team. We just keep the teams in "plain" format and that's it.
I am not sure exactly how to use the points, rather than just taking 1 point per team, but as you said it would give flexibility. However, how do we do this well? Do we have a file with the list of staff teams to include and their associated points?
I would say so. But how do we build a set of relevant staff teams? There cannot be too few, because we want to avoid students overfitting to a small set of agents. With many teams, it is very hard to overfit.

We may also have a full set of staff teams (a pool), from which the system randomly "picks" a handful and runs a test. That would be the best option, I would say. Each staff team would need a number of points associated with it, for example. All of this can be done automatically.
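
A minimal sketch of the pool idea, assuming the pool is a mapping from staff team name to points. The `pick_staff_teams` helper, the point values, and the use of a seed are hypothetical and only meant to make the proposal concrete.

```python
import random

# Hypothetical pool: staff team name -> points it is worth (values made up).
STAFF_POOL = {
    "staff_basic": 10,
    "staff_medium": 5,
    "staff_top": 3,
    "staff_super": 2,
}


def pick_staff_teams(pool, k, seed=None):
    """Randomly pick k staff teams (with their points) from the pool.

    A fixed seed makes the pick reproducible, so the same opponents
    can be reused from one week to the next.
    """
    if k > len(pool):
        raise ValueError("cannot pick more teams than the pool contains")
    rng = random.Random(seed)
    # sorted() gives a stable sequence to sample from, independent of dict order.
    chosen = rng.sample(sorted(pool), k)
    return {name: pool[name] for name in chosen}


selected = pick_staff_teams(STAFF_POOL, k=2, seed=2021)
print(selected)
```

Fixing the seed addresses the "unstable rankings" concern: the random pick happens once, and every later tournament sees the same opponents.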

Great, let's solve this issue. This is THE issue we want to do, @AndrewPaulChester, and we want to do it well. Let's discuss exactly what we want, but I think we are already in sync.

AndrewPaulChester commented 3 years ago

This has evolved, as discussed in our recent meeting. To summarise:

We run a single fixed contest before the project starts with all the teams from the prior year. Based on that contest, we see how our benchmark teams did (staff team basic/medium/top/super). We calculate their rankings according to a score %, where 100% is the maximum you could get by winning every game.
Then, these benchmarks are used to set the marks for the rubric. For example, if staff team medium should give 10/25 points and it won 32% of its games, then a student team that wins 32% of its games will get 10/25 points. Marks between the staff teams are obtained by linear interpolation (this could lead to a slightly strange curve, but should be OK).
From then on, student teams are run only in tournaments against that same set of teams (e.g. all those from the prior year). This makes the current leaderboard obsolete, as staff teams will play a different number of games from student teams. To make this clear, we need to:
- [x] Remove staff teams from tournament leaderboard (through an option)
- [x] Add a column in table for score %, not just score, and sort by this column.
- [x] Add an optional argument in the command line which is a list of percentages. These will be highlighted rows inserted into the table to make it clear where the comparison points are for the staff benchmark teams.
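
The scheme summarised above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a win is assumed to be worth 3 points, and the benchmark anchors other than the (32%, 10/25) example are invented.

```python
from bisect import bisect_right

POINTS_PER_WIN = 3  # assumption: a win is worth 3 points


def score_percentage(score, games_played, points_per_win=POINTS_PER_WIN):
    """Score as a percentage of the maximum (winning every game played)."""
    if games_played == 0:
        return 0.0
    return 100.0 * score / (points_per_win * games_played)


def marks_from_percentage(pct, benchmarks):
    """Linearly interpolate marks between (score %, marks) benchmark anchors.

    Values outside the anchor range are clamped to the nearest anchor.
    """
    points = sorted(benchmarks)
    xs = [x for x, _ in points]
    if pct <= xs[0]:
        return points[0][1]
    if pct >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, pct)  # index of the first anchor strictly above pct
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)


def leaderboard_rows(student_pcts, benchmark_pcts):
    """Rows of (score %, label, is_benchmark), sorted by score % descending.

    Benchmark percentages become extra rows that the page can highlight.
    """
    rows = [(pct, name, False) for name, pct in student_pcts.items()]
    rows += [(pct, f"benchmark {pct:g}%", True) for pct in benchmark_pcts]
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Hypothetical anchors; only (32%, 10 marks) comes from the example above.
BENCHMARKS = [(0.0, 0.0), (32.0, 10.0), (100.0, 25.0)]

pct = score_percentage(score=48, games_played=50)  # 48 of 150 possible points
print(pct, marks_from_percentage(pct, BENCHMARKS))  # 32.0 10.0
print(marks_from_percentage(66.0, BENCHMARKS))      # 17.5
```

Because student teams are ranked by score % rather than raw score, the result does not depend on how many games each team played, which is why the raw-score leaderboard no longer applies.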

ssardina commented 3 years ago

I think it all makes sense, let's get this done ASAP. This is all about generating the web page ultimately.

I am already building the rubric

ssardina commented 3 years ago

Excellent. We had a conversation today and we were on the same page. Now, let's unfold it!

I have almost finished the spec.

AndrewPaulChester commented 3 years ago

Made a PR to address this, current output is as shown in the image:

ssardina commented 3 years ago

That is pretty cool. I will check all this tonight, very curious!! Thanks!

ssardina commented 3 years ago

Done in PR #100

COSC1127-AI / pacman-contest-cluster

Enhance staff only option - New contest set-up #95