COSC1127-AI / pacman-contest-cluster

Script to run the Conquer The Flag PACMAN contest
http://ai.berkeley.edu/contest.html
Apache License 2.0

Enhance staff only option - New contest set-up #95

Closed ssardina closed 3 years ago

ssardina commented 3 years ago

We want to make the staff-only option more powerful to support a new marking scheme:

Student teams will play only against, say, 30 staff teams and will collect points only from those games.

ssardina commented 3 years ago

@AndrewPaulChester, could we start working on this feature? We discussed doing it last year but in the end didn't manage to, and I don't want to leave it to the last minute.

As we discussed with you and @nirlipo , we want to address 2 issues:

  1. being marked against other students (well, it is not truly correct)
  2. "unstable rankings" over the weeks (as other teams get better).

So, the idea is to be able to run all students against a set of staff teams only, and award points based on those games only. We agreed we will implement a tournament in which students play against a set of other teams (that remain stable across the weeks) and do not play among themselves.
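
A minimal sketch of the schedule this describes, in which every student team meets every staff team and no two student teams meet. The function and team names are hypothetical and are not taken from the contest script.

```python
from itertools import product


def staff_only_schedule(student_teams, staff_teams):
    """Return one (student, staff) pairing per game.

    Student teams never face each other, so a student's result
    depends only on the fixed set of staff teams.
    """
    return list(product(student_teams, staff_teams))


# Hypothetical team names, for illustration only.
students = ["student_a", "student_b", "student_c"]
staff = ["staff_easy", "staff_hard"]
schedule = staff_only_schedule(students, staff)

for student, opponent in schedule:
    print(f"{student} vs {opponent}")
```

With n student teams and m staff teams this gives n × m pairings, and adding or improving a student team does not change any other student's set of opponents.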

Now, I am thinking this is already done, because we have a --staff-only option (or similar) that @nirlipo added last year to play only against staff teams. Wouldn't that be exactly this?

One thing is: can staff teams be given as folders? I think not, and that should be addressed, but it is much simpler.

Otherwise, we jump to the other issues.

Views?

AndrewPaulChester commented 3 years ago

Yes, I agree that this is largely already implemented in staff-only. A few thoughts:

  1. Currently you can set the staff-teams-dir, and it will grab all appropriately named zip files in that folder. This seems like good behaviour to me, so why would we want to change it? I suppose we could just grab all subfolders instead; I don't see much of a difference.
  2. You mention above a grading scheme where each staff team is worth a different number of points - this will be the largest code change but is probably not too hard. I think it's worth doing as it gives us a lot of flexibility instead of having to come up with exactly the right numbers of agents of different difficulties.
  3. Does this ticket also cover actually designing the portfolio of agents? I think it would make sense to have only a handful (5?) really basic ones worth 10 marks each, to get people to a pass. We don't really care too much about fine distinctions at that level. Then perhaps another 25 worth ~2 points each to go from 50-100% so there is a lot more granularity.
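
The arithmetic behind the portfolio in point 3 can be sketched as follows. This assumes exactly 2 points for each of the 25 harder agents (the comment says "~2") and that a student earns an agent's points by beating it. Neither rule is confirmed in the thread, and all names are hypothetical.

```python
def build_portfolio(n_basic=5, basic_points=10, n_advanced=25, advanced_points=2):
    """Map each (hypothetical) staff agent name to the marks it is worth."""
    portfolio = {f"basic_{i}": basic_points for i in range(1, n_basic + 1)}
    portfolio.update(
        {f"advanced_{i}": advanced_points for i in range(1, n_advanced + 1)}
    )
    return portfolio


def mark(portfolio, beaten):
    """Sum the marks of every staff agent the student team beat (assumed rule)."""
    return sum(points for team, points in portfolio.items() if team in beaten)


portfolio = build_portfolio()
total = sum(portfolio.values())  # 5 * 10 + 25 * 2 = 100

# Beating only the five basic agents reaches the pass mark of 50.
pass_mark = mark(portfolio, {f"basic_{i}" for i in range(1, 6)})
```

Under these assumptions the basic agents decide pass or fail in steps of 10, while the remaining 50 marks move in steps of 2.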

ssardina commented 3 years ago

  1. Yes, we want to grab folders for staff teams (as we do with student teams); it is too cumbersome to unpack and repack whenever we want to change anything in a staff team. We just keep the teams in "plain" format and that's it.
  2. I am not sure exactly how to use the points, rather than just taking 1 point per team, but as you said it would give flexibility. However, how do we do this well? Do we have a file with the list of staff teams to include and associated points?
  3. I would say so. How do we build a set of relevant staff teams? There cannot be just a few, because we want to avoid students overfitting to a small set of agents. With a large number of teams, it is very hard to overfit.
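
For point 1, a rough sketch of collecting staff teams from a directory that may hold plain folders, zip files, or both. This is not the script's actual discovery logic: the function name is hypothetical, the real naming pattern for zip files is not given in this thread, and letting a folder take precedence over a zip of the same name is an assumption.

```python
from pathlib import Path


def find_staff_teams(staff_teams_dir):
    """Return {team_name: path} for every staff team found in the directory.

    Zip files are collected first; plain folders are added afterwards so
    that a folder wins over a zip with the same name (assumed preference).
    Hidden folders such as .git are skipped.
    """
    root = Path(staff_teams_dir)
    teams = {p.stem: p for p in root.glob("*.zip") if p.is_file()}
    teams.update(
        {
            p.name: p
            for p in root.iterdir()
            if p.is_dir() and not p.name.startswith(".")
        }
    )
    return teams
```

Editing a staff team then only means changing files inside its folder, with no unpacking or repacking step.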

We may also have a full pool of staff teams, from which the system randomly picks a handful to run a test against. That would be the best option, I would say. Each staff team would need a number of points associated with it, for example. All of this can be done automatically.

Great, let's solve this issue. This is THE issue we want to do, @AndrewPaulChester, and we want to do it well. Let's discuss exactly what we want, but I think we are already in sync.
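
One possible shape for the pool idea: a small file listing each staff team with its points, and a seeded random draw from it. The CSV layout, column names, team names and function names are all assumptions made for illustration; the thread does not fix a format.

```python
import csv
import io
import random


def load_pool(csv_text):
    """Parse 'team,points' rows into a {team_name: points} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["team"]: int(row["points"]) for row in reader}


def pick_staff_teams(pool, k, seed):
    """Draw k staff teams from the pool, reproducibly for a given seed."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(pool), k)  # sorted() makes the draw order-independent
    return {team: pool[team] for team in chosen}


# Hypothetical pool file contents.
POOL_CSV = """team,points
staff_basic_1,10
staff_basic_2,10
staff_mid_1,2
staff_mid_2,2
staff_hard_1,2
"""

pool = load_pool(POOL_CSV)
selection = pick_staff_teams(pool, k=3, seed=2021)
```

Reusing the same seed reproduces the same selection, which would keep the opponents stable across the weeks; changing the seed gives a fresh draw from the same pool.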

AndrewPaulChester commented 3 years ago

This has evolved as discussed in our meeting recently. To summarise

ssardina commented 3 years ago

I think it all makes sense, let's get this done ASAP. This is all about generating the web page ultimately.

I am already building the rubric

ssardina commented 3 years ago

Excellent. We had a conversation today and we were on the same page. Now, let's unfold it!

I am almost finishing the spec.

AndrewPaulChester commented 3 years ago

Made a PR to address this; the current output is as shown in the image: [image]

ssardina commented 3 years ago

That is pretty cool. I will check all this tonight, very curious! Thanks!

ssardina commented 3 years ago

Done in PR #100