jmchilton opened 7 years ago
Or should we not bother until we have a data-based approach probably involving the GRT?
Ha, see my comment on Gitter ;) @erasche and I talked about a project we would like to start as soon as he is here, which involves GRT, some smart prediction of resources, tools, etc. If someone is faster and would like to work on it, even better!
GRT is a good project - I told Anton when I started that I wanted to do something like that, and I have never gotten around to it. I'm wondering if there won't be cases with insufficient data, or perhaps overfitting by automated approaches - perhaps coming up with some initial categories now and figuring out how to map them to various machines would be good, and then we could rely on GRT to correct or tweak the categorical assignments... I don't know though.
I was just thinking about this problem because https://github.com/galaxyproject/ansible-galaxy-extras/pull/150 is very cool - but we aren't assigning destinations very well without this data :D. I may not have time to work on this and should just wait and see what y'all come up with.
We have a master's thesis for this, which tries to apply machine learning to this data, maybe even to tools - a kind of recommendation system. Ping @anatskiy :)
Change the title and take over the issue. I guess we should deploy TPV by default.
Bjoern is taking over this issue and thinks we should integrate TPV into the Galaxy deployment.
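For concreteness, here is a minimal sketch of what shipped defaults could look like, assuming TPV's YAML schema (tool entries keyed by regex on the tool ID, `cores`, and `mem` in GB); the tool IDs, values, and file name below are placeholders rather than vetted defaults:

```yaml
# Hypothetical tpv_rules_local.yml - all values are illustrative only.
global:
  default_inherits: default

tools:
  # Baseline applied to any tool without a more specific entry.
  default:
    cores: 1
    mem: cores * 4                 # memory request scales with core count
  # Example override for a heavier iuc tool, matched by regex on the tool ID.
  toolshed.g2.bx.psu.edu/repos/iuc/hisat2/.*:
    cores: 4
    mem: cores * 4

destinations:
  local_runner:
    runner: local
    max_accepted_cores: 4          # caps on what this destination accepts
    max_accepted_mem: 16
```

Galaxy would then route jobs through TPV's dynamic destination in the job configuration and point it at a file like this; the exact wiring depends on the deployment.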
It would be nice if we had default destination properties (runtime, memory, and cores) for all iuc, devteam, and other commonly used tools - along with a way to scale them at startup time for a laptop running docker-galaxy-stable.
Perhaps even categorizing everything into a runtime (fast, normal, long), a core count (single, few, many), and memory (small, normal, large) would get us a long way down the road toward good scaling on a laptop and in a vanilla cloud setting. We could provide additional refinements as they come up.
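Sticking with the hedged TPV-style YAML above, those buckets could be encoded as inheritable base entries that concrete tools point at, with the laptop case handled by capping what the local destination accepts at container startup. The tier names, the `abstract`/`inherits` usage, and every number here are made up for illustration, and the three axes are collapsed into combined tiers just to keep the sketch short; runtime would likely map to scheduler-specific params rather than a first-class field:

```yaml
# Hypothetical category tiers - names and numbers are illustrative only.
tools:
  # Buckets defined once as inheritance bases rather than per-tool guesses.
  tier_small_single:       # small memory, single core, fast runtime
    abstract: true
    cores: 1
    mem: 4
  tier_normal_few:         # normal memory, a few cores
    abstract: true
    cores: 4
    mem: 16
  tier_large_many:         # large memory, many cores, long runtime
    abstract: true
    cores: 16
    mem: 64

  # A concrete tool then just picks a bucket.
  toolshed.g2.bx.psu.edu/repos/iuc/some_assembler/.*:
    inherits: tier_large_many

destinations:
  # docker-galaxy-stable could fill in these caps from the host's actual
  # CPU/RAM at startup.
  local_laptop:
    runner: local
    max_accepted_cores: 2
    max_accepted_mem: 8
```

GRT data could later refine which bucket each tool lands in without changing the overall structure.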