Question about this scheduler

ArchieGertsman / spark-sched-sim

A Gymnasium environment for simulating job scheduling in Apache Spark

MIT License

Question about this scheduler #6

limengzhaolihai closed this issue 7 months ago

limengzhaolihai commented 7 months ago

Respected author,

I read your paper very carefully and with great interest, and I have some questions. I would like to ask how the agent parameters for the DAGNNScheduler are configured in the decima_tpch.yaml file in your codebase. In the YAML snippet, the agent is set as follows:

agent:
  agent_cls: 'DAGNNScheduler'  ##improvement
  embed_dim: 16
  num_encoder_layers: 4
  policy_mlp_kwargs:
    hid_dims: [64, 64]
    act_cls: 'Tanh'

Could you kindly provide further guidance on how this configuration aligns with the DAGNNScheduler parameters mentioned in your paper? I appreciate your detailed assistance on this matter.

I would also be very grateful for any insight into the configuration parameters specific to the DAGformer scheduler, if there are any.

Warm regards.

ArchieGertsman commented 7 months ago

Hi! I actually removed DAGNNScheduler and DAGformerScheduler from this codebase (maybe you need to git pull), as those were purely experimental. I didn't have success with either one - DAGNN was too computationally expensive, and DAGformer never performed well.

I'd like to note that I was not an author of the paper "Learning Scheduling Algorithms for Data Processing Clusters" - that would be Hongzi Mao et al. These experimental models were not from their paper.

limengzhaolihai commented 7 months ago

Hi! The title of the paper I mentioned is "A FASTER REINFORCEMENT LEARNING APPROACH TO EFFICIENT JOB SCHEDULING IN APACHE SPARK". Thank you for your reply. I will consider it seriously. Thanks.

ArchieGertsman commented 7 months ago

Ah, my master's thesis. I left those models out of the paper as well. The original Decima architecture turned out to work well (at least for the TPCH dataset) with some hyperparameter tweaking. The final hyperparameters are in the config file, and they match what I documented in the paper.
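For readers following along, here is a minimal sketch of how an `agent:` block like the one quoted in the question could be turned into an object. It is an illustration only, not the repository's actual loading code: `ToyScheduler`, `AGENT_REGISTRY`, and `make_agent` are hypothetical names, and the registry lookup is an assumed mechanism. It also shows why a config naming a class that no longer exists would fail.

```python
# Hypothetical sketch: none of these names come from spark-sched-sim.
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Stand-in scheduler; its fields mirror the YAML keys quoted in the question."""
    embed_dim: int
    num_encoder_layers: int
    policy_mlp_kwargs: dict = field(default_factory=dict)


# Assumed mechanism: map the string given in `agent_cls` to a Python class.
AGENT_REGISTRY = {"ToyScheduler": ToyScheduler}


def make_agent(agent_cfg: dict):
    """Build an agent from a parsed `agent:` mapping."""
    cfg = dict(agent_cfg)            # copy so the caller's dict is not mutated
    cls_name = cfg.pop("agent_cls")  # the remaining keys become constructor kwargs
    try:
        cls = AGENT_REGISTRY[cls_name]
    except KeyError:
        raise ValueError(f"unknown agent_cls: {cls_name!r}") from None
    return cls(**cfg)


# What a YAML parser would return for the `agent:` block, written as a literal dict.
agent_cfg = {
    "agent_cls": "ToyScheduler",
    "embed_dim": 16,
    "num_encoder_layers": 4,
    "policy_mlp_kwargs": {"hid_dims": [64, 64], "act_cls": "Tanh"},
}

agent = make_agent(agent_cfg)
```

Under this assumed design, setting `agent_cls` to a name that is not registered (for example, a class that has been removed) raises a `ValueError` instead of silently falling back to another scheduler.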

limengzhaolihai commented 7 months ago

Okay, I got it. Great job mentioning it in your paper. Thanks for the reply.