CSUS-LLVM / OptSched

Optimizing scheduler. Combinatorial instruction scheduling project.
GNU General Public License v3.0
19 stars 17 forks source link

PERP -> SLIL: large number of mismatchs #66

Closed Quincunx271 closed 4 years ago

Quincunx271 commented 4 years ago

There is a large number of mismatches between PERP and SLIL. I'm seeing somewhere around 577861 mismatches over FP in CPU2006 (that was with history domination turned off). This is my sched.ini:

# Use optimizing scheduling
# YES
# NO : No scheduling is done.
# HOT_ONLY: Only use scheduler with hot functions.
USE_OPT_SCHED YES

# Print spill counts
# Same options as use optimal scheduling.
PRINT_SPILL_COUNTS YES

# Use two pass scheduling approach. 
# First pass minimizes RP and second pass tries to balances RP and ILP.
# YES
# NO
USE_TWO_PASS NO

# These 3 flags control which schedulers will be used.
# Each one can be individually toggled. The heuristic
# list scheduler or ACO must be run before the 
# enumerator.
# VALUES:
# YES
# NO
# HEUR_ENABLED is the Heuristic scheduler.
HEUR_ENABLED YES
# ACO_ENABLED is the Ant Colony Optimization scheduler.
ACO_ENABLED NO
# ENUM_ENABLED is the Branch and Bound scheduler.
ENUM_ENABLED YES

# Controls when ACO should be run, either before or
# after the enumerator. Both can be enabled at the
# same time. ACO is disabled if both are disabled.
# Run ACO before the enumerator
# VALUES:
# YES
# NO
ACO_BEFORE_ENUM YES
# Run ACO after the enumerator
ACO_AFTER_ENUM NO

# A time limit for the whole region (basic block) in milliseconds. Defaults to no limit.
# Interpretation depends on the TIMEOUT_PER setting.
# Not used when the two pass scheduling approach is enabled.
REGION_TIMEOUT 5

# A time limit for each schedule length in milliseconds. Defaults to no limit.
# Interpretation depends on the TIMEOUT_PER setting.
# Not used when the two pass scheduling approach is enabled.
LENGTH_TIMEOUT 5

# A time limit for the whole region in milliseconds. Defaults to no limit.
# Only used when two pass scheduling is enabled.
# A time limit for the whole region.
FIRST_PASS_REGION_TIMEOUT 5
# A time limit for each schedule length.
FIRST_PASS_LENGTH_TIMEOUT 5

# A time limit for the second pass in milliseconds.
# Only used when two pass scheduling is enabled.
# A time limit for the whole region.
SECOND_PASS_REGION_TIMEOUT 5
# A time limit for each schedule length.
SECOND_PASS_LENGTH_TIMEOUT 5

# How to interpret the timeout value? Valid options:
# INSTR : multiply the time limits in the above fields by the number of
# instructions in the block
# BLOCK : use the time limits in the above fields as is
TIMEOUT_PER INSTR

# The heuristic used for the list scheduler. Valid values are any combination of:
# CP: critical path
# LUC: last use count
# UC: use count
# SC: successor count
# NID: node ID
# LLVM: LLVM's default list scheduler order
# Example: LUC_CP_NID
HEURISTIC LUC_CP_NID

# The heuristic used for the enumerator. If the two pass scheduling
# approach is enabled, then this value will be used for the first pass.
# Same valid values as HEURISTIC.
ENUM_HEURISTIC LUC_CP_NID

# The heuuristic used for the enumerator in the second pass in the two-pass scheduling approach.
# Same valid values as HEURISTIC.
SECOND_PASS_ENUM_HEURISTIC LUC_CP_NID

# The spill cost function to be used. Valid values are:
# PERP: peak excess reg pressure
# PRP: peak reg pressure
# SUM: sum of excess reg pressures across the block
# PEAK_PLUS_AVG: peak excess reg pressure plus the avg reg pressure across the block
# SLIL: sum of live interval lengths for each block
# SPILLS: number of spills after running a register allocator (doesn't work with enumerator)
# TARGET: use target specific register pressure tracking
SPILL_COST_FUNCTION SLIL

# The weight of the spill cost in the objective function. This factor
# defines the importance of spill cost relative to schedule length. A good
# value for this factor should be found experimentally, but is is expected
# to be large on architectures with hardware scheduling like x86 (thus
# making spill cost minimization the primary objective) and smaller on
# architectures with in-order execution like SPARC (thus making scheduling
# the primary objective).
SPILL_COST_WEIGHT 10000

# Precision of latency info:
# PRECISE: use precise latencies from the machine_model.cfg file
# LLVM: use latencies from LLVM
# UNIT: use unit latencies
LATENCY_PRECISION UNIT

# The scheduler used to find an initial feasible schedule.
# LIST: List scheduler
# SEQ: Sequential list scheduler
HEUR_SCHED_TYPE LIST

#use 3-tournament
ACO_TOURNAMENT YES

#use fixd value for bias or not. If not, use ratio instaed
ACO_USE_FIXED_BIAS YES

#Fixed number of evaporation
ACO_FIXED_BIAS 10

# 0 to 1, ratio that will use bias
ACO_BIAS_RATIO 0.9

ACO_LOCAL_DECAY 0.1

ACO_DECAY_FACTOR 0.1

ACO_ANT_PER_ITERATION 10

ACO_TRACE NO

# The importance of the heuristic in ACO. ACO uses (1/heuristic)^importance, so
# importance of 0 means don't use the heuristic.
ACO_HEURISTIC_IMPORTANCE 1

# ACO will stop after this many iterations with no improvement.
ACO_STOP_ITERATIONS 50

# Whether LLVM mutations should be applyed to the DAG.
LLVM_MUTATIONS NO

# (Chris) If using the SLIL cost function, enabling this option
# will force the B&B scheduler to skip DAGs with zero PERP.
FILTER_BY_PERP NO

# If a register type has a MAX pressure below a certain threshold it is ignored.
FILTER_REGISTERS_TYPES_WITH_LOW_PRP NO

# (Chris) This setting chooses which blocks to keep and which blocks to discard.
# The scheduler will fall back to LLVM if the block is discarded.
# Valid options:
#   ALL: always take the block
#   IMPROVED: only take improved blocks
#   OPTIMAL: only take optimal blocks
#   IMPROVED_OR_OPTIMAL: only take improved or optimal blocks
#   ZERO_COST: only take zero-cost blocks
BLOCKS_TO_KEEP ALL

# (Chris) Override USE_OPT_SCHED to apply to specific regions.
# When SCHEDULE_SPECIFIC_REGIONS is set to YES, the scheduler
# will only schedule the regions specified by
# REGIONS_TO_SCHEDULE, which is a comma-separated list of
# scheduling regions.
SCHEDULE_SPECIFIC_REGIONS NO
REGIONS_TO_SCHEDULE fft1D_512:114

# Whether to use suffix concatenation. Disabled automatically if
# history domination is disabled.
ENABLE_SUFFIX_CONCATENATION NO

# Whether to apply the node superiority graph transformation.
STATIC_NODE_SUPERIORITY NO

# Whether to apply node superiority in multiple passes.
MULTI_PASS_NODE_SUPERIORITY NO

# Whether to apply relaxed pruning. Defaults to YES.
APPLY_RELAXED_PRUNING YES

# Whether to apply spill-cost pruning. Defaults to YES.
APPLY_SPILL_COST_PRUNING YES

# Whether to apply history-based domination. Defaults to YES.
APPLY_HISTORY_DOMINATION YES

# Use simple register types. In the machine scheduler this means
# use the first PSet associated with a RegUnit.
USE_SIMPLE_REGISTER_TYPES NO

# Should we simulate register allocation to the evaluate the effect
# of scheduling decisions on estimated spill count.
# BEST: Only simulate RA with the best (lowest cost) schedule.
# LIST: Only simulate RA with the list schedule.
# BOTH: Simulate RA using the best schedule and the list schedule.
# TAKE_SCHED_WITH_LEAST_SPILLS: Simulate RA using the best schedule and the list schedule, and
# take the schedule that generates the least spills.
# NO: Do not simulate register allocation.
SIMULATE_REGISTER_ALLOCATION NO

# Should we ignore ilp and only schedule for register pressure.
SCHEDULE_FOR_RP_ONLY NO

# Whether to enumerate schedules containing stalls (no-op instructions).
# In certain cases, such as having unpipelined instructions, this may
# result in a better schedule. Defaults to YES.
ENUMERATE_STALLS YES

# Whether to generate missing parts of the machine model using information from LLVM.
# Requires a generator class for the target in machine_model.cfg.
GENERATE_MACHINE_MODEL NO

#The algorithm to use for determining the lower bound. Valid values are:
# RJ: Rim and Jain's algorithm.
# LC: Langevin and Cerny's algorithm.
# Defaults to LC.
LB_ALG LC

# Whether to verify that calculated schedules are optimal. Defaults to NO.
VERIFY_SCHEDULE YES

# Whether to apply dynamic node superiority. Defaults to NO.
DYNAMIC_NODE_SUPERIORITY NO

# An option to treat data dependencies of type ORDER as data dependencies.
TREAT_ORDER_DEPS_AS_DATA_DEPS NO

# The number of bits in the hash table used in history-based domination.
HIST_TABLE_HASH_BITS 16
vangthao95 commented 4 years ago

PERP and SLIL have different definitions of optimality so mismatches don't really mean anything between the two cost functions.