List of things for Bahar to teach me #10

Open fmartidu opened 7 years ago

fmartidu commented 7 years ago

Create a README documenting the steps needed to accomplish each task, with the required commands.

fmartidu commented 7 years ago

PAPI, Advisor, perf, Doxygen

fmartidu commented 7 years ago

OpenMP pragmas, PAPI, perf, a quick check of the vectorized code, a quick check of the Makefile

fmartidu commented 7 years ago

Doxygen:

Source directory: /home/ferran/SrcHiper/CNS-3D/src/par_CPU

Running Doxygen generates HTML and LaTeX output; the HTML pages are written to /home/ferran/SrcHiper/CNS-3D/src/par_CPU/doxygen_outputs/html

To browse the documentation, go into the local repo, /Users/ferranmarti/Desktop/HPC-Factory/Codes/CNS-3D/Hiper/src/par_CPU/doxygen_outputs/html, and double-click index.html.

*** The output directory is created when Doxygen runs. To change the information displayed in the "Files" tab (e.g. for dflux.cpp), edit the header comment of the corresponding .cpp file in /Users/ferranmarti/Desktop/HPC-Factory/Codes/CNS-3D/Hiper/src/par_CPU

*** To change the main page of the Doxygen browser, edit mainpage.dox in /Users/ferranmarti/Desktop/HPC-Factory/Codes/CNS-3D/Hiper/src/par_CPU

*** After making changes: Bahar runs doxygen doxygen_config; instead, I use the GUI and enter in it the same values she has in doxygen_config. After running from the GUI, the outputs and the web pages are updated.
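A file header of the kind Doxygen shows in the "Files" tab could look like the sketch below. The @file, @brief, @param and @return commands are standard Doxygen; the description text and the helper array_sum are hypothetical placeholders, not the real contents of dflux.cpp. Doxygen only documents free functions when their file is documented, which is one reason the @file block matters. mainpage.dox is written the same way, as a comment block that starts with @mainpage.

```cpp
/**
 * @file dflux.cpp
 * @brief One-line summary shown next to the file name in the Doxygen "Files" tab.
 *
 * Longer description of what the file computes (placeholder text).
 */

#include <cstddef>

/**
 * @brief Sum of the entries of an array (hypothetical helper, for illustration only).
 * @param v pointer to the first element
 * @param n number of elements
 * @return v[0] + v[1] + ... + v[n-1]
 */
double array_sum(const double* v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```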

fmartidu commented 7 years ago

VTune

ssh -X ferran@zelda.eng.uci.edu (on hpc, load the intel-parallel module first), then run amplxe-gui. When the GUI pops up:

- Choose the Analysis tab. Try several; normally Basic Hotspots or Advanced Hotspots, HPC Performance Characterization, General Exploration, Memory Access. HPC Performance Characterization example: select Evaluate max DRAM to get the value of the slope for the roofline. *** Select START. Bahar runs it on zelda and bridges. NOTE: column 0 is for latencies or others; column 1 shows serial work, normally Initialization.


To start a new analysis without closing the current one, click the red arrow in the toolbar.

Example: Memory Access

*AWESOME FEATURE: compile with -g; then, clicking a function name (e.g. dflux) in an analysis opens the .cpp source and shows how much each line is using. For example, in a Memory Access analysis, clicking a metric in the Summary tab (e.g. DRAM Bandwidth Bound) opens the Bottom-up tab sorted by that metric. There, clicking a function opens either the assembly or the C++ source, and I can check what each line is using.
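The slope obtained with "Evaluate max DRAM" is the memory-bound part of the roofline: attainable performance = min(peak FLOP rate, DRAM bandwidth × arithmetic intensity). A minimal sketch of that arithmetic follows; the function names are hypothetical, and the numbers in the usage note are illustrative, not measurements from zelda.

```cpp
#include <algorithm>

// Arithmetic intensity: floating-point operations per byte moved to/from DRAM.
double arithmetic_intensity(double flops, double bytes) {
    return flops / bytes;
}

// Roofline bound in GFLOP/s: the smaller of the compute roof (peak_gflops)
// and the memory roof (DRAM bandwidth in GB/s times arithmetic intensity).
double roofline_gflops(double peak_gflops, double dram_gbs, double intensity) {
    return std::min(peak_gflops, dram_gbs * intensity);
}
```

For a double-precision triad a[i] = b[i] + s*c[i], each iteration does 2 FLOPs and moves 24 bytes (ignoring write-allocate traffic), so the intensity is 2/24 ≈ 0.083 FLOP/byte. With an assumed 10 GB/s of DRAM bandwidth the roof is about 0.83 GFLOP/s, far below any realistic compute peak, so such a loop is memory bound.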

fmartidu commented 7 years ago

Advisor

Used to check whether vectorization is happening.

ssh -X ferran@zelda.eng.uci.edu, then run advixe-gui

PROCEDURE: click on the low-efficiency loops and read the recommendations to learn why those loops have low vectorization efficiency, then fix them. NOTE: if I don't want to run the GUI, I compile the code with -vec-report=9; it writes an .optrpt file for each .cpp file, where I can check why vectorization is failing or how efficient it is, but without recommendations. This is useful if the GUI is very slow or if Advisor is not installed.

NOTE: Advisor needs the -g flag, and the code must not be compiled with the -novec flag.
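Two small loops make a quick test case for Advisor or the .optrpt report: one with independent iterations and one with a loop-carried dependence. The function names are hypothetical, and the exact wording of the compiler's remarks depends on the compiler and its version.

```cpp
#include <cstddef>

// Independent iterations: a candidate for vectorization.
void axpy(double* y, const double* x, double a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Loop-carried dependence: v[i] needs the freshly updated v[i - 1], so the
// vectorizer typically reports this loop as not vectorized.
void prefix_sum(double* v, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}
```

Because y and x are plain pointers, the compiler may have to assume they overlap and either add a runtime check or report an assumed dependence; putting #pragma omp simd before the loop (compiled with -qopenmp-simd for icc or -fopenmp-simd for gcc) asserts that the iterations are independent.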

fmartidu commented 7 years ago

Perf

Log into zelda and cd /home/ferran/SrcHiper/CNS-3D/src/par_CPU. The command to run is stored in the perf-command file. It is:

perf stat -e cache-misses,L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,L1-dcache-store-misses,L1-dcache-prefetches,L1-dcache-prefetch-misses,LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,dTLB-loads,dTLB-load-misses ./HiPer-cpu 10 4 2 1 1 1
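To see what counters such as L1-dcache-load-misses and LLC-load-misses respond to, a toy pair of traversals is handy: both compute the same sum over an n × n row-major array, but one walks memory with unit stride and the other with stride n. The functions are hypothetical and not part of HiPer-cpu.

```cpp
#include <cstddef>
#include <vector>

// Unit-stride traversal of an n x n row-major array: consecutive loads
// hit the same cache line until it is used up.
double sum_row_order(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}

// Same sum, but consecutive loads are n elements apart (stride n), so for
// large n almost every load touches a different cache line.
double sum_column_order(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];
    return s;
}
```

Calling each function from its own small driver with a large n and running it under perf stat -e L1-dcache-loads,L1-dcache-load-misses should show a much higher miss ratio for the column-order version. Compile at a low optimization level (e.g. -O1), since an optimizing compiler may interchange the loops.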

fmartidu commented 7 years ago

OpenMP environment variables

OMP_PROC_BIND takes the values close or spread and is used for NUMA placement. On zelda, with only 2 threads and OMP_PROC_BIND=close, the domain is split into 2 chunks, both sitting on socket 1 and touched by threads 0 and 1. With only 2 threads and OMP_PROC_BIND=spread, the domain is split into 2 chunks, each on a different socket, since threads 0 and 8 are the ones that touch the data (on zelda, threads 0-7 are on socket 1 and 8-15 on socket 2).
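The chunk placement described above relies on Linux's default first-touch policy: a page is placed on the NUMA node of the thread that first writes it. A minimal sketch, assuming an OpenMP build (-fopenmp or -qopenmp) and compute loops that later use the same static schedule and thread count; the function name is hypothetical.

```cpp
#include <cstddef>
#include <memory>

// Allocate n doubles without initializing them, then let each OpenMP thread
// write its own contiguous chunk first. Under first-touch, the pages of each
// chunk end up on the NUMA node of the thread that touched them.
std::unique_ptr<double[]> first_touch_alloc(std::size_t n, double value) {
    std::unique_ptr<double[]> a(new double[n]);  // no value-initialization here
    const std::ptrdiff_t len = static_cast<std::ptrdiff_t>(n);
    // schedule(static) must match the compute loops so that each thread
    // later works on the chunk it touched
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < len; ++i) {
        a[i] = value;
    }
    return a;
}
```

A std::vector<double>(n) would zero-fill on the calling thread and put every page on that thread's socket, which is why the sketch allocates with new double[n]. Without OpenMP the pragma is ignored and the loop simply runs serially.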

OMP_PLACES specifies which cores each OpenMP thread runs on. It takes either an explicit list of processor IDs, or the values cores or threads: with cores, OpenMP threads are mapped to cores (one thread per core); with threads, they are mapped to hardware threads, so all hardware threads of each core are used. E.g., with OMP_PLACES=threads, OMP_PROC_BIND=close and only 2 OpenMP threads, they are mapped to the 2 hardware threads of one core. With OMP_PLACES=cores, OMP_PROC_BIND=close and only 2 OpenMP threads, they are mapped to the 2 closest cores of a socket.
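To check where the threads actually land for a given OMP_PLACES/OMP_PROC_BIND combination, each thread can report its place number. This sketch assumes an OpenMP 4.5 or newer compiler (for omp_get_place_num); the struct and function names are hypothetical. Setting OMP_DISPLAY_ENV=true additionally makes the runtime print the OpenMP settings it is using at startup.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// One entry per OpenMP thread: its thread number and the place (as defined
// by OMP_PLACES) it is bound to; place is -1 when the thread is not bound.
struct ThreadPlace {
    int thread;
    int place;
};

std::vector<ThreadPlace> report_places() {
    std::vector<ThreadPlace> result;
#if defined(_OPENMP) && _OPENMP >= 201511
    #pragma omp parallel
    {
        ThreadPlace tp{omp_get_thread_num(), omp_get_place_num()};
        #pragma omp critical
        result.push_back(tp);
    }
#else
    result.push_back(ThreadPlace{0, -1});  // serial or pre-4.5 build: no place info
#endif
    return result;
}
```

For example, running a driver that prints these pairs with OMP_NUM_THREADS=2 OMP_PLACES=cores OMP_PROC_BIND=close should show the two threads on neighbouring places.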

OMP_WAIT_POLICY=passive makes idle threads waiting for work sleep instead of spinning, so VTune does not count them as busy.

fmartidu commented 7 years ago

Makefile

top_srcdir = ../..
include $(top_srcdir)/arch/$(BUILD_ARCH)

# create a list of all .cpp files in CPP_FILES
CPP_FILES := $(wildcard *.cpp)
# remove dflux2.cpp and main2.cpp from the list
CPP_FILES := $(filter-out dflux2.cpp main2.cpp, $(CPP_FILES))

# directory for the non-vectorized object files
# (comments go on their own line: a trailing "# ..." would leave a space at the end of the value)
OBJ_DIR = obj
# create the obj directory
# NOTE: this is the first rule in the file, so a bare "make" only creates obj/;
# call "make HiPer-cpu" or "make HiPer-vec" explicitly
$(OBJ_DIR):
	mkdir -p $(OBJ_DIR)

# directory for the vectorized object files
VEC_OBJ_DIR = vec-obj
$(VEC_OBJ_DIR):
	mkdir -p $(VEC_OBJ_DIR)

# object files shared by the vectorized and non-vectorized codes
OBJ_FILES = main.o common.o auxiliary_surf.o dfluxc.o cylinderCond.o update_BC.o flow_res.o types.o readMeshgen.o readMesh.o
# add obj/ to each name (notdir could be dropped here, the names have no directory part)
OBJ_FILES := $(addprefix obj/,$(notdir $(OBJ_FILES)))

VEC_OBJS = normal.o metric.o vect_dflux.o vect_euler.o vect_vns.o vect_step.o vect_euflux.o main.o common.o auxiliary_surf.o dfluxc.o cylinderCond.o update_BC.o flow_res.o types.o readMeshgen.o readMesh.o
VEC_OBJS := $(addprefix vec-obj/,$(notdir $(VEC_OBJS)))

OBJS = normal.o metric.o euler.o dflux.o euflux.o vns.o step.o viscf.o vgrad.o nsflux.o
OBJS := $(addprefix obj/,$(notdir $(OBJS)))
# complete list for the non-vectorized code
OBJS := $(OBJS) $(OBJ_FILES)
# complete list for the vectorized code
VEC_OBJS := $(VEC_OBJS)

INCLUDES := lib/types.h lib/constants.h

# create object files using the flags defined in ARCH
$(OBJ_DIR)/%.o: %.cpp $(INCLUDES) | $(OBJ_DIR)
	$(CXX) $(CPPFLAGS) $(NOVECFLAGS) -DNUMA -c -o $@ $<

$(VEC_OBJ_DIR)/%.o: %.cpp $(INCLUDES) | $(VEC_OBJ_DIR)
	$(CXX) $(CPPFLAGS) $(NOVECFLAGS) -I$(LIKWID_INCLUDE) -I$(PAPI_INC) -DNUMA -DVECTORIZED -c -o $@ $<

# NOTE: -I only adds an include search path, so -I$(LIBGOMP)/libgomp.a does not link libgomp
HiPer-vec: $(VEC_OBJS)
	$(CXX) $(CPPFLAGS) $(VECFLAGS) -I$(LIKWID_INCLUDE) -I$(PAPI_INC) -DNUMA -DVECTORIZED -c -o vec-obj/main.o main.cpp
	$(CXX) $(CPPFLAGS) $(VECFLAGS) -DNUMA -DVECTORIZED -pthread -o $@ $^ -L$(LIKWID_LIB) -I$(LIKWID_INCLUDE) -llikwid -lpapi -I$(LIBGOMP)/libgomp.a

# create the executable from the object files
HiPer-cpu: $(OBJS)
	$(CXX) $(CPPFLAGS) $(NOVECFLAGS) -DNUMA -pthread -o $@ $^ -L$(LIKWID_LIB) -I$(LIKWID_INCLUDE) -llikwid -lpapi -L$(PAPI_LIB) -I$(PAPI_INC) -I$(LIBGOMP)/libgomp.a

.PHONY: clean
clean:
	rm -rf $(OBJ_DIR)
	rm -rf $(VEC_OBJ_DIR)

# TODO: generate separate targets for each flag combination, e.g. HiPer-cpu-numa, HiPer-cpu-nonuma, HiPer-vec-numa, HiPer-vec-nonuma,
# and rename the obj directories accordingly

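The -DNUMA and -DVECTORIZED flags only take effect where the sources test those macros. A sketch of how a translation unit could report which of the four variants named in the TODO it was built as; the function is hypothetical and not taken from the HiPer sources.

```cpp
#include <string>

// Report which build variant this translation unit was compiled as, based on
// the -DVECTORIZED / -DNUMA macros passed on the compiler command line.
std::string build_variant() {
#if defined(VECTORIZED) && defined(NUMA)
    return "HiPer-vec-numa";
#elif defined(VECTORIZED)
    return "HiPer-vec-nonuma";
#elif defined(NUMA)
    return "HiPer-cpu-numa";
#else
    return "HiPer-cpu-nonuma";
#endif
}
```

Each of those targets would then compile the same sources with its own combination of -D flags, and presumably into its own object directory, since objects built with different macros cannot be shared.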
fmartidu commented 7 years ago

Vectorized code

It's regular code, massaged using guidance from the tools so that it vectorizes: basically getting rid of if statements (by changing the loop headers), changing the data structures, and replacing some simple if statements with conditional assignments.
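Two of those transformations on a toy 1D central difference: moving the boundary test out of the loop body by changing the loop header, and replacing a simple if with a conditional assignment. The functions are hypothetical illustrations, not code from the 2D or 3D solvers; the data-structure changes are not shown.

```cpp
#include <cstddef>
#include <vector>

// Before: the boundary test inside the loop body makes vectorization harder.
void central_diff_branchy(const std::vector<double>& u, std::vector<double>& du) {
    const std::size_t n = u.size();
    du.assign(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i == 0 || i == n - 1) {
            du[i] = 0.0;
        } else {
            du[i] = u[i + 1] - u[i - 1];
        }
    }
}

// After: boundary points are set before the loop and the loop header only
// covers interior points, so the loop body is branch-free.
void central_diff_peeled(const std::vector<double>& u, std::vector<double>& du) {
    const std::size_t n = u.size();
    du.assign(n, 0.0);  // boundary values stay 0
    for (std::size_t i = 1; i + 1 < n; ++i) {
        du[i] = u[i + 1] - u[i - 1];
    }
}

// Conditional assignment instead of an if block: compilers can usually turn
// this into a vector select/blend.
void clamp_negative_to_zero(std::vector<double>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = (v[i] < 0.0) ? 0.0 : v[i];
    }
}
```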

Suggestion: based on Bahar's vectorized 2D code, take Ferran's 3D non-vectorized code, apply the same modifications to vectorize it, and use Advisor to check the vectorization.

fmartidu commented 7 years ago

PAPI

Used to count FLOPs on machines that allow it. Usage is very similar to LIKWID: set up the library path and flags in ARCH, and use regions in the code.

On hpc: module load papi, then compile with the PAPI flags. At the beginning of main.cpp I can change the list of PAPI events to track; the available events are listed by running papi_avail. Normally I am interested in PAPI_FP_OPS, which counts floating-point operations and automatically adds up SP and DP FLOPs.

Start the session as root. The code must be RECOMPILED AS ROOT. Then run it as usual, provided the PAPI flags were enabled at compile time.
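A minimal sketch of counting PAPI_FP_OPS around one region with PAPI's low-level C API (PAPI_library_init, PAPI_create_eventset, PAPI_add_event, PAPI_start, PAPI_stop). USE_PAPI is a hypothetical macro so the file still builds without PAPI; define it and compile with -I$(PAPI_INC), linking with -L$(PAPI_LIB) -lpapi as in the Makefile. The daxpy loop is a placeholder region; on CPUs where PAPI_FP_OPS is not available, PAPI_add_event fails and the function returns -1.

```cpp
#ifdef USE_PAPI  // hypothetical macro: define it when building against PAPI
#include <papi.h>
#endif

// Placeholder region to be measured.
static void daxpy(double* y, const double* x, double a, int n) {
    for (int i = 0; i < n; ++i) y[i] += a * x[i];
}

// Run the region and return the PAPI_FP_OPS count measured around it,
// or -1 when PAPI is not compiled in or the event cannot be counted.
long long daxpy_fp_ops(double* y, const double* x, double a, int n) {
    long long count = -1;
#ifdef USE_PAPI
    int event_set = PAPI_NULL;
    const bool ready =
        (PAPI_is_initialized() != PAPI_NOT_INITED ||
         PAPI_library_init(PAPI_VER_CURRENT) == PAPI_VER_CURRENT) &&
        PAPI_create_eventset(&event_set) == PAPI_OK &&
        PAPI_add_event(event_set, PAPI_FP_OPS) == PAPI_OK &&
        PAPI_start(event_set) == PAPI_OK;
    daxpy(y, x, a, n);  // measured region
    if (ready && PAPI_stop(event_set, &count) != PAPI_OK) count = -1;
    if (event_set != PAPI_NULL) {
        PAPI_cleanup_eventset(event_set);
        PAPI_destroy_eventset(&event_set);
    }
#else
    daxpy(y, x, a, n);  // no PAPI: do the work, report no count
#endif
    return count;
}
```

The measured count for n iterations need not be exactly 2n (for instance when the compiler emits fused multiply-adds), so treat it as a sanity check rather than an exact FLOP count.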