Closed LordScarface closed 3 months ago
@tanner-andrulis this looks like an underscore-vs-hyphen issue. Could you advise?
Hi and thank you for the reply,
seems to be the case; I changed data_spaces to data-spaces and fixed_structured to fixed-structured in the problem and now I get further, but it still crashes:
$ timeloop-mapper arch.yaml problem.yaml
input file: arch.yaml
input file: problem.yaml
  _______                __
 /_  __(_)___ ___  ___  / /___  ____  ____
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/
                                  /_/
Problem configuration complete.
Architecture configuration complete.
Sparse optimization configuration complete.
Using all available hardware threads = 96
Mapper configuration complete.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
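For reference, the renaming described above can also be applied to a parsed spec programmatically. This is only a minimal sketch, not part of Timeloop: it assumes the YAML has already been loaded into plain dicts and lists, and the names rename_spellings and RENAMES are hypothetical. Which spelling is expected seems to depend on the Timeloop version, so the mapping is something to adjust rather than a fixed rule.

```python
from typing import Any, Dict

# Hypothetical table of spellings to swap; reverse it if your version
# expects underscores instead of hyphens.
RENAMES: Dict[str, str] = {
    "data_spaces": "data-spaces",
    "fixed_structured": "fixed-structured",
}


def rename_spellings(node: Any, renames: Dict[str, str]) -> Any:
    """Return a copy of a parsed YAML tree with keys and string values renamed."""
    if isinstance(node, dict):
        # Rename matching keys, then recurse into the values.
        return {
            renames.get(key, key): rename_spellings(value, renames)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_spellings(item, renames) for item in node]
    if isinstance(node, str):
        # Scalars such as a distribution name may also need the new spelling.
        return renames.get(node, node)
    return node
```

The function returns a new tree and leaves its input untouched, so the original spec can still be compared against the renamed one.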
Here is the output from gdb, if that helps; I don't really know how to dig further:
$ gdb --args timeloop-mapper arch.yaml problem.yaml
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from timeloop-mapper...
(gdb) run
Starting program: /usr/local/bin/timeloop-mapper arch.yaml problem.yaml
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
input file: arch.yaml
input file: problem.yaml
  _______                __
 /_  __(_)___ ___  ___  / /___  ____  ____
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/
                                  /_/
Problem configuration complete.
Architecture configuration complete.
Sparse optimization configuration complete.
Using all available hardware threads = 96
Mapper configuration complete.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Program received signal SIGABRT, Aborted.
0x0000000000739fbc in pthread_kill ()
(gdb) bt
#0 0x0000000000739fbc in pthread_kill ()
#1 0x0000000000716fc6 in raise ()
#2 0x0000000000438b30 in abort ()
#3 0x000000000043664a in __gnu_cxx::__verbose_terminate_handler() [clone .cold] ()
#4 0x000000000064ce8c in __cxxabiv1::__terminate(void (*)()) ()
#5 0x000000000064cef7 in std::terminate() ()
#6 0x000000000064d059 in __cxa_throw ()
#7 0x00000000004365ed in operator new(unsigned long) [clone .cold] ()
#8 0x00000000006ca9df in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#9 0x000000000058c7bc in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__str=..., this=<optimized out>, this=<optimized out>, __str=...) at /usr/include/c++/11/bits/basic_string.h:1387
#10 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator= (__str=..., this=<optimized out>, this=<optimized out>, __str=...) at /usr/include/c++/11/bits/basic_string.h:681
#11 mapping::Constraints::Constraints (this=<optimized out>, arch_props=..., workload=..., this=<optimized out>, arch_props=..., workload=...) at src/mapping/constraints.cpp:63
#12 0x000000000054ee79 in mapspace::Uber::Uber (skip_init=false, filter_spatial_fanout=true, workload=..., arch_specs=..., arch_constraints=..., config=..., this=0x15bd1c0) at src/mapspaces/uber.cpp:53
#13 mapspace::ParseAndConstruct (config=..., arch_constraints=..., arch_specs=..., workload=..., filter_spatial_fanout=<optimized out>) at src/mapspaces/mapspace-factory.cpp:53
#14 0x0000000000480d95 in Application::Application (this=<optimized out>, config=<optimized out>, output_dir=..., name=..., this=<optimized out>, config=<optimized out>, output_dir=..., name=...) at src/applications/mapper/mapper.cpp:257
#15 0x000000000043a49c in main (argc=<optimized out>, argv=<optimized out>) at src/applications/mapper/main.cpp:94
(gdb) up 11
#11 mapping::Constraints::Constraints (this=<optimized out>, arch_props=..., workload=..., this=<optimized out>, arch_props=..., workload=...) at src/mapping/constraints.cpp:63
63 bypass_strings_[problem::Shape::DataSpaceID(pvi)] = xxx;
(gdb) break std::bad_alloc
Function "std::bad_alloc" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/bin/timeloop-mapper arch.yaml problem.yaml
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
input file: arch.yaml
input file: problem.yaml
  _______                __
 /_  __(_)___ ___  ___  / /___  ____  ____
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/
                                  /_/
Problem configuration complete.
Architecture configuration complete.
Sparse optimization configuration complete.
Using all available hardware threads = 96
Mapper configuration complete.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Program received signal SIGABRT, Aborted.
0x0000000000739fbc in pthread_kill ()
(gdb) exit
Is your Timeloop updated to the latest version?
Thank you guys for the very fast replies,
Yes, Timeloop should be up to date; I installed it fresh yesterday:
accelergy-timeloop-infrastructure/src/timeloop$ git status
HEAD detached at v3.0.3
Interestingly, I now get the following with the version in Docker:
timeloop-mapper arch.yaml problem.yaml
input file: arch.yaml
input file: problem.yaml
  _______                __
 /_  __(_)___ ___  ___  / /___  ____  ____
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/
                                  /_/
Problem configuration complete.
ERROR: key not found: arithmetic, at line: 0
I would recommend ensuring that your install completed successfully. Can you try deleting the Timeloop executables and repository from your system and doing a fully fresh install? Outside of comments, the string "data-spaces" does not exist in the current version of the Timeloop code, so I'd be very surprised if it were looking for a "data-spaces" string.
I have re-installed everything according to: https://timeloop.csail.mit.edu/v4/installation
Though I had to change the Makefile a little bit:
for barvinok: ./configure --enable-shared-barvinok
to ./configure --enable-shared-barvinok --prefix=/usr/local --with-gmp-prefix=/usr/local --with-ntl-prefix=/usr/local
and
cp src/timeloop/build/timeloop-mapper ~/.local/bin/timeloop-mapper
cp src/timeloop/build/timeloop-metrics ~/.local/bin/timeloop-metrics
cp src/timeloop/build/timeloop-model ~/.local/bin/timeloop-model
to
sudo cp src/timeloop/build/timeloop-mapper /usr/local/bin/timeloop-mapper
sudo cp src/timeloop/build/timeloop-metrics /usr/local/bin/timeloop-metrics
sudo cp src/timeloop/build/timeloop-model /usr/local/bin/timeloop-model
After these changes the installation goes through; here is the install log.
But I still get the following output:
$ timeloop-mapper arch.yaml problem.yaml
input file: arch.yaml
input file: problem.yaml
  _______                __
 /_  __(_)___ ___  ___  / /___  ____  ____
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/
                                  /_/
Problem configuration complete.
Architecture configuration complete.
Sparse optimization configuration complete.
Using all available hardware threads = 96
WARNING: no optimization metric(s) specified, using edp as default.
Mapper configuration complete.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
And the output in the official docker container:
$ timeloop-mapper arch.yaml problem.yaml
input file: arch.yaml
input file: problem.yaml
  _______                __
 /_  __(_)___ ___  ___  / /___  ____  ____
  / / / / __ `__ \/ _ \/ / __ \/ __ \/ __ \
 / / / / / / / / /  __/ / /_/ / /_/ / /_/ /
/_/ /_/_/ /_/ /_/\___/_/\____/\____/ .___/
                                  /_/
ERROR: key not found: data-spaces, at line: 0
Thank you, I'm working to recreate on my machine. May I ask what Docker container you're running?
Of course, thank you, I am running this one: https://hub.docker.com/r/timeloopaccelergy/accelergy-timeloop-infrastructure
amd64 or arm64?
amd64, here is the system information if that is relevant:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
Stepping: 4
CPU MHz: 3309.589
CPU max MHz: 3700.0000
CPU min MHz: 1200.0000
BogoMIPS: 5400.00
Virtualization: VT-x
L1d cache: 1.5 MiB
L1i cache: 1.5 MiB
L2 cache: 48 MiB
L3 cache: 66 MiB
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d
RE your native-installed version (NOT docker version): The input files are a version 0.4 specification, which is only supported with the new front-end interface. To run these, first you'll need to add the following to your arch.yaml as a top key:
variables:
  # Tell Accelergy to use dummy estimations. THIS MUST BE CHANGED LATER to
  # something like "45nm". OK for now because we just want to get this spec
  # working. Components files are in the exercises repo
  technology: -1
  # 1GHz
  global_cycle_seconds: 1e-9
Also drop the "45nm" from the architecture specification.
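If the spec is being edited from a script rather than by hand, the same top-level key can be added to a parsed architecture. This is a rough sketch under assumptions: the arch spec is already loaded as a plain dict, and the helper name with_dummy_variables is hypothetical, not a Timeloop or Accelergy function.

```python
import copy
from typing import Any, Dict


def with_dummy_variables(arch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed arch spec with the dummy `variables` key set."""
    out = copy.deepcopy(arch)
    variables = out.setdefault("variables", {})
    # -1 selects dummy estimations; replace with a real node such as "45nm" later.
    variables.setdefault("technology", -1)
    # 1 GHz clock.
    variables.setdefault("global_cycle_seconds", 1e-9)
    return out
```

Because setdefault is used, values that are already present under variables are kept as they are and only missing entries are filled in.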
Now there are two options for running. The first (recommended) is to run it through Python, as is done here: https://github.com/Accelergy-Project/timeloop-accelergy-exercises/tree/master/workspace/baseline_designs
The second is an in-progress command line interface for the front end. This new CLI is not yet stable, so Python is still the recommended route, but if the following commands work then it'll be okay to use them to get up and running quickly.
git clone https://github.com/Accelergy-Project/timeloopfe.git
cd timeloopfe
pip3 install .
cd ..
tl mapper arch.yaml problem.yaml
For the Docker version: can you try the following and see whether a new Docker image gets downloaded? I've pulled a fresh one and it seems to be working OK.
git clone https://github.com/Accelergy-Project/accelergy-timeloop-infrastructure.git
cd accelergy-timeloop-infrastructure
export DOCKER_ARCH=amd64
docker-compose pull
Note that the steps in the previous message will still need to be done to run these files in Docker.
Okay nice, some progress :D
I also had to add the version: 0.4 key to the variables; arch.yaml now looks like this:
I am using the timeloopfe interface:
import timeloopfe.v4 as tl
from joblib import Parallel, delayed

# Basic setup. Gathers input files, checks for errors
spec = tl.Specification.from_yaml_files(
    "arch.yaml", "problem.yaml"
)

# Call Timeloop mapper
tl.call_mapper(spec, output_dir="./output")

# Call Accelergy verbose
tl.call_accelergy_verbose(spec, output_dir="./output")


# Multiprocessed design space exploration
def run_mapper_with_spec(buf_size: int):
    spec = tl.Specification.from_yaml_files(
        "arch.yaml", "problem.yaml"
    )
    spec.architecture.find("SMEM").attributes.depth = buf_size
    return tl.call_mapper(spec, output_dir=f"outputs_bufsize={buf_size}")


buf_sizes = [1024]  # , 2048, 4096, 8192, 16384]
results = Parallel(n_jobs=8)(
    delayed(run_mapper_with_spec)(buf_size) for buf_size in buf_sizes
)
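The fan-out part of the script (one run and one output directory per buffer size) can be tried without Timeloop or joblib installed. The sketch below swaps in the standard library's ThreadPoolExecutor for joblib and uses a hypothetical stand-in, run_one, in place of run_mapper_with_spec, so it does no real mapping; it only shows how the results line up with the inputs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_one(buf_size: int) -> Dict[str, object]:
    """Hypothetical stand-in for run_mapper_with_spec; performs no mapping."""
    return {"buf_size": buf_size, "output_dir": f"outputs_bufsize={buf_size}"}


def sweep(buf_sizes: List[int], max_workers: int = 8) -> List[Dict[str, object]]:
    """Run run_one for every buffer size and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as its inputs.
        return list(pool.map(run_one, buf_sizes))


results = sweep([1024, 2048, 4096])
```

Threads are used here only to keep the sketch dependency-free; whether threads or processes suit a real sweep depends on how the mapper call is executed, which this sketch does not assume.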
Now I get the following output, which I think looks good:
So I think it is working now, thank you very much! :)
Though I tried again in the timeloopaccelergy/accelergy-timeloop-infrastructure:latest-amd64 docker, and there I get the following:
But as it is now working for me, I consider this solved, though I would be happy to assist and test anything else if it helps :)
Great! Thanks for working through this with us. I've got sufficient information to fix the Accelergy issue from your stack trace; no need for further info. @angshuman-parashar Good to close this issue!
Hi,
I was previously working with Timeloop and had an architecture specification with version 0.3. I have now rebuilt Timeloop with the latest version, and my architecture does not work anymore (which is fine). But because I was having trouble porting my architecture, I tried to test one from the tutorial and noticed that it is also not working for me. I tried this one: https://github.com/Accelergy-Project/timeloop-accelergy-exercises/tree/master/workspace/baseline_designs/example_designs/sparse_tensor_core_like
But I get the following message:
I have also tried running it in the timeloopaccelergy/accelergy-timeloop-infrastructure docker, but I get the same error. Could anyone maybe point me to an example that is currently working?