SCOREC / fep

Finite Element Programming course materials

assertion: tag not found #30

Closed: goinea closed this issue 3 years ago

goinea commented 3 years ago

Summary

My program aborts with an "Aborted" signal when I run it on erp01, and I have not been able to find this error on the internet.

Details

This issue appears to involve eight locations.

Code

This is the message that I get:

!mds_find_tag(&mesh->tags, name) failed at /tmp/CCNIsmth/spack-stage/spack-stage-pumi-master-mktotssjhxnlmwvuwbaamj6tjyyroydk/spack-src/mds/apfMDS.cc + 421
srun: error: erp01: task 0: Aborted

=== mesh size and tag info ===

global ent: v 49, e 84, f 36, r 0

(p0) # local ent: v 49, e 84, f 36, r 0
(p0) # own ent: v 49, e 84, f 36, r 0

mesh shape: "Linear"
tag 0: "coordinates_ver", type 0, size 3
tag 1: "coordinates_edg", type 0, size 3

[erp01:95695] Process received signal
[erp01:95695] Signal: Aborted (6)
[erp01:95695] Signal code: (-6)
[erp01:95695] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7fb4bb734630]
[erp01:95695] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7fb4bb38d3d7]
[erp01:95695] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7fb4bb38eac8]
[erp01:95695] [ 3] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x513045]
[erp01:95695] [ 4] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x47636e]
[erp01:95695] [ 5] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x47b80a]
[erp01:95695] [ 6] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2(main+0x573)[0x465dd3]
[erp01:95695] [ 7] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb4bb379555]
[erp01:95695] [ 8] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x466d3f]
[erp01:95695] End of error message

Have you seen this before? If so where should I look for resolution?

cwsmith commented 3 years ago

~Please do not attempt to run on erpfen01. The build and run scripts were not designed to work there. Follow the instructions for building and running the job submission scripts on erp14.~ Hold on... this is erp01.

~Would you please paste the commands you used to build and run? Please be sure to include which node the commands were executed on.~

I think you have a string mismatch when doing tag operations. Specifically, this is the error message to pay attention to:

 !mds_find_tag(&mesh->tags, name) failed at /tmp/CCNIsmth/spack-stage/spack-stage-pumi-master-mktotssjhxnlmwvuwbaamj6tjyyroydk/spack-src/mds/apfMDS.cc + 421

It can't find the tag you are asking for using the name given.
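One way to make a tag-name problem easier to spot is to look the name up explicitly before creating or using a tag, and to list the names that do exist when the lookup fails. Read literally, the asserted expression is `!mds_find_tag(&mesh->tags, name)`, so the assertion fails when the lookup succeeds; that may mean the name is already in use rather than missing, and the sketch below covers both cases. It is a standard-library model only: `Tag` and `TagTable` are hypothetical stand-ins, not PUMI types, and the real mesh API may behave differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a mesh tag; this is not a PUMI type.
struct Tag {
  std::string name;
  int size;
};

// Hypothetical name-keyed tag table that models lookup by exact string.
class TagTable {
 public:
  // Returns the tag with exactly this name, or nullptr if there is none.
  const Tag* find(const std::string& name) const {
    auto it = tags_.find(name);
    return it == tags_.end() ? nullptr : &it->second;
  }

  // Returns the existing tag when the name is already taken; otherwise
  // creates it. This avoids creating the same name twice.
  const Tag* findOrCreate(const std::string& name, int size) {
    assert(size > 0 && "tag size must be positive");
    // emplace does not overwrite: on a duplicate key it returns an
    // iterator to the element that is already stored.
    auto result = tags_.emplace(name, Tag{name, size});
    return &result.first->second;
  }

  // Lists every stored tag name, useful for a diagnostic message
  // when a lookup returns nullptr.
  std::vector<std::string> names() const {
    std::vector<std::string> out;
    for (const auto& entry : tags_) out.push_back(entry.first);
    return out;
  }

 private:
  std::map<std::string, Tag> tags_;
};
```

With a table holding `coordinates_ver` and `coordinates_edg`, a call such as `find("coordinates_vert")` returns a null pointer because the strings differ by one character, while calling `findOrCreate("coordinates_ver", 3)` a second time returns the tag that already exists instead of aborting.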

goinea commented 3 years ago

On 2021-04-09 09:00, Cameron Smith wrote:

Please do not attempt to run on erpfen01. The build and run scripts were not designed to work there.

Follow the instructions for building and running the job submission scripts on erp14.

Cameron,

Okay

Best,
Adam Goines
Rensselaer Polytechnic Institute

cwsmith commented 3 years ago

@goinea I misread the info you posted and gave an incorrect response. Please look at my edited comment on GitHub; the revised comments should be more helpful.

goinea commented 3 years ago

Cameron,

I have ssh'ed into erp14, but my error still shows up on erp09. Photos are attached.

Error:

!mds_find_tag(&mesh->tags, name) failed at /tmp/CCNIsmth/spack-stage/spack-stage-pumi-master-mktotssjhxnlmwvuwbaamj6tjyyroydk/spack-src/mds/apfMDS.cc + 421
srun: error: erp09: task 0: Aborted

=== mesh size and tag info ===

global ent: v 49, e 84, f 36, r 0

(p0) # local ent: v 49, e 84, f 36, r 0
(p0) # own ent: v 49, e 84, f 36, r 0

mesh shape: "Linear"
tag 0: "coordinates_ver", type 0, size 3
tag 1: "coordinates_edg", type 0, size 3

[erp09:113410] Process received signal
[erp09:113410] Signal: Aborted (6)
[erp09:113410] Signal code: (-6)
[erp09:113410] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7fc34eef0630]
[erp09:113410] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7fc34eb493d7]
[erp09:113410] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7fc34eb4aac8]
[erp09:113410] [ 3] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x513045]
[erp09:113410] [ 4] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x47636e]
[erp09:113410] [ 5] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x47b80a]
[erp09:113410] [ 6] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2(main+0x573)[0x465dd3]
[erp09:113410] [ 7] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc34eb35555]
[erp09:113410] [ 8] /gpfs/u/home/FEP5/FEP5gnsd/a2/./build/a2[0x466d3f]
[erp09:113410] End of error message

                                       "slurm-17974.out" 24L, 1265C

Best,
Adam Goines
Rensselaer Polytechnic Institute

goinea commented 3 years ago

I see the tag error and will address it, but do you know why the error is still reported on erp01 and erp09 instead of erp14?

cwsmith commented 3 years ago

The tag error will happen on any of the compute nodes. It appears to be a bug in your code.

goinea commented 3 years ago

Okay, thanks.