PetaVision / OpenPV

PetaVision is a C++ library for designing and deploying large-scale neurally-inspired computational models.
http://petavision.github.io
Eclipse Public License 1.0

make it compile with MPICH #300

Closed (mantier closed this 4 years ago)

mantier commented 5 years ago

This also makes it compile with MPICH. We didn't check whether it still compiles with Open MPI, but we are pretty optimistic that it does.

The following tests fail with both Open MPI and MPICH on a Jetson TX2 running Ubuntu 18.04:

     14 - CheckpointWeightTest_CheckpointWeightTestShared_1 (Failed)
     80 - BatchMPICheckpointSystemTest_2 (Failed)
     81 - BatchMPICheckpointSystemTest_4 (Failed)
    382 - SegmentTest_2 (Failed)
    383 - SegmentTest_4 (Failed)

mantier commented 4 years ago

Any news here?

garkenyon commented 4 years ago

can you write to me directly at garkenyon@gmail.com?


peteschultz commented 4 years ago

I got errors in SegmentTest_2 and SegmentTest_4, but not in the other tests mentioned above. I traced the problem to the status argument of MPI_Recv() in SegmentBuffer, for which I'll commit a fix.

Are you still getting errors in the other tests?

mantier commented 4 years ago

Hi, we compiled the newest master, this time with Open MPI, and the tests listed below failed. However, this is on a Jetson TX2 development board, so some issues might be expected. We do not plan to continue working with PetaVision on this platform.

The following tests FAILED:
         17 - CheckpointWeightTest_CheckpointWeightTestNonshared_1 (Failed)
         80 - BatchMPICheckpointSystemTest_2 (Failed)
         81 - BatchMPICheckpointSystemTest_4 (Failed)
        148 - GPULCATest_4 (Failed)
        154 - GPUSystemTest_postTest_4 (Failed)
        160 - GPUSystemTest_postTestOneToMany_4 (Failed)
        163 - GPUSystemTest_postTest_linked_4 (Failed)
        166 - GPUSystemTest_HyPerLCAGpuTest_4 (Failed)
        169 - GPUSystemTest_postRecvBatch_4 (Failed)
Errors while running CTest

mantier commented 4 years ago

Update: we just ran the above-mentioned failed tests separately, and this time they passed. We suspect that the system ran out of memory when we ran the tests in parallel. So all is good.