Sandia-OpenSHMEM / SOS

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric Interface (OFI), and UCX. Please click on the Wiki tab for help with building and using SOS.
Other
61 stars 53 forks source link

Add test-sos as a submodule #1106

Closed kholland-intel closed 6 months ago

davidozog commented 9 months ago

Let's add --recursive-submodule to the example build scripts.

wrrobin commented 8 months ago

I did not try it out, but do we have to use recurse-submodule now every time? Do we need any update on our README or wiki pages?

davidozog commented 8 months ago

I did not try it out, but do we have to use recurse-submodule now every time? Do we need any update on our README or wiki pages?

Yes, if you want to build SOS successfully you will now need to call git clone --recurse-submodules <SOS> to clone both SOS and the tests-sos submodule. You could alternatively clone normally (i.e. without --recurse-submodules) then run git submodule update --init from inside the repo before running autogen.sh.

I think we should also consider a feature where SOS can be built without having to clone and build the tests, but that is not supported in this PR. Do you think we should do that before merging this change?

I absolutely think we should document everything before merging. Also please note my comment above about testing any PR that changes both SOS and the tests-sos submodule... This procedure could be confusing without documentation.

wrrobin commented 7 months ago

@davidozog - I was trying this branch out and the submodule download always getting stuck for me. Not sure if the issue is related with git config. Did you see this?

$> git clone -b "pr/test-submodule" --recurse-submodules https://github.com/kholland-intel/SOS.git sos-submodule
Cloning into 'sos-submodule'...
remote: Enumerating objects: 17958, done.
remote: Counting objects: 100% (1362/1362), done.
remote: Compressing objects: 100% (499/499), done.
remote: Total 17958 (delta 930), reused 1162 (delta 825), pack-reused 16596
Receiving objects: 100% (17958/17958), 5.33 MiB | 18.31 MiB/s, done.
Resolving deltas: 100% (13463/13463), done.
Updating files: 100% (1806/1806), done.
Submodule 'modules/tests-sos' (git@github.com:openshmem-org/tests-sos.git) registered for path 'modules/tests-sos'
Cloning into '/mnt/scratch/rahmanmd/sos-submodule/modules/tests-sos'...

Quick google search led to https://stackoverflow.com/questions/40841882/automatically-access-git-submodules-via-ssh-or-https/44630028#44630028 which suggests to use relative path for submodules. Not sure if that is the fix.

davidozog commented 7 months ago

$> git clone -b "pr/test-submodule" --recurse-submodules https://github.com/kholland-intel/SOS.git sos-submodule ... Quick google search led to https://stackoverflow.com/questions/40841882/automatically-access-git-submodules-via-ssh-or-https/44630028#44630028 which suggests to use relative path for submodules. Not sure if that is the fix.

Hi @wrrobin - this command worked on my end... I wonder if you have github SSH authentication setup appropriately. Are you able to do a test like this?

dave@wsl in ~ $ ssh -T git@github.com
Enter passphrase for key '/home/dave/.ssh/github':
Hi davidozog! You've successfully authenticated, but GitHub does not provide shell access.

Another option might be to change .gitmodules to track the HTTPS URL instead of the SSH... In fact, that seems much more common on other projects (e.g., OSHMPI, mpich, UCX, raspberry-pi, etc).

I usually see relative paths for projects hosted under the same organization, but there is an exception with PRRTE so I think it should work (across Sandia-OpenSHMEM and openshmem-org). I think I prefer the relative path (if it works) because then the user chooses how to authenticate. I will try this unless you prefer https or ssh?

wrrobin commented 7 months ago

Thanks @davidozog. If relative path works, I think that would be the best choice.

wrrobin commented 7 months ago

The change worked fine.

Do you need to have any changes for the make to work?

davidozog commented 7 months ago

The change worked fine.

Great! Yes, for me too.

Do you need to have any changes for the make to work?

I don't think any changes are needed assuming the tests-sos submodule is initialized. Running make check after cloning with --recurse-submodules works fine on my end. A next step is to enable building SOS without having to clone the tests-sos submodule, but I think that's better handled on a separate PR.

So this is ready for review. I assigned the PR to myself offload it from @kholland-intel, and will wait for approvals from you and @philipmarshall21.

davidozog commented 7 months ago

@wrrobin @philipmarshall21 - It seems like we're spending too much time waiting on the UCX with pmi-mpi test row. I propose we remove it or swap it out with something else until #1048 is resolved.

Is the UCX with pmi-mpi test case important to anybody listening? @jdinan perhaps? It's not needed on our end...

wrrobin commented 7 months ago

@davidozog - The changes resolved the issue of make check.

davidozog commented 6 months ago

Tested with different build directories. Works perfectly.

This reminds me we need CI to cover the case of users building in the SOS root directory - added issue #1111 so we can merge this asap since it's holding up other PRs.