VI4IO / io500-app

Development version of the new IO-500 Application
MIT License

New Arbitrary API Options #18

Open markhpc opened 4 years ago

markhpc commented 4 years ago

Hi, I'm trying to use the new arbitrary API options in the C version of the io500 benchmark. With the script (scr) version I can run all benchmarks with the cephfs aiori backend by filling in the API field like so:

API = CEPHFS --cephfs.user admin --cephfs.conf /etc/ceph/ceph.conf --cephfs.prefix /tmp/cbt/mnt/cbt-cephfs-kernel/0

The new API option handling doesn't parse that correctly; it tries to read "admin" as an option:

FATAL (src/util.c:153) Provided API option admin appears to be no API supported version

On the mailing list Julian suggested trying to change the API field like so:

API = CEPHFS --cephfs.user=admin --cephfs.conf=/etc/ceph/ceph.conf --cephfs.prefix=/tmp/cbt/mnt/cbt-cephfs-kernel/0
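For illustration, the two spellings of the API field can be normalized to the same backend name and option pairs. The following is a minimal Python sketch, not the io500 parser: `parse_api` is a hypothetical name, and the rule that a token following an option is its value (unless it starts with `--`) is an assumption.

```python
import shlex


def parse_api(value):
    """Split an API field into (backend, {option: value}).

    Hypothetical helper: accepts both "--key value" and "--key=value".
    """
    tokens = shlex.split(value)
    if not tokens:
        raise ValueError("empty API field")
    backend, rest = tokens[0], tokens[1:]
    options = {}
    i = 0
    while i < len(rest):
        tok = rest[i]
        if not tok.startswith("--"):
            # This is the situation behind "admin appears to be no API option".
            raise ValueError(f"unexpected token {tok!r}")
        if "=" in tok:
            # "--key=value": split only on the first '='.
            key, val = tok[2:].split("=", 1)
        elif i + 1 < len(rest) and not rest[i + 1].startswith("--"):
            # "--key value": consume the next token as the value.
            key, val = tok[2:], rest[i + 1]
            i += 1
        else:
            # Option without a value.
            key, val = tok[2:], None
        options[key] = val
        i += 1
    return backend, options


spaced = "CEPHFS --cephfs.user admin --cephfs.conf /etc/ceph/ceph.conf"
equals = "CEPHFS --cephfs.user=admin --cephfs.conf=/etc/ceph/ceph.conf"
assert parse_api(spaced) == parse_api(equals)
```

A parser that accepts only one of the two forms would reproduce the mismatch between the script and the C application described here.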

That led to some other errors in the C application's invocation of ior:

[RESULT-invalid]       ior-easy-write       18.069242 GiB/s  : time 3.353 seconds
[RESULT-invalid]    mdtest-easy-write       12.979939 kIOPS : time 1.442 seconds
ior ERROR: ceph_open failed, errno 107, Transport endpoint is not connected (aiori-CEPHFS.c:213)
ior ERROR: ceph_open failed, errno 107, Transport endpoint is not connected (aiori-CEPHFS.c:213)

It also appeared to break the script (scr) version of the test, where everything after "--cephfs.user" (i.e., from the first equals sign on) is stripped out:

[Exec] mpirun -npernode 30 --hostfile /home/nhm/io500/hosts /tmp/cbt/mnt/cbt-cephfs-kernel/0/io500/io500-app/bin/ior -w -a CEPHFS --cephfs.user -t 2m -b 9920000m -F -i 1 -C -Q 1 -g -G 27 -k -e -o /tmp/cbt/mnt/cbt-cephfs-kernel/0/io500/io500-app/datafiles/2020.05.28-16.22.56-scr/ior_easy/ior_file_easy -O stoneWallingStatusFile=/tmp/cbt/mnt/cbt-cephfs-kernel/0/io500/io500-app/datafiles/2020.05.28-16.22.56-scr/ior_easy/stonewall -O stoneWallingWearOut=1 -D 1
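One plausible cause of this truncation, which is an assumption and not confirmed anywhere in this thread, is that the ini line is split on every "=" and only the second field is kept. A short Python sketch of that failure mode and of splitting on the first "=" only:

```python
line = "API = CEPHFS --cephfs.user=admin --cephfs.conf=/etc/ceph/ceph.conf"

# Splitting on every '=' and keeping the second field drops the option values.
naive = line.split("=")[1].strip()

# Splitting only on the first '=' keeps the whole right-hand side.
key, value = (part.strip() for part in line.split("=", 1))

print(naive)
print(value)
```

The naive variant yields exactly the "CEPHFS --cephfs.user" fragment seen in the [Exec] line, which is why this explanation seems likely, but the script itself would need to be checked.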

markhpc commented 4 years ago

Also, if I try to specify API in the global section of the ini file rather than repeating it for each ior- and mdtest- section, the io500 application will think that the entire string is the AIORI backend:

FATAL (src/main.c:28) Could not load AIORI backend for CEPHFS --cephfs.user=admin --cephfs.conf=/etc/ceph/ceph.conf --cephfs.prefix=/tmp/cbt/mnt/cbt-cephfs-kernel/0
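The error suggests the whole value is used as the backend name. As a hedged Python sketch (the `KNOWN_BACKENDS` set and `find_backend` are hypothetical and do not reflect the io500 source), looking up only the first whitespace-separated token avoids this:

```python
KNOWN_BACKENDS = {"POSIX", "CEPHFS", "DFS"}  # hypothetical registry


def find_backend(api_value):
    """Return the backend named by the first token, or None if unknown."""
    if not api_value.strip():
        return None
    name = api_value.split(None, 1)[0]
    return name if name in KNOWN_BACKENDS else None


api = "CEPHFS --cephfs.user=admin --cephfs.conf=/etc/ceph/ceph.conf"

# Looking up the entire string fails, as in the FATAL message.
assert api not in KNOWN_BACKENDS

# Looking up only the first token succeeds.
assert find_backend(api) == "CEPHFS"
```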

mchaarawi commented 4 years ago

We are having the same issue with the DAOS/DFS backend. Only the script version works; the C-app version fails with the same issues that Mark posted above. I tried different branches (master and the isc one) and have not been able to resolve it.

JulianKunkel commented 4 years ago

Mark, can you try again? The latter should now be fixed. The former appears to be related to the way options are handled with the module and needs to be checked further. Maybe that situation is now remedied?

JulianKunkel commented 4 years ago

Mohamad, can you check as well? I want to see whether this is an issue with how the interface is used or something Ceph-specific.

If there is a problem, please also check the results/*.txt files to see what parameters have been used.

mchaarawi commented 4 years ago

It still doesn't seem to work for me. By the latter, you mean passing options with '=' instead of ' ', right?

So here is what I pass, for example, for ior-easy in the ini file:

API = DFS --dfs.pool=322efea8-f41a-4ede-940c-87b7ea5fb64d --dfs.cont=28779a43-9485-497e-afb3-587e5b45a0ad --dfs.svcl=1 --dfs.prefix=/tmp/dfuse

But when it's run with the script I get:

[Starting] ior_easy_write
[Exec] mpirun -np 8 /home/mschaara/install/ior/bin/ior -w  -a DFS --dfs.pool -t 1m -b 10m -F -i 1 -C -Q 1 -g -G 27 -k -e -o /tmp/dfuse//datafiles/2020.05.28-17.27.12-scr/ior_easy/ior_file_easy -O stoneWallingStatusFile=/tmp/dfuse//datafiles/2020.05.28-17.27.12-scr/ior_easy/stonewall -O stoneWallingWearOut=1 -D 3 

And with the C app, I don't see anything in the result file for that, but I can see from stderr that something is wrong there too:

Invalid DAOS pool/cont
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

Just FYI, as Mark was saying, the original form with a space instead of '=' for the driver-specific options works well for the script, but the app fails in the same way.

JulianKunkel commented 4 years ago

Yes, that is correct. Fair enough, we can fix the script to deal with "=" to resolve this issue. The core question is what fails inside the multiple calls of IOR/MDtest due to being non-POSIX.

As I have no non-POSIX file system available, it would be awesome if you could support the debugging a little bit; it should be a small issue to fix. Why would DAOS/Ceph report that it cannot connect? Are the passed IOR arguments correct, e.g., in the results/ior_easy_write.txt file?

You can run the app directly, e.g., ./io500, with the following set in the ini file:

[debug]
stonewall-time = 1 # for testing

I just pushed a patch to IOR which likely won't fix the issue but may resolve some general issues with non-POSIX fs.

JulianKunkel commented 4 years ago

If you still have an issue, please join the io500-isc-issue channel on our Slack [1]. We can then resolve the matter efficiently.

[1] https://join.slack.com/t/vi4io/shared_invite/enQtMjMyOTgxMDg0OTQ1LTcyYWJkYzJiMDUzMDU2YjE1NjFjMGNjZWEwYTM2NzQxNzcxMDExYmFmMjJjMDY3NjBiYTRjYTM1M2I3ZGE3NmM

mchaarawi commented 4 years ago

Yes, I'm starting to debug this. What confused me is that the error message "Invalid DAOS pool/cont" comes from the DAOS backend, but I'm specifying the DFS backend in the ini file as above.

It seems that the io500 app calls initialize on all compiled-in drivers. Making the init non-fatal gets me past this issue, but at that point the input args have not been parsed yet anyway, so the DFS API init also fails the same way, and init on the DFS driver is never called again.

I'll keep digging.
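The ordering problem described above can be sketched as follows. This is a hypothetical Python model, not the io500 or IOR code: the `Backend` class, `REGISTRY`, `init_selected`, and the `daos.*` option names are assumptions made for illustration. The idea is to initialize only the selected driver, and only after its options have been parsed.

```python
class Backend:
    """Toy stand-in for an AIORI driver (hypothetical)."""

    def __init__(self, name, required):
        self.name = name
        self.required = required
        self.initialized = False

    def initialize(self, options):
        # Fail if required options are absent, as a driver init might.
        missing = [k for k in self.required if k not in options]
        if missing:
            raise RuntimeError(f"{self.name}: missing options {missing}")
        self.initialized = True


REGISTRY = {
    "DAOS": Backend("DAOS", ["daos.pool", "daos.cont"]),  # assumed option names
    "DFS": Backend("DFS", ["dfs.pool", "dfs.cont"]),
}


def init_selected(name, options):
    """Initialize only the requested backend, with already-parsed options."""
    backend = REGISTRY[name]
    backend.initialize(options)
    return backend


dfs = init_selected("DFS", {"dfs.pool": "pool-uuid", "dfs.cont": "cont-uuid"})
assert dfs.initialized
assert not REGISTRY["DAOS"].initialized  # the unused driver is never touched
```

Initializing every registered driver before option parsing would instead raise for DAOS first, which matches the symptom of a DAOS error appearing while DFS is selected.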

mchaarawi commented 4 years ago

It seems the Slack link is not active?

JulianKunkel commented 4 years ago

It seems the link was garbled when copy-pasting: https://join.slack.com/t/vi4io/shared_invite/zt-2z12x7o1-xhWH3WAQJRktDwy0hSus~g