TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0
545 stars 115 forks source link

Error "A valid table name is required for in, out, or format options" when using MSSQL and BCP for 100 warehouses TPCC build #632

Closed arvindshmicrosoft closed 6 months ago

arvindshmicrosoft commented 8 months ago

Guidance Bug reports are for when HammerDB is not behaving as expected. Bug reports should not be submitted for help in understanding database performance related questions. General questions on database performance or HammerDB usability should be submitted under Discussions.

Describe the bug Attempts to build a 100 warehouse TPCC database are repeatedly failing with error A valid table name is required for in, out, or format options. The CLI does do some work, but serveral virtual users report this error and fail. FYI, the same CLI approach worked for generating 1000 warehouse and 5000 warehouse databases from the same client machine.

This is an error from the bcp.exe command. I'm not sure why the HammerDB code did not end up sending it the right parameters to bcp. If I don't use the bcp option within HammerDB, then the build does seem to complete, albeit much more slowly.

To Reproduce Run the following script:

#!/bin/tclsh
# maintainer: Pooja Jain

print("SETTING CONFIGURATION")
dbset('db','mssqls')
dbset('bm','TPC-C')

diset('connection','mssqls_tcp','true')
diset('connection','mssqls_port','1433')
diset('connection','mssqls_azure','true')
diset('connection','mssqls_encrypt_connection','true')
diset('connection','mssqls_trust_server_cert','true')
diset('connection','mssqls_authentication','sql')
diset('connection','mssqls_server','someserver.database.windows.net')
diset('connection','mssqls_uid','SOMELOGIN')
diset('connection','mssqls_pass','SOMEPASSWORD')

vu = 10
warehouse = 100
diset('tpcc','mssqls_count_ware',warehouse)
diset('tpcc','mssqls_num_vu',vu)
diset('tpcc','mssqls_dbase','TPCC100W')
diset('tpcc', 'mssqls_use_bcp', 'false')

print("SCHEMA BUILD STARTED")
buildschema()
print("SCHEMA BUILD COMPLETED")
exit()

using ./hammerdbcli py auto ./scripts/python/mssqls/tprocc/mssqls_tprocc_buildschema.py

Expected behavior A 100 warehouse DB should have been successfully built.

Screenshots N/A - I am using the CLI.

HammerDB Version (please complete the following information):

HammerDB Interface (please complete the following information):

Operating System (please complete the following information):

Database Server (please complete the following information):

Database Client (please complete the following information):

Additional context

sm-shaw commented 8 months ago

I don't have access to Azure and am unable to reproduce this locally so far, so reproducing will need to be the first step. I'm assuming in your example that: diset('tpcc', 'mssqls_use_bcp', 'false') was meant to be: diset('tpcc', 'mssqls_use_bcp', 'true') So as a start, this following script works for me locally (on Windows) using bcp and the Python interface.

print("SETTING CONFIGURATION")
dbset('db','mssqls')
dbset('bm','TPC-C')

diset('connection','mssqls_tcp','true')
diset('connection','mssqls_port','1433')
diset('connection','mssqls_azure','false')
diset('connection','mssqls_encrypt_connection','true')
diset('connection','mssqls_trust_server_cert','true')
diset('connection','mssqls_authentication','windows')
diset('connection','mssqls_server','{(local)\SQLDEVELOP}')
diset('connection','mssqls_uid','sa')
diset('connection','mssqls_pass','admin')

vu = 5
warehouse = 10
diset('tpcc','mssqls_count_ware',warehouse)
diset('tpcc','mssqls_num_vu',vu)
diset('tpcc','mssqls_dbase','tpcc')
diset('tpcc','mssqls_use_bcp','true')

print("SCHEMA BUILD STARTED")
buildschema()
print("SCHEMA BUILD COMPLETED")
exit()

I run this as follows:

cd C:\Program Files\HammerDB-4.9
.\hammerdbcli.bat py auto ./scripts/python/mssqls/tprocc/mssqls_tprocc_buildschema2.py

Note that if some of the build works it is likely to be the item table that does not use bcp in any case as this is only 100K rows.

One difference is that my server name is wrapped with curly brackets {} so that special characters are not interpreted:

diset('connection','mssqls_server','{(local)\SQLDEVELOP}')

I have not tried connecting to SQL Server with the notation you are using:

someserver.database.windows.net

So it is possible that there is something here as the server name gets passed to the bcp procedure. Therefore one thing to try is to wrap any fields with potential special characters with curly brackets e.g.

diset('connection','mssqls_server','{someserver.database.windows.net}')
diset('connection','mssqls_uid','SOMELOGIN')
diset('connection','mssqls_pass','{SOMEPASSWORD}')

If this doesn't work, I would try running the same using the Tcl script instead of Python to troubleshoot. We are using upvar to access variables from different levels, however this does work in my initial tests. If the Tcl script works and the Python one doesn't then this helps isolate the level the error is occurring at and in this case could be to do with upvarring to the incorrect level. (The variable should simply not be found in this case - bit it is worth eliminating)

If it fails in Tcl also the next step is to customise the script and modify the bcp procedure such as follows, this will then show you the variables it is being called with. This is particularly easy in the GUI, in the CLI you would use savescript to save the build script and then customscript to load it and run it interactively.

proc bcpComm { tableName filePath uid pwd server} {
    upvar 3 authentication authentication
puts "authent:$authentication tabname:$tableName fpath:$filePath server:$server"
    if {[ string toupper $authentication ] eq "WINDOWS" } {
        exec bcp $tableName IN $filePath -b 500000 -a 16000 -T -S $server -c  -t ","
    } else {
        exec bcp $tableName IN $filePath -b 500000 -a 16000 -U $uid -P $pwd -S $server -c  -t ","
    }
}

Finally, to note that we have an open issue for a more elegant solution here: https://github.com/TPC-Council/HammerDB/issues/594 At the moment this is using the external bcp command that works, however we are passing the login parameters to the bcp utility a better solution as above would be to extend the odbc interface so we can login once and call bcp from ODBC. Unfortunately we are not there yet, but if we were doing this, this type of error would be unlikely.

arvindshmicrosoft commented 8 months ago

Thank you @sm-shaw for your quick reply and valuable suggestions. Using the hints you provided, I think I narrowed it down to the fact that the DB name in this case has a - in the name, like tpch1000-01. That's causing the command line execution of bcp to break. In an attempt to fix this locally, I have used the following expression to quote the name so that bcp and SQL does not have any issues with the DB name: set tableName "\[${db}\].dbo.region". Does this sound like the right way to mitigate the issue?

sm-shaw commented 8 months ago

@arvindshmicrosoft, by default HammerDB does not escape special characters at present. Although I am not entirely sure why it works for regular inserts and not for bcp.

With this the curly brackets means a variable is not interpreted so ${db} is the same as $db. One thing that could be an issue is the square brackets are used for a command e.g. puts [ info hostname ] would print the output of the info hostname command. So you are right to escape them here.

set tableName "\[${db}\].dbo.region"

If you wrap any string in curly brackets it won't be interpreted so$db and ${db}would give you the variable however {$db} would give you the string $db. Anything inside double quotes will be interpreted unless escaped with a backslash.

hammerdb>set db test1-test2
test1-test2
hammerdb>set tablename "\[$db\].dbo.region"
[test1-test2].dbo.region
hammerdb>puts $tablename
[test1-test2].dbo.region

Alternatively, to not have any variable substitution for simple cases (where there is no curly brackets in the string) you can use. simple cases use: {} or for more complex more complex cases use: {[ concat {} ]}

Finally there is also a subst command for fine-tuning what is interpreted and what isn't (this is what I would use) so you can say interpret the database variable but not the square brackets as follows

hammerdb>set tablename [subst -nocommands {[$db]}].dbo.region
[test1-test2].dbo.region
hammerdb>puts $tablename
[test1-test2].dbo.region

This is one reason why we have not completely wrapped all special characters with {} as currently there is always a solution to get exactly what you need.

sm-shaw commented 8 months ago

Closed as complete with workaround.

arvindshmicrosoft commented 8 months ago

@sm-shaw - just curious - would you be open to a short-term PR with this workaround, or would you prefer to wait for the holistic changes you mentioned (#594) which will hopefully address this?

sm-shaw commented 8 months ago

I'll reopen the issue for the discussion. So we always appreciate pull requests so yes definitely (multiple) short-term PRs that address specific issues are definitely welcome compared to waiting for a bigger pull request that may take a long time (or even not at all) to action. In this case the issue of special characters in fields is known, but as development time and resources are scare it has never reached the top of the priorities as the workaround(s) are quite straightforward and once people know then easy to implement.

sm-shaw commented 6 months ago

Let's close this for now. We have another bcp related change https://github.com/TPC-Council/HammerDB/pull/653 and it is likely we will continue to make more as needed as versions change. Therefore, extending the interface https://github.com/TPC-Council/HammerDB/issues/594 continues to look like the best way to use the existing connection and the options already connected with rather than using the bcp executable.