Closed syedalihasany closed 9 months ago
Hi, thanks for reaching out to us.
I think you need to use the nested layout, not the contiguous layout, for this data. You can update your code as follows:
// Put the tensor into the database that was loaded from file
client.put_tensor(in_key, input_tensor.data(), dims, SRTensorTypeFloat, SRMemLayoutNested);
// running the model
client.run_model(model_key, {in_key}, {out_key});
// assigning the dimensions of the output tensor to the output_dims vector
std::vector<size_t> output_dims = {1000, 1};
std::vector<float> result(1000, 0);
client.unpack_tensor(out_key, result.data(), output_dims,SRTensorTypeFloat, SRMemLayoutNested);
As an aside, if you can tweak your model to output a single-dimensional tensor with dimensions {1000} (rather than a two-dimensional tensor with dimensions {1000, 1}), you can use the contiguous layout with the unpack call, and it will be a bit more efficient.
Please let us know if this works out for you! -- Bill
Hi Bill,
I tried this, but now I have run into another issue: my Orchestrator fails to start.
I am using the following Python commands to start the Orchestrator:
import smartsim
import smartredis
from smartredis import Client
from smartsim import Experiment
REDIS_PORT=6379
exp = Experiment("moving_tensors", launcher="local")
db = exp.create_database(db_nodes=1,port=REDIS_PORT,interface="lo")
exp.generate(db)
exp.start(db)
I get the following error message:
22:38:58 lipc02 SmartSim[276840] INFO Working in previously created experiment
>>> exp.start(db)
22:39:16 lipc02 SmartSim[276840] ERROR Orchestrator failed during startup See /home/bohan/blackscholes/blackscholes/using_smart_reddis/moving_tensors/database for details
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bohan/.local/lib/python3.8/site-packages/smartsim/experiment.py", line 192, in start
self._control.start(
File "/home/bohan/.local/lib/python3.8/site-packages/smartsim/_core/control/controller.py", line 90, in start
self._launch(manifest)
File "/home/bohan/.local/lib/python3.8/site-packages/smartsim/_core/control/controller.py", line 303, in _launch
self._launch_orchestrator(orchestrator)
File "/home/bohan/.local/lib/python3.8/site-packages/smartsim/_core/control/controller.py", line 362, in _launch_orchestrator
self._orchestrator_launch_wait(orchestrator)
File "/home/bohan/.local/lib/python3.8/site-packages/smartsim/_core/control/controller.py", line 555, in _orchestrator_launch_wait
raise SmartSimError(msg)
smartsim.error.errors.SmartSimError: Orchestrator failed during startup See /home/bohan/blackscholes/blackscholes/using_smart_reddis/moving_tensors/database for details
When I run the C++ code (which moves the input tensors to the PyTorch model and gets the output tensors), I get the following error message:
Segmentation fault (core dumped)
What I am doing is this: I start the Orchestrator from Python to handle the in-memory database, then I execute a C++ program that puts the input tensors in memory and runs a JIT-traced Torch model to get the output tensors, which I write to a file. Am I doing something wrong? The C++ code for that is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <fstream>
#include <iostream>
#include <vector>
#include <iomanip>
#include <chrono>
#include <sstream>
// including the redis client header
#include "client.h"
int main(int argc, char* argv[]) {
std::cout<<"start"<<std::endl;
// Set environment variables
setenv("SR_LOG_FILE", "smartredis.log", 1);
setenv("SR_LOG_LEVEL", "INFO", 1);
setenv("SSDB", "localhost:6379", 1); // Adjust the address and port as needed
// Initialize a vector that will hold the input tensor
size_t n_rows = 1000;
size_t n_cols = 6;
size_t n_values = n_rows * n_cols;
std::vector<float> input_tensor(n_values, 0);
std::vector<size_t> dims = {1000, 6};
// Read values from the tab separated input feature file
std::string input_file = "../input_features.txt";
std::ifstream file(input_file);
std::cout<<"after inputs"<<std::endl;
if (!file.is_open()) {
std::cerr << "Error opening file: " << input_file << std::endl;
return 1;
}
for (size_t row = 0; row < n_rows; row++) {
for (size_t col = 0; col < n_cols; col++) {
float value;
if (col < n_cols - 1) {
file >> value;
file.ignore(1); // Skip the tab character
} else {
file >> value;
}
input_tensor[row * n_cols + col] = value; // flattens the 1000 x 6 table into a linear (row-major) buffer
}
}
file.close();
// Initialize a SmartRedis client
bool cluster_mode = false; // Set to false if not using a clustered database
SmartRedis::Client client(cluster_mode, __FILE__);
std::cout<<"set client"<<std::endl;
// Use the client to set a model in the database from a file
std::string model_key = "ali_model";
std::string model_file = "../ali_model_scripted.pt";
std::cout<<"USING CPU"<<std::endl;
client.set_model_from_file(model_key, model_file, "TORCH", "CPU", 1000); // the last parameter is the batch size; should we pass this as 100k?
std::cout<<"set model"<<std::endl;
// Declare keys that we will use in forthcoming client commands
std::string in_key = "input_key";
std::string out_key = "output_key";
// Put the tensor into the database that was loaded from file
client.put_tensor(in_key, input_tensor.data(), dims, SRTensorTypeFloat, SRMemLayoutNested);
// running the model
client.run_model(model_key, {in_key}, {out_key});
// assigning the dimensions of the output tensor to the output_dims vector
std::vector<size_t> output_dims = {1000, 1};
std::vector<float> result(1000, 0);
client.unpack_tensor(out_key, result.data(), output_dims,SRTensorTypeFloat, SRMemLayoutNested);
// Create an output file stream
std::ofstream outputFile("./ali_model_results_using_Cpp_and_Redis.txt");
if (outputFile.is_open()) {
for (size_t i = 0; i < result.size(); i++) {
outputFile << result[i] << std::endl;
}
outputFile.close();
} else {
std::cerr << "Error: Unable to open the output file." << std::endl;
}
return 0;
}
Hi, with respect to the "Orchestrator failed during startup" message, this is likely because you have an existing Orchestrator (Redis database) that is already running and using the port you've requested. Please make sure you have an experiment.stop() line in your Python script to shut down the existing database. You can also kill it manually via the Unix kill command if you can find its PIDs (grep for "redis"), or by issuing the following command from the SmartRedis root (log into the node that hosts the Redis database):
$ third-party/redis/src/redis-cli -p 6379 shutdown
As for the segfault, first off, I steered you wrong when I said to mark the call to put_tensor() as a nested layout. Now that I see how you've set up the memory in a single array, contiguous is the way you need to go. In a nested layout, you would have an array of pointers to arrays containing rows of data. When the client attempted to dereference those pointers, it found that they weren't really pointers, and that's what led to the segmentation fault.
Rather than trying to use the C++ STL vector class for your data, you might be better off using a plain multi-dimensional array. You can see a good example of how to initialize one in the SmartRedis tests; please refer to tests/cpp/client_test_put_get_2D.cpp for the code. If you do switch to a multi-dimensional array of this form, you will need to mark your memory layout as nested both for the put_tensor() and unpack_tensor() calls.
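To make the difference between the two layouts concrete, here is a minimal sketch in plain C++ that does not call SmartRedis at all. The helper names (at_contiguous, at_nested, make_row_pointers) are hypothetical and exist only for illustration; the sketch assumes row-major storage, matching the row * n_cols + col indexing in the code posted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous layout: one flat buffer of floats.
// Element (row, col) lives at offset row * n_cols + col.
float at_contiguous(const float* data, std::size_t n_cols,
                    std::size_t row, std::size_t col) {
    return data[row * n_cols + col];
}

// Nested layout: an array of pointers, one pointer per row.
// Element (row, col) is reached by first dereferencing the row pointer.
float at_nested(const float* const* rows, std::size_t row, std::size_t col) {
    return rows[row][col];
}

// Hypothetical helper: build a table of row pointers into an existing
// flat buffer, giving a nested view of the same data without copying.
// The returned pointers are only valid while `flat` is alive and unchanged in size.
std::vector<float*> make_row_pointers(std::vector<float>& flat,
                                      std::size_t n_rows, std::size_t n_cols) {
    assert(flat.size() == n_rows * n_cols);
    std::vector<float*> rows(n_rows);
    for (std::size_t r = 0; r < n_rows; ++r) {
        rows[r] = flat.data() + r * n_cols;
    }
    return rows;
}
```

With a flat std::vector<float> such as input_tensor, the pointer returned by data() addresses float values directly, which is the contiguous form. Handing that same pointer to code that expects the nested form would make it read float bits as row addresses, which is consistent with the segmentation fault described above.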
Please let me know how it goes! -- Bill
Hi, I wanted to follow up to see if you are up and running now?
Since we haven't heard back from you, I'm going to assume that you are up and running and that all is going well now. If this isn't the case or if you have further difficulties, please don't hesitate to reach out to us again!
Description
I am running a C++ program that sends input tensors of size 1000 by 6 to a PyTorch model using SmartSim and retrieves output tensors of size 1000 by 1. I initialize the SmartSim/SmartRedis Orchestrator from the Python interpreter and then execute the C++ binary, and I get the following error message:
How to reproduce
My C++ code snippet initializing the client (and putting tensors) is as follows. I think the issue could be resolved by changing the SRMemLayoutContiguous parameter in the client.put_tensor and client.unpack_tensor methods, but I don't know how: should I exclude this parameter entirely or change it to something else?
Expected behavior
The code should execute without any errors.
System