Enhancement- Loading csv dataset which does not have primary key

ashishkshukla commented 5 years ago

This utility does not work if the dataset has no key column or if it uses a composite key.

Let's say we have a CSV dataset that we want to load and run some analytical queries against. With the existing feature, we need to preprocess the data and add a key to every record.

My suggestion is that, instead of preprocessing and modifying the CSV data, we use the existing dataset as-is and add a unique UUID as the key for each line while ingesting into Aerospike.
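A minimal Python sketch of this idea, for illustration only. It is not aerospike-loader code: the names `rows_with_uuid_keys` and `ingest` and the `put` callback are hypothetical, and an in-memory dict stands in for the actual write to Aerospike.

```python
import csv
import io
import uuid
from typing import Callable, Dict, Iterable, Iterator, Tuple


def rows_with_uuid_keys(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (key, bins) pairs, generating a fresh random UUID for every CSV row."""
    reader = csv.DictReader(lines)  # the first line is treated as the header
    for row in reader:
        yield str(uuid.uuid4()), dict(row)


def ingest(lines: Iterable[str], put: Callable[[str, Dict[str, str]], None]) -> int:
    """Pass every keyed row to `put` (hypothetical stand-in for the database write)."""
    count = 0
    for key, bins in rows_with_uuid_keys(lines):
        put(key, bins)
        count += 1
    return count


# Sample data with no single key column: (city, year) together identify a row.
SAMPLE_CSV = "city,year,sales\nPune,2019,120\nPune,2020,150\n"

store: Dict[str, Dict[str, str]] = {}  # in-memory stand-in for an Aerospike set
written = ingest(io.StringIO(SAMPLE_CSV), store.__setitem__)
```

Because the keys are random, loading the same file twice would create new records rather than overwrite the earlier ones, and records would have to be found by scan or by a query on an indexed column rather than by key.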

I am working on this implementation and will raise a pull request for review.

hbpatre commented 5 years ago

Aerospike is a key-value database; we need a key to fetch a record. This is the main functionality of Aerospike. If keys are dynamically generated, how do you want to fetch these records? What is the use case that uses Aerospike but with random keys?

ashishkshukla commented 5 years ago

I was exploring the analytical capabilities of Aerospike. Yes, Aerospike is a key-value store. But given this loader utility and the AQL capability provided by Aerospike, we do not necessarily always need a key to fetch records. We can create an index on any one of the columns and fetch records by that column.

My suggestion was:

Suppose we have a CSV file in which each record is uniquely identified by more than one column. This utility won't work in that case.

So the first option is to preprocess the CSV file, add a key entry to every record, and then call the loader utility on the modified CSV file. The second option, which I have suggested, is to add the key entry at run time.

Users can still go for the first option; in that case the second option simply does not come into the picture. But I think my suggestion will remove extra work for the user.
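For comparison, the first option (preprocessing) might look like the following Python sketch, which rewrites a CSV with a leading key column before the loader is run. The function name `add_key_column` and the column name `key` are assumptions made for illustration; the loader's own configuration for naming the key column is not shown here.

```python
import csv
import io
import uuid
from typing import TextIO


def add_key_column(src: TextIO, dst: TextIO, key_name: str = "key") -> int:
    """Copy a CSV from src to dst, prepending a UUID key column to every row.

    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return 0
    writer.writerow([key_name] + header)
    rows = 0
    for row in reader:
        writer.writerow([str(uuid.uuid4())] + row)
        rows += 1
    return rows


# In-memory files keep the sketch self-contained; real use would open files on disk.
source = io.StringIO("city,year,sales\nPune,2019,120\nPune,2020,150\n")
modified = io.StringIO()
rows_written = add_key_column(source, modified)
```

This is the extra step the suggestion aims to remove: the user has to produce and store a second copy of the dataset before loading it.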

hbpatre commented 5 years ago

This is not a common use case for us, but you are welcome to use Aerospike for scans/queries with the modified loader. So we will not be able to accept a pull request for this into the master branch.

aerospike / aerospike-loader

Enhancement- Loading csv dataset which does not have primary key #13