Closed saminbassiri closed 1 month ago
Hi @saminbassiri ,
thanks for providing this PR, reading frames with strings now works 😃 !
However, I was following up on a small example use-case using SQL on dataframes, and this crashes. I'm not sure, but I guess it's still related to string processing. Problems which are related to the SQL function can IMHO be addressed in a separate issue/PR.
data.csv and the corresponding .meta file can be taken from #834
df = readFrame("data.csv");
// works (even with slicing 😄 )
print(df[0:5,:]);
registerView("data", df);
// crashes
print(sql("SELECT data.temp FROM data WHERE data.temp>30.0; "));
// does not compile
#print(sql("SELECT data.city, data.temp FROM data WHERE data.temp>30; "));
#print(sql("SELECT data.city, MIN(data.temp), AVG(data.temp), MAX(data.temp) FROM data GROUP BY data.city ;"));
attached is the complete output of above script (I compiled your PR on my own using the daphne build container) - it starts with printing the first 5 sliced rows…:
Frame(5x2, [city:std::string, temp:double])
Algiers 3.4
St. John's 4.3
Dodoma 26.3
Toliara 17
Yellowknife 4
free(): invalid pointer
bin/daphne(+0x148b862)[0x561ff7b98862]
/lib/x86_64-linux-gnu/libc.so.6(+0x45320)[0x7f513a503320]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x11c)[0x7f513a55cb1c]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x1e)[0x7f513a50326e]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xdf)[0x7f513a4e68ff]
/lib/x86_64-linux-gnu/libc.so.6(+0x297b6)[0x7f513a4e77b6]
/lib/x86_64-linux-gnu/libc.so.6(+0xa8fe5)[0x7f513a566fe5]
/lib/x86_64-linux-gnu/libc.so.6(+0xab37c)[0x7f513a56937c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_free+0x7e)[0x7f513a56bd9e]
/home/ubuntu/daphne/bin/../lib/libAllKernels.so(+0x7892e5)[0x7f51355212e5]
/home/ubuntu/daphne/bin/../lib/libAllKernels.so(+0x92439b)[0x7f51356bc39b]
/home/ubuntu/daphne/bin/../lib/libAllKernels.so(_decRef__Structure+0x9e)[0x7f513568664e]
[0x7f513ac312ef]
[0x7f513ac313fd]
[0x7f513ac3141d]
bin/daphne(+0x1e05215)[0x561ff8512215]
bin/daphne(+0x14ad133)[0x561ff7bba133]
bin/daphne(+0x14b25ad)[0x561ff7bbf5ad]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7f513a4e81ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f513a4e828b]
bin/daphne(+0x148a325)[0x561ff7b97325]
[error]: Got an abort signal from the execution engine. Most likely an exception in a shared library. Check logs!
Execution error: Returning from signal 6
Also note that the last 2 lines do not compile, as daphne is not able to properly infer the string type:
[error]: Lowering pipeline error.{}
PassManager failed module lowering, responsible IR written to module_fail.log.
RewriteToCallKernelOpPass failed with the following message [ no kernel for operation `cast` available for the required input types `(!daphne.Frame<?x[?: !daphne.Unknown], ?>)` and output types `(!daphne.Matrix<?x?x!daphne.String>)` for backend `CPP`, registered kernels for this op:
when the dataframe had only numbers (instead of strings in the 1st column), the SQL function works.
KR, Benjamin
@auge: I can reproduce the error you mention. I think it's not directly related to representing frames with string columns and reading CSV files with string columns into frames. Thus, I would merge this PR as it is. However, rest assured that we will fix that bug, because we definitely want to enable SQL queries on such frames.
This PR closes issue #834 by adding support for reading CSV files containing String values into Frames.
Changes:
General Support for String Values in Frames at Kernel Level:
Support for Reading String Values in Frames at Kernel Level:
Support for Reading String Values from CSV Files into Frames in DSL:
str
in the.meta
file to ensure proper type recognition.Testing: