gexijin / RTutor

Chat with your data via AI. https://RTutor.ai

File parsing issues - can't load non-numeric data. #89

Open jamesnemesh opened 8 months ago

jamesnemesh commented 8 months ago

Hi, this is a really interesting tool. Unfortunately, I'm running into some issues using it for "real" data.

I've tried to upload a few files (generated from R data.frames via write.table, tab-separated), and only the numeric columns are included. Factors are data too!

Here is some sample data: test.txt

When parsed, the strings/factors are removed:

image

Additionally, the column names are mutated (and forced to lower case). The original column names:

colnames(df)
[1] "DONOR" "REP_IRVs" "NUM_SNPS" "CENSUS"

dan-burk commented 4 months ago

Hi,

Thanks for the feedback! For certain reasons we remove the first column in a dataset if it looks like an ID field. In test.txt, every value in the DONOR column is unique, so RTutor treats it as an identifier and throws it out. If you change one of the DONOR IDs to match another ID, RTutor will recognize that column as a character variable and won't throw it out (see test2.txt).
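To make the behavior described above concrete, here is a minimal sketch of that kind of ID-column heuristic. It is an illustration under stated assumptions, not RTutor's actual code: the helper name `drop_id_column` is hypothetical, the rule "first column is character/factor and all values are unique" is an assumed reading of the description, and the data is made up rather than taken from test.txt.

```r
# Hypothetical helper: drop the first column when it looks like an ID field,
# i.e. it is character/factor and every value in it is unique.
# This is an assumed rule for illustration, not RTutor's implementation.
drop_id_column <- function(df) {
  if (ncol(df) < 2) return(df)
  first <- df[[1]]
  looks_like_id <- (is.character(first) || is.factor(first)) &&
    anyDuplicated(first) == 0
  if (looks_like_id) df[, -1, drop = FALSE] else df
}

# Made-up example data (not the contents of test.txt).
unique_ids <- data.frame(
  DONOR = c("d1", "d2", "d3"),
  NUM_SNPS = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Same data, but with one DONOR value repeated.
repeated_ids <- unique_ids
repeated_ids$DONOR[3] <- "d1"

colnames(drop_id_column(unique_ids))    # "NUM_SNPS"
colnames(drop_id_column(repeated_ids))  # "DONOR" "NUM_SNPS"
```

With a rule like this, a single repeated value is enough to keep the column, which matches the difference between test.txt and test2.txt described above.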

If you have an example where RTutor isn't uploading factor/character columns that aren't unique identifiers, we would appreciate the feedback greatly.

As for the column names: by default we clean the column names of files uploaded by users. Would it be useful to have an option to opt out of cleaning and lower-casing the column names?
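As a rough sketch of what such an opt-out could look like, the base R example below reads a tab-separated file and cleans the column names only when asked. The function `read_upload` and its `clean_names` argument are hypothetical names used for illustration, and the cleaning rule (replace non-alphanumeric runs with `_`, then lower-case) is an assumption, not necessarily what RTutor does.

```r
# Hypothetical reader with an opt-out for name cleaning. `read_upload` and
# `clean_names` are assumed names, not an existing RTutor function or option.
read_upload <- function(path, clean_names = TRUE) {
  # check.names = FALSE keeps the header exactly as written in the file.
  df <- read.table(path, header = TRUE, sep = "\t",
                   check.names = FALSE, stringsAsFactors = FALSE)
  if (clean_names) {
    # Assumed cleaning rule: non-alphanumeric runs become "_", then lower-case.
    cleaned <- tolower(gsub("[^A-Za-z0-9]+", "_", colnames(df)))
    colnames(df) <- make.unique(cleaned, sep = "_")
  }
  df
}

# Write a small made-up tab-separated file, as write.table would produce.
path <- tempfile(fileext = ".txt")
write.table(
  data.frame(DONOR = c("d1", "d1"), REP_IRVs = c(1, 2)),
  path, sep = "\t", quote = FALSE, row.names = FALSE
)

colnames(read_upload(path))                       # "donor" "rep_irvs"
colnames(read_upload(path, clean_names = FALSE))  # "DONOR" "REP_IRVs"
```

Defaulting `clean_names` to `TRUE` would keep the current behavior for existing users while letting others preserve their original headers.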

Thanks, Daniel