biolab / orange3

🍊 Orange: Interactive data analysis
https://orangedatamining.com

Inconsistent and Incorrect conversion of text "NA" to unknown in the File widget #6808

Open Newbrie opened 1 month ago

Newbrie commented 1 month ago

What's wrong?

When importing a standard CSV data file with a category column, the File widget sometimes converts the category text "NA" to unknown (shown as "?"), but not always.

My current workaround is to avoid NA and rename the value to NX, which imports fine.
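A minimal sketch of that renaming workaround, assuming the file is small enough to rewrite in one pass. The helper name `rename_category` and the sample rows are hypothetical and only illustrate the idea; the real Test2.csv contents are not reproduced here.

```python
import csv
import io


def rename_category(csv_text, column, old="NA", new="NX"):
    """Return csv_text with exact matches of `old` in `column` replaced by `new`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only touch cells that are exactly the old label, not substrings.
        if row[column] == old:
            row[column] = new
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data with a "PD" category column.
sample = "id,PD\n1,NA\n2,YES\n3,NA\n"
fixed = rename_category(sample, "PD")
print(fixed)
```

Only cells in the named column are changed, so an "NA" that genuinely means missing in another column is left alone.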

How can we reproduce the problem?

Filebug.ows.zip

Test2.csv

Instructions:

1 - Check the contents of Test2.csv to see the innocuous use of "NA" as a category value in the "PD" column.

2 - Open Filebug.ows in Orange3, then open the File widget and load Test2.csv.

3 - Open the Data Table to see how the data was loaded. Scroll down to where you expect the "NA" text value and notice that precisely the rows that use "NA" have been modified, with "NA" replaced by "?".

The behaviour is inconsistent: if you create a smaller table that uses the category value "NA", the import works fine.

What's your environment?

processo commented 1 month ago

I can confirm this.

"File" and "CSV File Import" both do this. Neither cares whether NA is put in quotation marks. The only difference is "File" still converts to ? even if "text" type is chosen, "CSV File Import" does not.

Newbrie commented 1 month ago

Thanks, I hadn't noticed that. So I could use CSV File Import and choose text as a workaround.

processo commented 1 month ago

@Newbrie Yes, if you put an Edit Domain after import you can even convert it into categorical.

janezd commented 4 weeks ago

To sum up:

I suppose the latter is why @markotoplak marked this as a bug (and I agree).