dotnet / corefxlab

This repo is for experimentation and exploring new ideas that may or may not make it into the main corefx repo.
MIT License

A tsv file wouldn't load properly #2979

Closed: liugaocn closed this issue 4 years ago

liugaocn commented 4 years ago

I have a TSV file with many blank cells in one column, and DataFrame.LoadCsv doesn't load it properly. What's worse is that Visual Studio only gave me blank build errors, with no detailed messages at all, so I had to switch to Rider to identify the problem.

I hope there could be an option to load columns as strings when their data type guessing fails.

QuickGO-annotations-1602846856552-20201016.txt

pgovind commented 4 years ago

Thanks for uploading the data set. I just looked at it! DataFrame.LoadCsv tries to guess the column type by inspecting the data in the first few rows. It defaults to a StringDataFrameColumn when we can't guess its type. The ANNOTATION EXTENSION column has many empty rows here, so I'm guessing it became a StringDataFrameColumn. A couple questions for you:

  1. Is StringDataFrameColumn wrong for the Annotation Extension column? What did you expect instead? The LoadCsv method also takes an optional dataTypes parameter to specify the type of the column if you know it upfront.
  2. "What's worse is that Visual Studio only gave me blank build errors": I don't fully understand this! Can you paste the build errors here?
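The type guessing described above samples the first few rows and falls back to a string column when no type fits. It can be sketched roughly as below. This is a language-neutral illustration in Python, not the library's actual C# code. The name `guess_column_types`, the `guess_rows` default of 10, and the bool → int → float candidate order are assumptions made for the sketch.

```python
from typing import List


def _parses_as(value: str, kind: type) -> bool:
    """Return True if the non-empty string `value` can be read as `kind`."""
    if kind is bool:
        return value.lower() in ("true", "false")
    try:
        kind(value)  # e.g. int("3") or float("2.5")
        return True
    except ValueError:
        return False


def guess_column_types(rows: List[List[str]], guess_rows: int = 10) -> List[type]:
    """Guess one type per column from the first `guess_rows` data rows (hypothetical helper).

    Candidate types are tried from most to least specific. A column whose
    sampled cells are all empty, or that fit no candidate, falls back to str.
    """
    sample = rows[:guess_rows]
    n_cols = max((len(r) for r in sample), default=0)
    types: List[type] = []
    for col in range(n_cols):
        # Blank cells carry no type information, so they are skipped.
        values = [r[col] for r in sample if col < len(r) and r[col] != ""]
        chosen: type = str
        if values:
            for kind in (bool, int, float):
                if all(_parses_as(v, kind) for v in values):
                    chosen = kind
                    break
        types.append(chosen)
    return types
```

Because only the sample is inspected, a column can be guessed as numeric from its first rows and still hit a non-numeric value later. For example, with `guess_rows=1`, the rows `[["1"], ["x"]]` are guessed as `[int]`.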
liugaocn commented 4 years ago

There was no build error message at all in Visual Studio. The program just wouldn't debug or run, and there were no red error highlights either. I was able to run the program with Rider, though, and Rider gave me some error messages such as "access to path denied" or "there was an error with ... ", or something along those lines. Maybe something is wrong with my settings. I didn't know how to describe it, so I used "blank"; sorry for the confusion.

I just called LoadCsv(). Specifying the data type of every column is inconvenient, especially when there are many of them (100 columns, for example). So it would be nice if columns could all be set to strings by default, or at least those whose data types can't be identified. Most of the time the data type really doesn't matter to me, although this might cause other problems.
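The option requested here could look roughly like the sketch below. It keeps a column as strings when any cell fails to convert with the guessed type, or skips conversion entirely on request. This is a hypothetical Python sketch. `load_columns` and `all_strings` are illustrative names, not DataFrame.LoadCsv parameters. In .NET, the existing route is the optional dataTypes parameter mentioned earlier in the thread.

```python
from typing import List, Optional


def _convert(value: str, kind: type):
    """Convert one non-empty cell; raises ValueError if it does not fit `kind`."""
    if kind is bool:
        lowered = value.lower()
        if lowered not in ("true", "false"):
            raise ValueError(value)
        return lowered == "true"
    return kind(value)


def load_columns(rows: List[List[str]], guessed: List[type],
                 all_strings: bool = False) -> List[List[Optional[object]]]:
    """Convert each column with its guessed type (hypothetical helper).

    Empty cells become None. If any cell in a column fails to convert, the
    whole column is kept as strings instead of raising an error. With
    all_strings=True, no conversion is attempted at all.
    """
    columns: List[List[Optional[object]]] = []
    for col, kind in enumerate(guessed):
        raw = [r[col] if col < len(r) else "" for r in rows]
        as_strings = [v if v != "" else None for v in raw]
        if all_strings or kind is str:
            columns.append(as_strings)
            continue
        try:
            columns.append([_convert(v, kind) if v != "" else None for v in raw])
        except ValueError:
            # The guess did not hold for some later row: fall back to strings.
            columns.append(as_strings)
    return columns
```

The fallback is decided per column, so one badly guessed column does not stop the remaining columns from being typed.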

pgovind commented 4 years ago

error messages such as "access to path denied" or "there was an error with ... "

I'm wondering if access to your file was restricted for some reason. When I tested it locally, I was able to see where the exception was being thrown. In any case, I opened #2982 to fix this bug!

pgovind commented 4 years ago

This is fixed now with #2982