arborworkflows / ArborWebApps

A bundle of Tangelo applications used by NSF Arbor (Phylogenetic Comparative Methods system)
Apache License 2.0

table column name extraction can get confused #69

Closed curtislisle closed 9 years ago

curtislisle commented 10 years ago

Two separate weird issues noticed when uploading slightly weird data.

  1. With the data below, columns are processed correctly when the file contains the species names and four rows, but not when it contains the species names and five rows (see geospiza_discrete_four and .._five, loaded in Arbor). Note the spaces between some elements. There are no extra lines or extra characters at the ends of lines.

"species","wingL","tarsusL","culmenL","beakD","gonysW"
"magnirostris","4","3","3","3","3"
"conirostris","4","3","3","3","2"
"difficilis","4", "3", "2", "2", "2"
"scandens", "4", "3", "3", "2", "2"
"fortis","4", "3", "2", "2", "2"

[Screenshot: 2014-06-07 at 8:56:31 am]

  2. Processing of row entries in the above dataset is inconsistent. It seems to be related to spaces after the commas. Note that when viewing the "difficilis" row in arborweb, only some entries have quotes around them. I think our CSV processing should tolerate extra spaces if possible; otherwise we need to provide guidance about the acceptable formats. Analyses fail when I use this data file as source data.

lukejharmon commented 10 years ago

Being able to tolerate extra spaces is important - we can specify file formats, but this sort of issue has plagued phylogenetics software for too long. Thanks!

curtislisle commented 10 years ago

Strange, my second comment isn't visible: (2) processing of row elements was inconsistent on this file. If we review the table view through arborweb for this file, some numbers show with quotes around them, others don't. Maybe we will never allow this type of entry ( "3") when we don't mean 3, but I just wondered what was happening.
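
One possible explanation for the stray quotes, sketched under the assumption that the table reader behaves like Python's standard `csv` module with default settings (the thread does not confirm which parser Arbor uses here). In that module, a quote character only opens a quoted field when it is the first character of the field, so a quote that follows a space is kept as literal text. The two rows are copied from the data above.

```python
import csv
import io

# Two rows from the issue: one without spaces after the commas, one with.
conirostris = '"conirostris","4","3","3","3","2"\n'
difficilis = '"difficilis","4", "3", "2", "2", "2"\n'

def parse_default(line):
    """Parse one CSV line with csv.reader's default dialect."""
    return next(csv.reader(io.StringIO(line)))

clean_row = parse_default(conirostris)
spaced_row = parse_default(difficilis)

# Fields that start right after the comma lose their quotes; fields that
# start with a space keep both the space and the quotes.
print(clean_row)   # ['conirostris', '4', '3', '3', '3', '2']
print(spaced_row)  # ['difficilis', '4', ' "3"', ' "2"', ' "2"', ' "2"']
```

If the parser in use works this way, it would account for only some entries of the "difficilis" row showing quotes in the table view.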

jeffbaumes commented 10 years ago

Let's get a test in for this data that's failing in romanesco, then work to fix it. Curt, can you add this to the tests/data folder? If you push a branch with this file, I can take a look and work on the solution.
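
A regression test for this data might look roughly like the sketch below. It is not romanesco's actual test layout or API: `read_table` is a hypothetical stand-in for the real table reader, built here on Python's `csv` module with `skipinitialspace=True` so that whitespace after a delimiter is ignored. The data is the five-row table from the original report, inlined as a string rather than read from the tests/data folder.

```python
import csv
import io
import unittest

# Five-row table from the issue report, including the spaces after commas.
GEOSPIZA_FIVE = (
    '"species","wingL","tarsusL","culmenL","beakD","gonysW"\n'
    '"magnirostris","4","3","3","3","3"\n'
    '"conirostris","4","3","3","3","2"\n'
    '"difficilis","4", "3", "2", "2", "2"\n'
    '"scandens", "4", "3", "3", "2", "2"\n'
    '"fortis","4", "3", "2", "2", "2"\n'
)

def read_table(text):
    """Hypothetical reader: returns (header, list of row dicts)."""
    # skipinitialspace=True drops whitespace immediately after a comma,
    # so '"4", "3"' parses the same as '"4","3"'.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    rows = [dict(zip(header, fields)) for fields in reader]
    return header, rows

class TestSpacedCsv(unittest.TestCase):
    def test_header_and_rows(self):
        header, rows = read_table(GEOSPIZA_FIVE)
        self.assertEqual(
            header,
            ["species", "wingL", "tarsusL", "culmenL", "beakD", "gonysW"],
        )
        self.assertEqual(len(rows), 5)
        # No value should keep a stray quote or surrounding whitespace.
        for row in rows:
            for value in row.values():
                self.assertNotIn('"', value)
                self.assertEqual(value, value.strip())
```

Saved as a module, this can be run with `python -m unittest`. The same assertions could be pointed at the real reader once the data file is in the repository.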

jeffbaumes commented 9 years ago

Now that we always assume column headers, is this still an issue?

curtislisle commented 9 years ago

Let's close for now and re-open only if necessary after use.