Quasars / orange-spectroscopy

Multifile reader should always handle plain text data column-wise #522

Open matheger opened 3 years ago

matheger commented 3 years ago

The Multifile reader behaves differently for plain-text files with different file extensions. For example, .dat files are read assuming that different spectra are stored column-wise, whereas .csv files are read row-wise. This behaviour should be made the same for any plain-text input file regardless of its extension.

I think it would also make sense to make two design changes for this widget. First, put it as the very first one in the "Spectroscopy" widget gallery so that it stands out to new users as the starting point for spectroscopic analyses. And second, rename it to just "File" reader; "Multifile" suggests a bit of a special purpose -- I initially dismissed it since I only needed to import a single file.

Original post: "Spectral data in rows or columns?"

Not sure if this is an issue, or a design feature that I just don't understand. But when I load a csv file with different spectra in columns, I have to transpose the file before the spectroscopy widgets understand it correctly - i.e., the data points for each spectrum have to be along the rows, not the columns of the data table. Is that by design, or am I messing something up?

borondics commented 3 years ago

Hi @matheger, thanks for the comment.

As you say, this is not really an issue but important to answer! In Quasar/Orange Data Tables we have the experiments (spectra) stored as rows, which follows the convention of the machine learning world. The feature names are the energies your spectra are taken at (wavenumbers for IR, for example).

If your data is stored the opposite way in the csv file you can still read it in but you would need to add a "Transpose" widget to turn it the "right" way. Also verify that the energies are transformed properly into feature names.
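As a rough illustration of that row-wise layout, here is a sketch in plain Python. It does not use Orange's actual Table API, and the numbers and variable names are made up for the example:

```python
# Hypothetical column-wise table: the first column is the x axis
# (e.g. wavenumbers), every further column is one spectrum.
column_wise = [
    [1000.0, 0.10, 0.20],
    [1001.0, 0.11, 0.21],
    [1002.0, 0.12, 0.22],
]

# zip(*rows) transposes a list of rows into one tuple per column.
columns = list(zip(*column_wise))

# The energies become the feature names ...
feature_names = [str(x) for x in columns[0]]
# ... and each spectrum becomes one row of the resulting table.
spectra = [list(col) for col in columns[1:]]
```

After the transpose, `spectra` holds one row per spectrum and `feature_names` holds the energies, which is the orientation described above.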

Alternatively, you could rename your file from csv to one of the spectroscopy file extensions (dat, dpt or xy) where the file structure is similar to yours as far as I understand and the transformation into a Data Table is taken care of automatically.

Let me know if this helps or send along an example file and I can make you a workflow that handles it.

matheger commented 3 years ago

Thanks! That confirms a bit of a suspicion that I started to have. Using the "Transpose" widget is what I've been doing so far, but to be honest, it feels a little awkward if I have to include it every time. You mentioned renaming the file - indeed, as a dat file my data is handled correctly. However, I'd much rather leave it as csv for convenience and wider integration purposes in our lab setting.

I'd suggest changing the behaviour of the Multifile reader so that any plain-text input file is read "column-wise". I think it's safe to say that this would be the expected behaviour for any such file from a spectroscopist's point of view. I'll adjust my original post to reflect this.

markotoplak commented 3 years ago

@matheger, thanks.

We were thinking about auto-detection of orientation some time ago, but sometimes it is hard to say whether the input is column-wise or row-wise (both could be valid). Also, some others have csv files oriented row-wise, especially previous Quasar users. :)

Could you show us what kind of files you are dealing with? Perhaps paste a few lines?

What I see we could immediately do to improve your experience is:

  1. Allow reading of csv files with the spectra ASCII reader that currently reads dat. The users will need to manually select that file reader from a dropdown. That would make it possible to read column-wise csv files without renaming.
  2. Document the different options for reading Spectral ASCII files.

The dropdown I mentioned is the one at the bottom here: image

What do you think? Would that work for you?

And then, perhaps, if we could do it reliably: if a specific file format was not chosen, we could try autodetection. We tried to do this some time ago, but we did not find a good solution.
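To illustrate why autodetection is hard, here is one possible heuristic as a sketch. It is not what the reader does or what was tried before; `guess_orientation` and `is_monotonic` are hypothetical names, and the assumption is that the x axis is the first column or first row and is strictly monotonic:

```python
def is_monotonic(values):
    """Return True if values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    if not pairs:
        return False
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def guess_orientation(table):
    """Guess whether the x axis runs down the first column or along the first row.

    Returns "columns", "rows", or None when the table is ambiguous.
    """
    first_column = [row[0] for row in table]
    first_row = table[0]
    by_column = is_monotonic(first_column)
    by_row = is_monotonic(first_row)
    if by_column and not by_row:
        return "columns"
    if by_row and not by_column:
        return "rows"
    return None  # both or neither look like an x axis
```

A table such as `[[1, 2, 3], [2, 5, 6], [3, 8, 9]]` has a monotonic first column and a monotonic first row, so this heuristic returns `None` for it: both readings are valid.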

matheger commented 3 years ago

Hi Marko! My files are just boring old mass spectra with the x data in the first column and a series of y data columns after that. I'll attach a sample here.
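For reference, a minimal sketch of how such a layout could be parsed with the Python standard library. The function name `read_column_wise` and the sample values are made up for illustration and are not taken from the attached file:

```python
import csv
import io


def read_column_wise(text, delimiter=","):
    """Parse plain text whose first column is x and whose other columns are spectra.

    Returns (x_values, spectra), where spectra holds one list per y column.
    Assumes purely numeric content without a header line.
    """
    rows = [
        [float(cell) for cell in row]
        for row in csv.reader(io.StringIO(text), delimiter=delimiter)
        if row  # skip blank lines
    ]
    if not rows:
        return [], []
    # Transpose: the first column becomes x, the rest become spectra.
    x_values, *spectra = (list(col) for col in zip(*rows))
    return x_values, spectra


# Made-up sample: x in the first column, two y columns after it.
sample = "50.0,1.0,4.0\n51.0,2.0,5.0\n52.0,3.0,6.0\n"
x_values, spectra = read_column_wise(sample)
```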

Your first suggestion would work fine in my case. Documentation would probably also help.

How about a setting in the Multifile widget where the user can choose whether to read their files column- or row-wise?
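Roughly what I have in mind, as a sketch only. `load_spectra` and its `orientation` parameter are hypothetical and not part of the existing widget; I'm assuming numeric files without a header, with the x axis in the first column (or first row):

```python
import csv
import io


def load_spectra(text, orientation="columns", delimiter=","):
    """Read plain text with an explicit, user-chosen orientation.

    orientation="columns": first column is x, each further column a spectrum.
    orientation="rows":    first row is x, each further row a spectrum.
    Returns (x_values, spectra) with one list per spectrum in both cases.
    """
    if orientation not in ("columns", "rows"):
        raise ValueError("orientation must be 'columns' or 'rows'")
    table = [
        [float(cell) for cell in row]
        for row in csv.reader(io.StringIO(text), delimiter=delimiter)
        if row  # skip blank lines
    ]
    if not table:
        return [], []
    if orientation == "columns":
        # Transpose so that the x axis ends up in the first row.
        table = [list(col) for col in zip(*table)]
    return table[0], table[1:]
```

With this, the same data gives the same result whether it is stored column-wise or row-wise, as long as the user picks the matching setting.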