galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Handle preview and count of empty lines correctly in datasets #2694

Open tnabtaf opened 8 years ago

tnabtaf commented 8 years ago

Hi All,

The Remove beginning of a file tool (galaxy/tools/filters/remove_beginning.xml) isn't removing any lines. I tested this on a current cloud instance and on test, with both plain text and tab-delimited files.

nsoranzo commented 8 years ago

Mmmh, this works for me both on usegalaxy.org and my local instance and on both txt and tabular datasets. Can you share a history or the affected dataset(s)?

tnabtaf commented 8 years ago

Nicola, I've figured out what's going on, and it doesn't have anything to do with the Remove beginning of file command. I'm not sure if this is a bug, but it definitely is not a feature.

Take a look at https://usegalaxy.org/u/tnabtaf/h/removing-5-blank-lines. The uploaded Analysis (9).txt and Analysis (10).txt files were exported from the enrichment tool on the Gene Ontology home page. These files have 3 sections:

I erroneously assumed that the Remove beginning tool was not removing the lines because of

  1. the way Galaxy's preview function in the history pane counts lines, and
  2. how tab delimited (but not plain text) data is displayed in the middle panel.

Analysis (9) has a datatype of txt, the default guess if you don't specify datatype in the upload form. Analysis (10) has a datatype of tabular. This datatype was manually assigned.

My confusion is because

  1. The previews of both Analysis (9) (txt) and Analysis (10) (tabular) say there are 84 lines in the files, when there are in fact 89 lines.
  2. Viewing the contents of Analysis (10) (tabular) in the center panel does not show the leading 5 blank lines.
  3. After removing the first 5 lines (which are blank) from both datasets, the previews still say there are 84 lines in the files, making it look like nothing has changed.
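To make the discrepancy concrete, here is a minimal Python sketch. It assumes the preview count skips blank lines; that is my guess from the numbers above, not something I have confirmed in Galaxy's code. The function names and the synthetic `sample` data are hypothetical stand-ins, not Galaxy's implementation or the actual Analysis files.

```python
def count_all_lines(text: str) -> int:
    """Count every line, including blank ones."""
    return len(text.splitlines())


def count_nonblank_lines(text: str) -> int:
    """Count only lines that contain non-whitespace characters."""
    return sum(1 for line in text.splitlines() if line.strip())


def remove_beginning(text: str, n: int) -> str:
    """Drop the first n lines, keeping line endings on the rest."""
    return "".join(text.splitlines(keepends=True)[n:])


# Synthetic stand-in for the uploaded files:
# 5 leading blank lines followed by 84 tab-delimited data lines.
sample = "\n" * 5 + "".join(f"row{i}\tvalue{i}\n" for i in range(84))
trimmed = remove_beginning(sample, 5)

print(count_all_lines(sample), count_nonblank_lines(sample))    # 89 84
print(count_all_lines(trimmed), count_nonblank_lines(trimmed))  # 84 84
```

Under that assumption, a count that ignores blank lines reports 84 both before and after the removal, which is exactly what made the tool look like it had done nothing. A count that includes blank lines drops from 89 to 84.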

I suggest:

  1. If there are blank lines in tabular files, display them in the middle panel. Don't just drop them.
  2. Have the line counts shown in the preview include blank lines.

sszakony commented 8 years ago

I'm submitting a pull request for this issue... hopefully my changes will resolve the problem.

tnabtaf commented 8 years ago

Thank you @sszakony !