Closed Jwink3101 closed 7 years ago
On Thu, Oct 26 2017, Justin Winokur wrote:
2) if (and only if) the top line begins with a standard comment character ('#' or '%'), it is removed.
I do almost the same with gtabview, but the comment is just stripped, followed by column labels.
There's also the additional rule that if # is followed by an empty line, the empty line is also removed. Those are generated by pandas when a multiindex is used.
I wasn't aware of '%' though.
Could you provide some sample files (garbage data is fine)?
@wavexx in the pull request I included a sample file: commented_annotated_numeric.dat. I will attach it below as well (but with a .txt extension so that GitHub allows it):
commented_annotated_numeric.txt
Numeric tabular data is common in Matlab (and NumPy) and often contains annotations at the top (or at least, it should!!!).
I added some changes to my 'develop' branch to add Python 3.x support to this PR. I know the tests passed (but only because we still need to add tests for the PR), so please actually check Python 3 compatibility with any further changes. Thanks!
I tested it under Ubuntu 17.10 with Python 3.6.3. I viewed the data in the samples directory. It works fine. Thanks!
Added test and merged. Thanks!
1) replaces multiple spaces (such as those to align columns) with a single space. This respects spaces inside of quoted strings (via shlex.split) but is really intended for numeric data.
2) if (and only if) the top line begins with a standard comment character ('#' or '%'), it is removed.
This type of data file, especially with comments, is common when processing data in Matlab or NumPy.
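For readers who want to see the two rules concretely, here is a minimal sketch. It is not the code from this PR: the function name `normalize_lines` and the constant `COMMENT_CHARS` are made up for illustration, and it returns each line as a list of fields instead of a re-joined string. It relies only on `shlex.split` from the standard library, which splits on runs of whitespace and keeps quoted strings together.

```python
import shlex

# Assumed set of comment characters, taken from the description above.
COMMENT_CHARS = ("#", "%")


def normalize_lines(lines):
    """Split whitespace-aligned lines into fields (hypothetical helper).

    Rule 1: runs of spaces act as a single separator; quoted strings
            are kept together by shlex.split.
    Rule 2: a comment character is stripped from the first line only.
    """
    rows = []
    for i, line in enumerate(lines):
        line = line.strip()
        # Only the top line is checked for a leading comment character.
        if i == 0 and line.startswith(COMMENT_CHARS):
            line = line[1:].lstrip()
        rows.append(shlex.split(line))
    return rows


sample = [
    '# x     y     "label a"',
    '1.0     2.0   "foo bar"',
    '3       4     z',
]
for row in normalize_lines(sample):
    print(row)
```

One caveat with this approach: `shlex.split` raises `ValueError` on an unbalanced quote (for example a stray apostrophe), which fits the note above that this is really intended for numeric data.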