NIST-ISODB / isotherm-digitizer-panel

Contribution form for the NIST adsorption isotherm database, implemented using PyViz Panel

Feature parse #38

Closed: dwsideriusNIST closed this 4 years ago

dwsideriusNIST commented 4 years ago

Proposal for a new parser for the isotherm_data block:

  1. Converts various delimiters to whitespace (tabs are the column delimiters when pasting from Excel, at least on my Mac)
  2. Collapses whitespace, in case multiple spaces separate columns
  3. Uses pandas to interpret the column data, then converts the output to an ndarray
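A minimal plain-Python sketch of the three steps above, for illustration only. The delimiter set (comma, semicolon, tab), the function name `parse_isotherm_block`, and the fixed column count are assumptions rather than the PR's code. Plain `float` parsing also stands in for the pandas step so that the sketch runs with the standard library alone.

```python
import re

# Assumed delimiter set: comma, semicolon, and tab (tab comes from spreadsheet
# pastes). The exact set used by the proposed parser is not shown in the thread.
_DELIMITERS = re.compile(r"[,;\t]")


def parse_isotherm_block(text, ncols=2):
    """Hypothetical helper: parse a pasted block into rows of floats.

    Plain Python replaces the pandas -> ndarray step of the proposal.
    """
    rows = []
    for line in text.splitlines():
        # Step 1: convert each supported delimiter to a space.
        line = _DELIMITERS.sub(" ", line)
        # Step 2: split() with no argument treats any run of whitespace as one
        # separator and ignores leading/trailing whitespace.
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != ncols:
            raise ValueError(f"expected {ncols} columns, got {len(fields)}: {line!r}")
        # Step 3: interpret each column as a number.
        rows.append(tuple(float(f) for f in fields))
    return rows


print(parse_isotherm_block("0.03362\t0.1127 \n0.08609 ; 0.3133\n"))
```

If NumPy is installed, `numpy.asarray(parse_isotherm_block(text))` produces the ndarray that step 3 ends with.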

One outstanding issue: the separator declaration is ',| ', but the comma should be unnecessary. When the comma is left out, pandas creates too many columns for lines that have trailing whitespace. I don't understand this yet and need to consult the docs.
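One plausible explanation, not verified against the pandas version used here: the `read_csv` docs state that a `sep` longer than one character is treated as a regular expression and forces the Python parsing engine, which (as far as I know) strips each line before splitting. A single-character `sep=' '` uses the C engine instead, where a trailing space ends one field and starts an empty one, giving an extra column. The plain-Python demo below reproduces that difference. In pandas, `sep=r'\s+'` is the documented way to request whitespace-delimited parsing.

```python
import re

line = "0.03362 0.1127 "  # a data row with a trailing space

# Splitting on a single space keeps an empty field after the trailing space,
# which a table parser would report as an extra (empty) column.
print(line.split(" "))                 # ['0.03362', '0.1127', '']

# Stripping the line first, or calling split() with no argument, avoids it.
print(line.strip().split(" "))         # ['0.03362', '0.1127']
print(line.split())                    # ['0.03362', '0.1127']

# A regex separator behaves the same way once the line has been stripped.
print(re.split(r"\s+", line.strip()))  # ['0.03362', '0.1127']
```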

I will include examples below.

dwsideriusNIST commented 4 years ago

Examples of pastes from non-CSV data:

Example 1 (paste from MS Excel):

0.03362 | 0.1127
0.08609 | 0.3133
0.15333 | 0.5590
0.46870 | 1.5469
0.72747 | 2.1537
0.92713 | 2.5158
1.07506 | 2.7393
1.21081 | 2.9156
1.40979 | 3.132

That didn't work, as the tab was converted to |. Please try a direct paste from MS Excel if you have it. I've also tested this with LibreOffice Calc, Google Sheets, and Gnumeric, all of which worked fine.

Example 2 (too much whitespace between columns; LibreOffice does this):

0.03362 0.1127
0.08609 0.3133
0.15333 0.5590
0.46870 1.5469
0.72747 2.1537

Example 3 (semicolon-delimited):

0.03362 ; 0.1127
0.08609 ; 0.3133
0.15333 ; 0.5590
0.46870 ; 1.5469
0.72747 ; 2.1537