bryanhanson / readJDX

Import spectroscopic data in the JCAMP-DX format
https://bryanhanson.github.io/readJDX/

No pattern for reading NIST MASS SPECTRUM #18

Closed rkjulian closed 1 year ago

rkjulian commented 1 year ago

All of the mass spectrum data from the NIST WebBook uses a data format identical to ##XYPOINTS, but the NIST files label the variable list ##XYDATA:

##TITLE=Library Entry 370
##JCAMPDX=Revision 4.10
##DATA TYPE=MASS SPECTRUM
##SAMPLE DESCRIPTION=IUPAC_Name=4,5alpha-epoxy-14-hydroxy-3-methoxy-17-methy
l-morphinan-6-one SMILES=CN1[C@H]2[C@@]3(O)[C@]4(CC1)C@([H])C(CC3)=O Cayman_Chemical=15465 Created=06/29/21
##NAMES=NIST_TOMKitLibrary
##CAS NAME=oxycodone
##MOLFORM=C18H21NO4
##CAS REGISTRY NO=000076-42-6
##MP= -300
##BP= -300
##MW= 315
##$RETENTION INDEX=0
##$CONDENSED SPECTRUM=NO
##NPOINTS= 87
##XYDATA=(XY..XY)
55.0178 360.2 58.0651 860.3 ...
##END=

When I change the file content on line 17 from XYDATA to XYPOINTS, the reader can consume the files. This looks like a format conflict: the XYDATA pattern in findVariableLists is ^\\s*##XYDATA\\s*=\\s*\\(X\\+\\+\\(Y\\.\\.Y\\)\\)$, which suggests that XYDATA is expected to use the (X++(Y..Y)) form rather than the (XY..XY) form that NIST is using.
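The manual edit described above could be scripted instead of done by hand. This is only a sketch in Python, not anything readJDX provides: the helper name `relabel_nist_xydata` is made up for illustration, and it assumes the label appears on a line of its own, exactly as in the sample file.

```python
import re

# Hypothetical helper, not part of readJDX.
# Matches a line holding only "##XYDATA=(XY..XY)", allowing optional
# spaces or tabs around the label and the "=" sign.
_NIST_XYDATA = re.compile(
    r"^([ \t]*)##XYDATA([ \t]*=[ \t]*\(XY\.\.XY\)[ \t]*)$",
    re.MULTILINE,
)


def relabel_nist_xydata(text: str) -> str:
    """Rewrite ##XYDATA=(XY..XY) as ##XYPOINTS=(XY..XY); leave all else alone."""
    return _NIST_XYDATA.sub(r"\1##XYPOINTS\2", text)


if __name__ == "__main__":
    sample = "##NPOINTS= 87\n##XYDATA=(XY..XY)\n55.0178 360.2 58.0651 860.3\n##END=\n"
    print(relabel_nist_xydata(sample))
```

Lines that use the (X++(Y..Y)) form are not touched, so files that already follow that form pass through unchanged.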

Perhaps a pattern can be added for NIST XYDATA: ^\\s*##XYDATA\\s*=\\s*\\(XY\\.\\.XY\\), which should allow users to read NIST data without editing the file.
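To check that the two patterns really do separate the two label forms, here is a small sketch using Python's `re` module. The patterns are the ones quoted in this report, with R's doubled backslashes written as single backslashes in raw strings. The function name `classify_xydata` is invented for the example, and R's regex engine may differ in details, so this only exercises the pattern logic.

```python
import re

# Existing pattern quoted from findVariableLists.
EXISTING = re.compile(r"^\s*##XYDATA\s*=\s*\(X\+\+\(Y\.\.Y\)\)$")
# Pattern proposed in this report for NIST-style files.
PROPOSED = re.compile(r"^\s*##XYDATA\s*=\s*\(XY\.\.XY\)")


def classify_xydata(line: str):
    """Return which XYDATA form a label line uses, or None if neither matches."""
    if EXISTING.search(line):
        return "X++(Y..Y)"
    if PROPOSED.search(line):
        return "XY..XY"
    return None


if __name__ == "__main__":
    for label in ("##XYDATA=(X++(Y..Y))", "##XYDATA=(XY..XY)", "##XYPOINTS=(XY..XY)"):
        print(label, "->", classify_xydata(label))
```

The two patterns do not overlap, so adding the second one should not change how existing (X++(Y..Y)) files are recognized.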

bryanhanson commented 1 year ago

I don't recall what the standard says about that label, but will take a look. Thanks for the clear report!

bryanhanson commented 1 year ago

The standard doesn't allow mixing the two formats. But I don't see any reason why we can't implement your suggestion. However, when I went to get some test files, every file I picked at random used a PEAK TABLE format. Can you post some links to where you are seeing the format you reported? I even pulled the oxycodone entry you had and it's different, written with JCAMP version 4.24. The one I see is here.

bryanhanson commented 1 year ago

Closing, no response from OP. Can reopen later if desired.