SimonEnsemble / PorousMaterials.jl

Julia package towards classical molecular modeling of nanoporous materials
GNU General Public License v3.0

Crystal reader can't deal with files where there are numbers in the first column of _atom loop #122

Closed: Surluson closed this issue 4 years ago

Surluson commented 4 years ago

PorousMaterials fails to read the following CIF file. At a quick glance, it looks like the reader assumes the atom species are in the first column of the _atom_site loop.

Some CIF files contain only the _atom_site_label tag (in addition to the coordinates), so the fix is not as simple as always reading one specific column.

Any thoughts on how to deal with this?

data_C132H126N12O18_2018-02-15_11:45:50
#******************************************
#
# CIF file created by Zeo++
# Zeo++ is an open source package to
# analyze microporous materials
#
#*******************************************

_cell_length_a      30.8683   
_cell_length_b      30.8684   
_cell_length_c      35.0319   
_cell_angle_alpha       90   
_cell_angle_beta        90   
_cell_angle_gamma       120   

_symmetry_space_group_name_H-M      'P1'
_symmetry_Int_Tables_number     1   
_symmetry_cell_setting      Monoclinic

loop_
_symmetry_equiv_pos_as_xyz
'+x,+y,+z'

loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
1   C   0.502983    0.988842    0.009974
2   C   0.501284    0.956851    0.039354
3   C   0.465216    0.905491    0.039245
4   C   0.429763    0.886817    0.009378
5   C   0.430024    0.918239    0.97983
6   C   0.467501    0.969506    0.980015
7   H   0.527016    0.971118    0.060998
8   H   0.403333    0.849619    0.009162
9   C   0.541625    0.039467    0.010572
10  C   0.540608    0.074255    0.036399
11  C   0.580923    0.124471    0.037492
12  C   0.621053    0.138657    0.011623
13  C   0.622595    0.104336    0.985773
14  C   0.582884    0.054506    0.986044
15  H   0.649788    0.174972    0.011506
16  H   0.583779    0.02903 0.967769
17  O   0.467179    0.001084    0.952894
18  O   0.498429    0.059384    0.057606
19  H   0.493664    0.007029    0.934514
20  H   0.501097    0.041448    0.079872
21  C   0.664327    0.119912    0.958184
22  H   0.681599    0.096708    0.962089
23  H   0.693343    0.158636    0.963884
24  C   0.583509    0.164241    0.063369
25  H   0.57979 0.191077    0.045295
26  H   0.620658    0.183867    0.076563
27  C   0.465103    0.872576    0.070308
28  H   0.460975    0.88733 0.097619
29  H   0.433034    0.834776    0.067247
30  C   0.391628    0.897243    0.949104
31  H   0.373876    0.92014 0.946218
32  H   0.361987    0.859607    0.957005
33  C   0.054578    0.5082  0.67664
34  C   0.086569    0.538492    0.706022
35  C   0.13793 0.553784    0.705911
36  C   0.156603    0.537003    0.676044
37  C   0.125179    0.505843    0.646496
38  C   0.073915    0.492055    0.646682
39  H   0.072303    0.549957    0.727665
40  H   0.193801    0.547773    0.67583
41  C   0.003954    0.496218    0.67724
42  C   0.969166    0.460412    0.703065
43  C   0.918951    0.45051 0.704158
44  C   0.904762    0.476452    0.67829
45  C   0.939087    0.512318    0.65244
46  C   0.988917    0.522437    0.652711
47  H   0.868449    0.468877    0.678173
48  H   0.014389    0.548805    0.634436
49  O   0.042336    0.460154    0.619561
50  O   0.984036    0.433105    0.724274
51  H   0.036392    0.480694    0.601181
52  H   0.001972    0.453709    0.74654
53  C   0.923511    0.538473    0.62485
54  H   0.946712    0.578948    0.628755
55  H   0.884787    0.528766    0.630551
56  C   0.87918 0.413328    0.730035
57  H   0.852342    0.38277 0.711963
58  H   0.859552    0.430849    0.743228
59  C   0.170845    0.586587    0.736977
60  H   0.156089    0.567703    0.764286
61  H   0.208645    0.592317    0.733914
62  C   0.146177    0.488445    0.61577
63  H   0.123279    0.447795    0.612884
64  H   0.183812    0.496439    0.623672
65  C   0.535221    0.540437    0.343307
66  C   0.504928    0.542136    0.372686
67  C   0.489636    0.578204    0.372578
68  C   0.506416    0.613658    0.34271
69  C   0.537578    0.613396    0.313163
70  C   0.551367    0.575922    0.313349
71  H   0.493463    0.516403    0.394332
72  H   0.495647    0.640087    0.342496
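
One way to avoid depending on column position is to read the tag names that follow loop_ and look up each value by tag. The following is a minimal Python sketch of that idea, not the PorousMaterials.jl reader; the function name read_atom_loop is hypothetical, and it assumes whitespace-separated values with no quoted strings containing spaces.

```python
def read_atom_loop(cif_text):
    """Return (tags, rows) for the first loop_ made up of _atom_site_ tags.

    Hypothetical helper for illustration only. Each row is a dict that maps
    a tag name to its raw string value, so callers never rely on column order.
    """
    lines = [ln.strip() for ln in cif_text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i] != "loop_":
            i += 1
            continue
        # Collect the consecutive tag lines that follow loop_.
        j = i + 1
        tags = []
        while j < len(lines) and lines[j].startswith("_"):
            tags.append(lines[j].split()[0])
            j += 1
        if tags and all(t.startswith("_atom_site_") for t in tags):
            rows = []
            while j < len(lines):
                fields = lines[j].split()
                # Stop at a blank line, a comment, a new tag, or a row
                # whose field count does not match the header.
                if len(fields) != len(tags) or lines[j].startswith(("_", "#")):
                    break
                rows.append(dict(zip(tags, fields)))
                j += 1
            return tags, rows
        i = j  # not the atom loop; keep scanning
    return [], []


sample = """\
loop_
_symmetry_equiv_pos_as_xyz
'+x,+y,+z'

loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
1   C   0.502983    0.988842    0.009974
16  H   0.583779    0.02903 0.967769
"""

tags, rows = read_atom_loop(sample)
print(rows[0]["_atom_site_type_symbol"])  # species read by tag, not by column index
```

Because every value is addressed by its tag, the numeric entries in the first column are simply the _atom_site_label values and do not get mistaken for species.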

SimonEnsemble commented 4 years ago

@Surluson, is this on the symmetry reader branch or on master?

Surluson commented 4 years ago

This fails on both branches.

SimonEnsemble commented 4 years ago

@ahyork in case you have an opinion.

How about making a PR on top of the symmetry branch: if only _atom_site_type_symbol or only _atom_site_label exists, use that one; if both exist, choose a default, as we do now. @Surluson, can you look up the difference between these two tags and how they are used? My understanding is that we should default to _atom_site_label, since _atom_site_type_symbol will always be an element, but in molecular simulations we sometimes want to distinguish between atoms of the same element, like C_aromatic vs. C_double_bonded.
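
The selection rule proposed above could be sketched roughly as follows. This is an illustrative Python sketch, not the code from the eventual PR; the function name choose_species_tag and its default parameter are assumptions made for the example.

```python
def choose_species_tag(tags, default="_atom_site_label"):
    """Pick which tag of the atom loop supplies the species names.

    Hypothetical helper: `tags` is the list of tag names found in the
    atom loop header, and `default` is used only when both tags exist.
    """
    has_label = "_atom_site_label" in tags
    has_symbol = "_atom_site_type_symbol" in tags
    if has_label and has_symbol:
        return default  # both present: fall back to the configured default
    if has_label:
        return "_atom_site_label"
    if has_symbol:
        return "_atom_site_type_symbol"
    raise ValueError("atom loop has neither _atom_site_label nor _atom_site_type_symbol")


# Tags from the file in this issue: both are present, so the default decides.
issue_tags = [
    "_atom_site_label",
    "_atom_site_type_symbol",
    "_atom_site_fract_x",
    "_atom_site_fract_y",
    "_atom_site_fract_z",
]
print(choose_species_tag(issue_tags))
print(choose_species_tag(issue_tags, default="_atom_site_type_symbol"))
```

In the file from this issue both tags exist and the labels are plain integers, so the default is what determines the species names; that is why the sketch leaves it as a parameter.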

eahenle commented 4 years ago

Resolved as of #145.