arohl / gdis

A visualization program for the display, manipulation, and analysis of isolated molecules and periodic structures
GNU General Public License v2.0
43 stars 17 forks source link

Add extra functions for alternative origin choices #45

Closed cajfisher closed 2 years ago

cajfisher commented 3 years ago

Functionality added to cope with alternative cell origins when constructing crystal from space group symbol Also checks if atom coordinates correspond to common fractions (from truncated recurring decimals) to minimize precision errors when applying or determining symmetry Space group name format also standardized for display

cajfisher commented 3 years ago

Update cif keywords (as some now deprecated) Replace PI constant with G_PI and standardize conversions between degrees and radians Fix bug in model:content display when shells removed Add some model properties window updates where missing

ovhpa commented 3 years ago

Hello,

Thank you for your contribution.

I have encounter some few bugs with the static analysis and a bug while reading a cif file: The first one:

Description: 1st function call argument is an uninitialized value
File: space.c
Line: 191

The impacted code is reported to be:

vecmat(*(model->sginfo.matrix+j), sr_core→x);

However, I have gone up a little (fixing this bug will wake up another one, etc..) and I found it to be linked to a rare case in the function space_lookup where, in case of a corrupt lattice definition, the loop at line 577 may be skipped. I have no doubt that this should rarely happen, but I propose to add a safety net before the loop. Something as simple as:

if((SgInfo.LatticeInfo->nTrVector < 1)||(SgInfo.nList < 1)) return 1;/*in case SgInfo is ill-defined!*/

Or anything similar should do the trick (for the static analysis).

The other ones are minor and are optimized away (setting -O2 in the makefile is enough to have them gone). They all are linked to the function SgLabelCmp and with the definition of its parameter WtdLbl. I’m not sure it’s connected to the PR, but here are the reports:

1 Description: Branch condition evaluates to a garbage value
1 File: sgio.c
1 Line: 125
2 Description: Branch condition evaluates to a garbage value
2 File: sgio.c
2 Line: 151
3 Description: The left operand of '!=' is a garbage value
3 File: sgio.c
3 Line: 156
4 Description: Branch condition evaluates to a garbage value
4 File: sgio.c
4 Line: 187

The bug connected to the cif file happens when I try to load the file 4126814.cif from the COD database (http://www.crystallography.net/cod/) in the cif/4/12/68/ directory. I tried to include the file in the report, but I had to compress it so github would accept to include it, so please tell me if you want me to send you the original cif file. 4126814.cif.gz GDIS is then terminated with the error:

*** stack smashing detected ***

The trace of the debugger are not so useful because gdb report the "stack smashing" only by the end of the function it appear. I can obtain 2 traces similar to:

#5  0x00005555555e6ac4 in zone_index (position=0x555555e3b100, data=0x555555bc7f40) at zone.c:359
#6  0x00005555555e65b4 in zone_make (zone_size=4, model=0x555555e99800) at zone.c:229
#7  0x00005555555e6793 in zone_init (model=0x555555e99800) at zone.c:282
#8  0x00005555555ae95c in model_prep (data=0x555555e99800) at model.c:664
#9  0x0000555555610d32 in read_cif (filename=0x555555bcc370 "4126814.cif", data=0x555555e99800) at file_cif.c:776
#10 0x000055555560dae4 in file_load (filename=0x7fffffffc83c "4126814.cif", mdata=0x0) at file.c:1457
#11 0x00005555555acc11 in main (argc=2, argv=0x7fffffffc2b8) at main.c:795

and

#5  0x00005555555e674e in zone_make (zone_size=4, model=0x555555e98800) at zone.c:274
#6  0x00005555555e6793 in zone_init (model=0x555555e98800) at zone.c:282
#7  0x00005555555ae95c in model_prep (data=0x555555e98800) at model.c:664
#8  0x0000555555610d32 in read_cif (filename=0x5555559104e0 "4126814.cif", data=0x555555e98800) at file_cif.c:776
#9  0x000055555560dae4 in file_load (filename=0x7fffffffc83c "4126814.cif", mdata=0x0) at file.c:1457
#10 0x00005555555acc11 in main (argc=2, argv=0x7fffffffc2b8) at main.c:795

I tried to figure this out, but I can’t tell if there is a problem in the read_cif function or with the cif file itself. However the latter seems less likely since there are other failing files in the database (with the same symptoms).

Maybe you will see what's wrong more easily ;)

Best,

cajfisher commented 3 years ago

The bugs you found are not affected by the PR, but were already present in the cif reader and space group lookup functions. I've made changes along the lines you suggested to spacegroup_lookup that should avoid the first problem. The second problem (fixed by optimization flags) is within the sginfo code (sgio.c) and hasn't been changed for gdis. Actually, I forgot to correct the bugs in the sginfo.h file (listed on the sginfo website http://cci.lbl.gov/sginfo/) but have done so now. Speaking of sginfo, it is rather old code now and has been superseded by sgtbx. When someone has a lot of time on their hands, it would be good to update gdis to use these new, more robust, libraries; the benefits, however, may not yet outweigh the cost in terms of time, and for the time being sginfo seems adequate for the vast majority of cases.

The third problem appears to be caused by extraneous information not needed by gdis being handled incorrectly. I haven't identified the exact place where this happens yet, but when all but the essential lines are removed the structure is loaded correctly, so the problem is definitely hidden in there somewhere. I will look into this soon.

Regarding the PR, I did find another problem, though. The program crashes when reading some non-conventional space group symbols which didn't cause a problem before because the space group number was being used instead. I will try to fix this soon too.

ovhpa commented 3 years ago

Thanks, I will wait for your fix, then. About the move to sgtbx, I think that, as you said, it is a lot of time for little gain. In my opinion, given that someone would have that much time, moving out of the deprecated GTK2 would be a more pressing move. There are so many deprecated things in GTK2, and its association with glib and newer distribution will inevitably break at some point. Anyway, such a move is beyond my current timeline, maybe when I have more time I will be able to look for GTK3... Best,

cajfisher commented 3 years ago

The third bug turned out to not be what I thought. It was caused by the cif file containing the same lattice parameter info twice, resulting in a periodicity of 6 because the cif reader thought the file only contained one model instead of two. As already noted in the file_cif.c's comments, cif files don't have a definitive keyword for denoting the beginning of new structure info, so the readcif code originally only checked for "audit" as the keyword. In fact, "data" can also indicate a new structure, so I have modified the code to cope with this situation. I've tested it on several cif files and it works fine so far, but there's always the possibility that some more complicated (multiple structure) files might trip it up.

The final bug I "discovered" in the PR was in fact accidentally introduced by me when searching for the cause of the third bug, and has been corrected.

ovhpa commented 3 years ago

Hello, I'm running into problem with the last changes, I think. See for ex. 8000469.cif.gz Previously GDIS could open like: before Now two models are created; an empty one that contains proper cell information: after2 and one that contain atomic informations (but not in the cell): after

Another problem seems to come from the ion definition, for ex. in 6000015.cif.gz This is not link to the PR, I think.

Finally, as you mentionned previously, it seems that when there are many structure and the file is quite complicated, then GDIS will crash, see 3000040.cif.gz But simple - yet wrong - cif files also crash GDIS, like 6000773.cif.gz

I think it would be nice enough if GDIS could produce a simple error message "Unable to load" instead of crashing with a core dump...

Sorry again for asking you to check the cif code, but can you have a look at it?

Best,

cajfisher commented 3 years ago

The problem seems to be that the cif reader was also scanning comment lines for keywords. This is the case with the cif files your attached, so the model data wasn't being read correctly. I've added a logical switch to ignore comment lines, so it should now work for these and other similar files.

ovhpa commented 2 years ago

Hello, Good news is the *** stack smashing detected *** bugs seems to have almost gone away. I can still find some occurrences in the files 7702372.cif.gz 7702373.cif.gz, for example.

I found some other cases in which the cif files can no longer be opened properly: Before GDIS would open 2002956.cif.gz like this: new_before

Now, an empty model is created instead: new_after

Some example of files that can no longer be opened: 2002956.cif.gz 2002957.cif.gz 4022830.cif.gz

The issue in which a model containing an empty cell (with correct cell parameters) is opened and another model is opened with (un-scaled) atoms can still be found in quite a few cif files. I have gather some of them in this archive cell_and_atom.tar.gz

I have also encounter a new bug which causes GDIS to crash in the following files: 2009228.cif.gz 2012011.cif.gz. At first look, it seems that there is a problem to get the proper file species (probably due to cif lax formatting rules).

Finally, there seems to be a problem in reading the file 2108845.cif.gz, which was previously also incorrectly read by GDIS -- but a different kind of incorrect ^^’ The original GDIS showed only 24 atoms, which I guess were the fractional position prior to any symmetry operation: 2new_before

Now GDIS display 712 atoms, within the correct group of symmetry: 2new_after

However, there should not be that much atoms! In my understanding there should be only 380 atoms generated, am I missing something?

If you need some more files related to the above problems I try to find some more. However going through the cif database is quite long, so please let me know ;)

Best,

cajfisher commented 2 years ago

It turns out the cif format is a bit more complicated than I anticipated. The ";" marks a multi-line string which can't be skipped if gdis has detected a keyword, so I have fixed this. I also added a check for genuine comments starting with "#". The files you attached should now all open correctly. Note that 1) the 2108845.cif file contains 24 atoms in the basis group, with 464 sites in the Im-3 unit cell. This comes to 712 atoms because many of the sites have partial occupancies (multiple atoms on the same site). 2) 3000040.cif should no longer cause a crash; 3) crashes caused by 7702372.cif and 7702373.cif are due to errors in the cif files; both mistakenly contain keywords "_cell_length_alpha" and "_cell_length_beta" instead of "_cell_anglealpha", and so on, which messed up gdis' keyword detection. I have added a safety check to avoid this problem. If you know who generated these files or where they are stored, you may wish to let them know about the errors. (The intention of adding these to the cif file appears to have been to correct the wrong monoclinic angle in the original structure deposition, but because the keyword is wrong, VESTA and other programs, now including gdis, will not read the "corrected" monoclinic angle, leaving it as 90). Incidentally, according to the cif specifications, the marker of a new model is the "data" keyword, so I have removed the check for "audit" and used "data_" instead. This may need checking on other files, especially those with multiple structures. Some of the strings used to detect cif keywords by gdis also appear in keywords that we don't want them to detect, so I have made the string matching more stringent to avoid this.

ovhpa commented 2 years ago

Hello and thank you for your patience, I'm happy to tell you that all test passed on the PR so I can merge it!

About the 2108845.cif file of last time, there was actually 464 atoms generated: I forgot to remove the 0.995 cut-off limit of my script, which explains why I had only 380. When I remove that "feature" it find the 464 positions ;)