GarmanGroup / RADDOSE-3D

Time and space resolved simulations of X-ray induced damage to crystals
http://raddo.se
GNU Affero General Public License v3.0

More on issue of reading PDB file when using 'AbsCoefCalc EXP' with (eg) 'PDB 3i40' #5

Closed LeighCarter closed 3 years ago

LeighCarter commented 7 years ago

As has been posted, the previous code in CoefCalcFromPDB.java, which was:

// return String.format("http://www.pdb.org/pdb/download/downloadFile.do?"
//        + "fileFormat=pdb&compression=NO&structureId=%s", pdbName);

indeed no longer works.

I also tried what has been suggested as working, ie:

// return String.format("https://files.rcsb.org/download/%s.pdb", pdbName);

but, not sure why, that didn't work for me either.

However, I found that the following did work fine for me, and I have successfully read thousands of PDB files using it:

return String.format("https://files.rcsb.org/view/%s%s", pdbName, ".pdb");
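For anyone comparing the two URL forms mentioned above, here is a minimal sketch that builds both side by side. The class and method names (`PdbUrlSketch`, `viewUrl`, `downloadUrl`, `candidateUrls`) are hypothetical and are not part of RADDOSE-3D; only the two URL patterns come from this thread. It does no network access, it only formats the strings.

```java
import java.util.List;

// Illustrative only: names are hypothetical, not RADDOSE-3D code.
class PdbUrlSketch {

    /** The /view/ form reported as working in this comment. */
    static String viewUrl(String pdbName) {
        return String.format("https://files.rcsb.org/view/%s.pdb", pdbName);
    }

    /** The /download/ form suggested earlier in the discussion. */
    static String downloadUrl(String pdbName) {
        return String.format("https://files.rcsb.org/download/%s.pdb", pdbName);
    }

    /** Candidate URLs in the order a caller might try them (an assumption). */
    static List<String> candidateUrls(String pdbName) {
        return List.of(viewUrl(pdbName), downloadUrl(pdbName));
    }

    public static void main(String[] args) {
        // Print the candidates for the ID used in the issue title.
        candidateUrls("3i40").forEach(System.out::println);
    }
}
```

Printing the candidates makes it easy to paste each one into a browser and see which responds from a given network.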

Also, one bug remains even with that last, working version. If the four-character PDB ID has the form digit-e-digit-digit, e.g. 1e55, 3e00, 4e41 or 5e65, then the program stops/crashes. The source code appears to treat the ID as a number in exponential notation (rather than four arbitrary alphanumeric characters), and it seems to be hard-wired in the source code that this format is not allowed for a PDB ID. It's not a major issue, since the vast majority of PDB IDs do not look like exponential numbers, but it would be nice to be able to process these IDs like all the others.
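When processing a large batch of entries, the affected IDs can be picked out in advance. The sketch below is a hypothetical helper (`ExponentLikeIdSketch` is not RADDOSE-3D code) that flags IDs of the digit-e-digit-digit shape described above. It also shows that Java's own number parser accepts such a string as a number, which is the ambiguity at issue.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative only: names are hypothetical, not RADDOSE-3D code.
class ExponentLikeIdSketch {

    // digit, 'e' or 'E', digit, digit: the shape described in this comment
    private static final Pattern NENN = Pattern.compile("\\d[eE]\\d\\d");

    /** True if the ID could be read as a number in exponential notation. */
    static boolean looksLikeExponent(String pdbId) {
        return NENN.matcher(pdbId).matches();
    }

    /** Returns only the IDs from the batch that have the problematic shape. */
    static List<String> flagged(List<String> ids) {
        return ids.stream()
                  .filter(ExponentLikeIdSketch::looksLikeExponent)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Java reads "4e04" as the number 4 x 10^4, illustrating the ambiguity.
        System.out.println(Double.parseDouble("4e04"));
        System.out.println(flagged(List.of("3i40", "1e55", "4E04", "5e65", "1abc")));
    }
}
```

The pattern accepts both upper and lower case `e`, on the assumption that the case of the ID does not matter to whatever reads it.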

Anthchirp commented 7 years ago

don't have a development environment around, but what exactly do you mean it stops/crashes? Do you have a trace? If it just stops or ignores the error instead of crashing or handling it then that is the first bug. I've seen a couple of those instances in the code, eg https://github.com/GarmanGroup/RADDOSE-3D/blob/master/src/se/raddo/raddose3D/CoefCalcFromPDB.java#L432 which will serve to hide the actual problem instead of failing immediately. If you get a proper crash, then that is the second problem.

As far as I can tell the entry name is kept as string from parser through to the download URL generation. I suggest putting some debug output in, and identifying what the actual URL is that is attempted.

Regarding the URL used for downloading, the /download/ one seems to work from here, so I guess there is another issue that is hidden by ignoring exceptions rather than handling them properly. As I said, no development environment, so can't be of much further help at this time. But the parser part seems to be alright.
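The two suggestions in this comment, printing the URL that is actually attempted and failing immediately instead of hiding the exception, can be sketched as follows. This is a hypothetical helper, not the code in CoefCalcFromPDB.java. The class name `FailFastFetchSketch` and the choice of `UncheckedIOException` are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Illustrative only: names are hypothetical, not RADDOSE-3D code.
class FailFastFetchSketch {

    /** Reads the whole resource at the given URL, or fails loudly. */
    static String fetch(String url) {
        // Debug output: show exactly which URL is being attempted.
        System.err.println("Attempting to read: " + url);
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Do not swallow the error: rethrow with the URL attached,
            // so the real cause is visible to the caller.
            throw new UncheckedIOException("Could not read " + url, e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(fetch(args[0]).length() + " characters read");
        }
    }
}
```

With this shape, a URL that stops working surfaces as an exception naming that URL, rather than as a later, unrelated-looking failure.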

Anthchirp commented 7 years ago

An example input file and the entire output of RADDOSE 3D running on said file would help.

LeighCarter commented 7 years ago

[Can I just reply to this email?] [Adding Gerard for his interest]

Hello Markus,

Thank you for your emails. I didn’t mean to trouble you or anyone else.

My experimentation/debugging of the problems I encountered has been done within Eclipse.

Regarding reading the PDB file content from the different URL, i.e. files.rcsb.org/view/xxxx.pdb, as opposed to <…>/download/<…>, it works for me and I was just reporting my experience in case anyone else had a similar problem. It didn’t take me too long to figure out a solution, but always worth sharing, I thought. I wasn’t planning to spend more time debugging that.

Regarding the problem with the PDB ID of the form digit-E-digit-digit (call this nenn), I think my description is all you need to reproduce it, but here is some more information.

Firstly, it probably is a proper crash, apologies for the loose terminology.

See the attached text file, “info for markus about nenn PDB ID failure.txt”, which is divided into three parts, separated by ‘------------------’.

The first part is all you need as an example input file to reproduce the problem.

The second part traces through where the exception occurs, all within InputfileParser.java. The line ‘u=pdb()’ is the one which processes the input file line ‘PDB xxxx’. When the PDB ID xxxx is in an acceptable alphanumeric format, the program sails happily through. But when it is in this nenn format, the program fails in the ‘a=(Token)match(<…>)’ routine, which leads to the RecognitionException and reportError, where the value of ‘re’ is, as shown, ‘MismatchedTokenException (id=27)’.
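The failure described here can be pictured with a toy model. It assumes the lexer has a numeric token rule that wins over the STRING rule, so a word like 4E04 arrives at the parser as a number while the ‘PDB’ rule expects a STRING. This is not the generated parser code, and the token names and matching logic are simplified assumptions.

```java
// Toy model only: not the generated InputfileParser code.
class ToyTokenMatchSketch {

    enum TokenType { FLOAT, STRING }

    /** Assumed rule ordering: anything shaped like a number is lexed as FLOAT. */
    static TokenType classify(String word) {
        boolean numeric = word.matches("\\d+(\\.\\d+)?([eE][+-]?\\d+)?");
        return numeric ? TokenType.FLOAT : TokenType.STRING;
    }

    /** Stand-in for match(input, STRING, ...): rejects any other token type. */
    static String matchString(String word) {
        TokenType actual = classify(word);
        if (actual != TokenType.STRING) {
            throw new IllegalStateException(
                "expected STRING but found " + actual + " for '" + word + "'");
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(matchString("3i40")); // accepted as a STRING token
        try {
            matchString("4E04");                 // lexed as FLOAT, so the match fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the ID itself is never inspected by the ‘PDB’ rule; the mismatch is decided one step earlier, when the word is assigned a token type.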

See the attached screenshot jpg file, “info for markus about nenn PDB ID failure.jpg”, which gives you the error messages in the Eclipse console window (I believe identical when running ‘java -jar zzz.jar -i zzz.txt’ in a command window).

The third part of the attached text file gives you all the PDB IDs with which I experienced the same problem, all in the same nenn format. Since I had no problem at all with > 49,000 PDB IDs not in this format, I thought my conclusion was a safe one to reach.

I don’t actually need this problem fixed, again I was just sharing in case it helped others. I did try and debug deeper to see if I could find a way of overcoming it, but that quickly looked too deep/too intricate.

I’ll take this opportunity to update you on the sequel to our visit with Gerard to see you and Elspeth and others in early March. I ended up delving into the source code myself and found some ways to mitigate the effects that we reported that day.

We hope all is going well for you.

Regards, Leigh

Crystal
Type Cuboid
Dimension 10 10 10
PixelsPerMicron 1
AbsCoefCalc EXP
PDB 4E04


InputfileParser.java

<...>

u=pdb();

<...>

a=(Token)match(input,STRING,FOLLOW_STRING_in_pdb2693);

<...>

catch (RecognitionException re) {
    reportError(re); // MismatchedTokenException (id=27)
    recover(input,re);
}

<...>

private void dispatchUncaughtException(Throwable e) {
    getUncaughtExceptionHandler().uncaughtException(this, e);
}


1e12 1e55 1e56 3e00 4e00 4e01 4e02 4e04 4e05 4e06 4e10 4e12 4e13 4e19 4e20 4e22 4e27 4e28 4e30 4e34 4e35 4e36 4e37 4e40 4e41 4e42 4e44 4e45 4e46 4e50 4e52 4e53 4e55 4e56 4e57 4e58 4e59 4e60 4e67 4e68 4e70 4e73 4e75 4e79 4e84 4e87 4e89 4e90 4e95 4e97 5e02 5e03 5e04 5e05 5e06 5e08 5e09 5e10 5e11 5e12 5e13 5e16 5e17 5e18 5e20 5e21 5e22 5e23 5e24 5e25 5e26 5e27 5e28 5e29 5e30 5e31 5e32 5e33 5e34 5e35 5e36 5e37 5e38 5e40 5e41 5e43 5e44 5e46 5e47 5e50 5e51 5e52 5e53 5e54 5e55 5e56 5e57 5e58 5e59 5e61 5e62 5e63 5e64 5e65 5e66 5e67 5e68 5e70 5e71 5e72 5e73 5e74 5e76 5e78 5e83 5e84 5e85 5e86 5e88 5e89 5e90 5e91 5e92 5e93 5e94 5e95 5e96 5e97 5e98 5e99

jdickerson95 commented 6 years ago

I've managed to fix this problem so the latest released version should now be able to deal with PDB codes that look like exponents (nenn).

LeighCarter commented 6 years ago

Many thanks for letting me know, and thank you/well done for doing that fix. I had noticed your recent new release, but hadn’t yet got around to looking at it and testing it. Regards, Leigh
