EGSnrc update for >9 egsphant materials requires egs_glib update

rtownson commented 2 years ago

https://github.com/nrc-cnrc/EGSnrc/pull/633 added support for 95 materials in ctcreate, and this implementation differs from the expanded number of materials handled by egs_glib (a hard coded list of characters). This results in egs_brachy egs_view crashing when loading an egsphant with >9 materials (and unknown behaviour when running a simulation).

At line 209 in egs_glib.cpp you can replace the for loop with something like this, as a fix:

    for (int i=1; i <= nmed; i++) { // Note the index from 1 so that 0 is reserved for vacuum
        string med;
        data >> med;
        char med_key = char(((i + 16) % 95) + 32);
        phant2egs_idx[med_key] = EGS_BaseGeometry::addMedium(med);
        phant2egs[med_key] = med;
    }

Eventually I expect all egsphant handling should be done in egs_xyzgeometry, but for now this is a quick fix.

Here's the actual list I changes I suggest:

diff --git a/HEN_HOUSE/egs++/geometry/egs_glib/egs_glib.cpp b/HEN_HOUSE/egs++/geometry/egs_glib/egs_glib.cpp
index 31240c0..13c50ee 100644
--- a/HEN_HOUSE/egs++/geometry/egs_glib/egs_glib.cpp
+++ b/HEN_HOUSE/egs++/geometry/egs_glib/egs_glib.cpp
@@ -206,11 +206,10 @@ EGS_BaseGeometry *readEGSPhant(istream &data, map<string, EGS_Float> med_rhos) {
     // read in media names and create map from phant medium code character to egs++ name
     map<char, string> phant2egs;
     map<char, int> phant2egs_idx;
-    string phant_meds_str = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
-    for (int i=0; i < nmed; i++) {
+    for (int i=1; i <= nmed; i++) { // Note the index from 1 so that 0 is reserved for vacuum
         string med;
         data >> med;
-        char med_key = phant_meds_str.at(i);
+        char med_key = char(((i + 16) % 95) + 32);
         phant2egs_idx[med_key] = EGS_BaseGeometry::addMedium(med);
         phant2egs[med_key] = med;
     }

MartinMartinov commented 2 years ago

That's odd, because I've run egs_view on the reference man/woman phantoms converted into egsphants before, and they had over 50 media (if I remember correctly). Though those egsphants did use the alphanumeric numbering that Randy implemented. Am I understanding correctly that the ctcreate code (and new EGSnrc standard) now outputs media with the following indices:

123456789;:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_'abcdefghijklmnopqrstuvwxyz{|}~

This is somewhat concerning for CLRP, because we have legacy code and data going back at least 10 years that use the same alphanumeric standard Randy implemented in egs_glib. Including egs_brachy, 3ddose_tools, and the new collaborative GUI we are working on with clinics all over the country. Changing all of our code to conform to this type of egsphant would require a large overhaul that would also cause us to lose legacy support on egsphants used within CLRP for the last decade.

On a side note, I remember when I first wrote the expanded egsphant code in BrachyDose many years ago now, I started with your implementation, just trying to maximize the number of potential media. But after talking to Dave and some people at NRC, I was told some people still open up the egsphant as text files and use the character indices to visually check the phantom, and alphanumeric characters would make for clearer images when looking at media assignment per slice. That's why I pivoted to using the implementation Randy has, or the following admittedly hideous formula (from memory):

med_key = (i>96)?(i-48-7-6):((i>65)?(i-48-7):(i-48))

which still yields 132 runoff characters when using unsigned chars and the extended ascii table.

rtownson commented 2 years ago

I see the issue. I think we could resolve this by adding a new line in the new egsphant file format, an egsphant format version number. This would hopefully make it so that new egsphant files would crash until you update your code to read the new format. You could then support both formats by checking for the version number. Or we just switch to your indexing - I'll ask for Fred's opinion. I still find our new format to be very visually readable.

ftessier commented 2 years ago

We are discussing this on the EGSnrc side. Can someone please point to the location where the CLRP mapping is defined? I can't find an egs_glib.cpp file with a line 209, let alone a code block that defines the mapping.

MartinMartinov commented 2 years ago

@rtownson I definitely agree, it was the method I proposed when I was still an undergrad. I think we discussed it after a CLRP or OMPI meeting over a decade ago, but we ended up going with what Randy has now and have been using it since I've been with the group. We even recently dug up a very large archive of old patient files made in 2014 (I think) with this format. Wouldn't having a version number still cause issues for all the previous egsphants, since they would still have the no version number format.

@ftessier The code we're discussing is Randy's original egs_glib that we still use in egs_brachy because it follows the old egsphant standard (no ramp files). Here's the way Randy (and I) parse egsphants currently: https://github.com/clrp-code/EGSnrc_CLRP/blob/fe41c070a7b900a78055ec4935a9d662725e5953/HEN_HOUSE/egs%2B%2B/geometry/egs_glib/egs_glib.cpp#L209 which skips ":;<=>?@[]^_`" ascii characters. You can also define the code as 2 if statements or conditional operators that determine whether you are in 1-9, A-Z, or a-z if you want to avoid std::strings.

rtownson commented 2 years ago

If the new egsphant format had a version number, then you could check for it in egs_brachy. If it's not there, then you use the old code for the old format. If your code hasn't been updated to look for the version number, it should just crash because the number of parameters on each line won't work out. That's what I was thinking anyway.

MartinMartinov commented 2 years ago

Oh, yeah, that makes sense. So your new CT_CREATE would have a line at the start that reads:

version=1.1

and if it's missing, we would use the old system? And also, if we are talking about extra encoding at the beginning, would it be prudent to potentially add something like a format flag, in case we ever want to do non-rectilinear phantoms. I know it doesn't really come up, but it has bothered me in the past with 3ddose files that some antiquated codes (cough BrachyDose cough) would output 3ddose files for cylinders and spheres with only one or two axes with no header warning you of the like.

rtownson commented 2 years ago

Actually I was thinking the version number would be at the top of the egsphant file, not in ctcreate. And yes I agree about adding a format flag, I wasn't aware other formats were out there..

Fred had some other ideas instead of an egsphant version number in the top of the file, so we'll have to hash it out at some point...

MartinMartinov commented 2 years ago

Sorry, I should have said phantoms generated by CT_CREATE. I agree.

And I think a quick meeting to discuss it would probably be good to do before I start making changes on our end.

ftessier commented 2 years ago

I think that version numbers are useful to allow future extensions of the egsphant format. The problems I see though are:

the formats have to be defined in detail (cannot be defined implicitly through a code implementation); this should not be too hard for egsphant files.
format numbers imply a progression (version 2 is more recent, or "better", or a "superset" of version 1); this does not really match the situation for different encodings.

To solve the problem at hand, I would propose that instead of a format number, we add an encoding map: a line in egsphant file, which specifies its encoding as a string of ascii characters. The first char is medium 0, second char is medium 1, and so on. This immediately allows for any number of encodings based on ascii characters.

I am happy to drop the NRC encoding, presuming that only a handful of people have actually used it since it was released (v2021), and fall back on the egs_brachy char map by default (if no encoding is present in the egsphant file), thus safeguarding "legacy" egs_brachy files that use it.

We can also add a format number, provide the format is specified fully in a format reference document.

The entire root issue, of course, is that egsphant files are encoded as ascii characters. Crazy, this is so 80's! 😄 These should be stored as n-byte integers, which can be rendered in the terminal using a tiny (one-liner) conversion script, for reading and writing. Perhaps we could envision that instead for moving forward?

MartinMartinov commented 2 years ago

That sounds excellent @ftessier, custom strings for formatting seem like the most elegant solution.

And I couldn't agree with you more about ASCII encoding being an insane holdover from a different era. The I/O time of parsing all the ASCII text and the floating number decimals is just insane. I've been using .begsphant and .b3ddose binary variants in 3ddose_tools for a while now, which are essentially memory dumps of the data arrays to achieve reasonable loading times. But they feel like a somewhat slapped-together tack-on.

I've been thinking for a while now that it would be a good idea to add a binary data suite output option for egs++. Get egsphant, 3ddose, egsdat, ... etc files to actually output in a much shorter time span, and to stop storing values we tally using 64-bit mantissa as decimal strings that use more than 8 bytes to store values with lower precision.

ftessier commented 2 years ago

Well put, I agree with a binary data suite for EGSnrc. Worth opening a Discussion about that, under "ideas"; I'll leave the honor of launching that to you. There we can list the existing implementations such as .begsphant and .b3ddose etc, and discuss possible ways forward. Emerging standards for scientific data such as Apache Arrow come to mind. I would shy away from defining our own (although I want to keep it lightweight as much as possible).

mchamberland commented 1 year ago

ctcreate currently uses egs_brachy's encoding, so closing this for now.

clrp-code / egs_brachy

EGSnrc update for >9 egsphant materials requires egs_glib update #22