Make fort44 parser to support more versions of EIRENE fort.44 file

vsnever commented 3 years ago

Currently, the parser throws an error if the version of fort.44 is not equal to 20130210 or 20170328. In the b2plot source files I found references to the following versions of fort.44: 960511, 960513, 960623, 960727, 961228, 20000727, 20051115, 20060206, 20071209, 20080706, 20081111, 20130210, 20170228, 20170328 and 20180323. I doubt it makes sense to add support for all of these versions now; however, support can be added upon request.

I also noticed that instead of requiring an exact version match, b2plot uses the >= operator. I think the same approach should be implemented in the fort44 parser here. For example, I had to manually change the version of the generomak fort.44 file from 20081111 to 20130210 for the fort44 parser to read it, and it was then read successfully. I suggest that for now, all files older than 20170328 should be read by the fort44_2013 parser and all other files by the fort44_2017 parser. I think it's better to display a warning than to throw an error if the version of fort.44 does not match any of the versions guaranteed to be supported by the parser.
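A minimal sketch of the selection logic proposed above, assuming the 20170328 cut-off and the two versions currently accepted. The function name `select_fort44_parser` and the returned labels are hypothetical and are not part of cherab-solps.

```python
import warnings

# Versions the existing parsers are currently guaranteed to support.
TESTED_VERSIONS = {20130210, 20170328}


def select_fort44_parser(version):
    """Return a label for the parser to use with a given fort.44 version.

    Hypothetical helper: files older than 20170328 go to the 2013 parser,
    all other files go to the 2017 parser. Untested versions only trigger
    a warning instead of an error.
    """
    if version not in TESTED_VERSIONS:
        warnings.warn(
            "fort.44 version {} is not guaranteed to be supported; "
            "the data may be misread.".format(version)
        )
    return "fort44_2017" if version >= 20170328 else "fort44_2013"
```

With this sketch, a 20081111 file would be routed to the 2013 parser with a warning rather than rejected.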

jacklovell commented 3 years ago

We deliberately made the version checks very restrictive, because neither @Mateasek nor I could find a comprehensive list of the different fort.44 versions and their specifications. It may be buried in the git history of the b2plot source code, and extracting that wasn't an appealing prospect. We got it to work for the specific versions of fort.44 files we have available and need for our work, and then erred on the side of caution by rejecting all other untested versions.

We'd need to be very careful with accepting ranges of file versions: fort.44 is not self-describing, and it's possible to read an incompatible format without error and end up with a garbage simulation object, on the off chance that the array sizes of data in the file happen to be compatible with whichever fort.44 version we happen to be reading (I've seen this before). We'd have to manually check every data item read from every version to be confident.

The >= operator in b2plot does suggest some backward compatibility, so it makes sense to modify the reading logic for this too. But this is inconsistent with our observations and the current code, which suggest that a 2013 file is incompatible with our 2017 reader. It would be good to see how b2plot handles this, and then perhaps we should rewrite our readers to behave similarly.

vsnever commented 3 years ago

OK, I'll revert the respective change in #35 (after you review this PR), but let's at least add version 20081111 to the supported list, because the fort44_2013 parser reads it correctly.

Mateasek commented 3 years ago

Sorry for being idle until now. The idea behind the current code is, as @jacklovell described, to allow parsing only of the files we knew how to read and whose data we knew how to interpret. If you are sure we have a function that can correctly parse and interpret a given data file version, add that version to the list. I am not the person to judge; I know SOLPS only as a user of its output.

The SOLPS data format seems to be very prone to data misinterpretation. There is quite a high chance that a parsing function will parse a file and misinterpret the data, and that is something we should avoid, even if it means we will have to maintain more, similar parsing functions.

vsnever commented 3 years ago

Yes, I now understand that letting the fort44_2013 parser read any version of fort.44 older than 20170228 until someone reports a bug was a bad idea. Since the fort44 parser reads only the main data from fort.44, there is a high chance that it correctly reads not only versions 20081111 and 20130210 but some older versions too. I'll try to check the b2plot source code to clarify this, though the b2plot code is hard to follow.

vsnever commented 3 years ago

I was wrong: the fort44_2013 parser reads version 20081111 incorrectly. The difference is in the neutral radiated power (eradt in the Eirene class). Everything else is parsed correctly.

In SOLPS-ITER there is a subroutine called b2yt_ngread that reads the fort.44 file for plotting. It's located at modules/B2/5/src/convert/b2yt_ngread.F. It's quite possible to work out what's going on there by following the gfsub2(...) calls (gfsub2(...) is the analogue of read_block44(...) in cherab-solps, but a single call can read the data for only a single species). The good news is that there are no version checks before b2yt_ngread reaches the molecular source term (srcml). The bad news is that the way fort.44 is written after the edissml term depends not only on the version of fort.44 but also on the version of SOLPS (4.3, 5.0, 5.1, 5.2, iter) and on whether a certain file is present in the simulation directory. For example, the term eneutrad (Eirene.eradt) can be resolved or summed over species depending on the version of SOLPS and not on the version of fort.44.

Also, it looks like both terms srcml and edissml are resolved over molecules in all versions of fort.44, so if more than one molecule is present in the simulation, then both fort44 parsers in cherab-solps will fail.

vsnever commented 3 years ago

Just to summarize, according to b2yt_ngread the content of fort.44 file depends on:

* fort.44 file version (960511, 960513, 960623, 960727, 961228, 20000727, 20051115, 20060206, 20071209, 20080706, 20081111, 20130210, 20170228, 20170328, 20180323);
* SOLPS version (4.3, 5.0, 5.1, 5.2, iter);
* Eirene format (old, new, facelift, juelich, iter);
* the presence of fort.33 file in the simulation directory (this is checked only if input.dat is not present).

It looks like the SOLPS version and the Eirene format are provided by the user and not read from files.

Everything up to the molecular H-alpha emission can be read without any version check. The molecular source (srcml) and the power loss due to molecule dissociation (edissml) can be read from any fort.44 starting from version 960511, independently of anything else. I didn't find in b2yt_ngread(...) the divergence "Molecule particle source" or "Power loss due to molecules (including dissociation)" from the fort44_2013 and fort44_2017 parsers, although it's possible that the same variable srcml is used for storing different physical quantities. The terms srcml and edissml are resolved over molecules in all versions of fort.44, so this must be corrected in the parsers.

The only serious problem is how to read eneutrad (Eirene.eradt). The fort44_2013 parser will work successfully only if (solps_version == '5.0' or solps_version == '5.2') and (eirene_format == 'old' or eirene_format == 'new') and (fort44_version >= 20130210). In all other cases the location of eneutrad and whether it is resolved over the species or not will depend on the parameters listed above.
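The condition above can be written as a small predicate. This is only a sketch: the function name is hypothetical, and representing the SOLPS version and Eirene format as strings is an assumption.

```python
def fort44_2013_reads_eneutrad(solps_version, eirene_format, fort44_version):
    """Return True if the fort44_2013 parser is expected to read eneutrad.

    Hypothetical helper that only encodes the condition stated in the
    comment above; it does not inspect any file.
    """
    return (
        solps_version in ("5.0", "5.2")
        and eirene_format in ("old", "new")
        and fort44_version >= 20130210
    )
```

For any combination where this returns False, the location and layout of eneutrad would have to be worked out from the parameters listed earlier.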

@jacklovell, @Mateasek, do you have access to the SOLPS-ITER source code? I think Xavier can grant it to you if you have ITER IDM accounts.

Mateasek commented 3 years ago

Thanks @vsnever for this deep detective work. It is a pleasure to encounter such a data file, isn't it? There are a few SOLPS-ITER users at our institute, and I will try to get access to the code through them, since I don't have an ITER IDM account. My knowledge of Fortran is quite limited, so it will take me some time to get to the bottom of it. Is there anything in particular you would like me to look into, so we don't work on the same thing? You started quite an earthquake in this package, so I'm happy to leave it up to you what you would like me to help with.

vsnever commented 3 years ago

@Mateasek, I didn't mean that you should dig into this too. The last time I wrote anything in Fortran was in 2006, so my knowledge of Fortran is also very limited.

jacklovell commented 3 years ago

> Just to summarize, according to b2yt_ngread the content of fort.44 file depends on:
>
> * fort.44 file version (960511, 960513, 960623, 960727, 961228, 20000727, 20051115, 20060206, 20071209, 20080706, 20081111, 20130210, 20170228, 20170328, 20180323);
> * SOLPS version (4.3, 5.0, 5.1, 5.2, iter);
> * Eirene format (old, new, facelift, juelich, iter);
> * the presence of fort.33 file in the simulation directory (this is checked only if input.dat is not present).

Yuck. This feels like fort.44 is an internal file. There are so many variations that I wouldn't be surprised if each SOLPS user produced a slightly different format! I fear we'll end up constantly firefighting as new users with slightly varying set-ups try to use this package with incompatible output files.

Are there tools within SOLPS itself to convert simulation outputs to other formats (balance.nc, MDSPlus)? Perhaps it would be more reliable to rely on first-party conversions than to try to reverse-engineer something ourselves, which will carry an ongoing maintenance burden. This may not be so useful for historic runs where the original code is no longer available, but perhaps we should encourage this going forwards and then only have to maintain a more limited set of existing configurations.

I have an account to access git.iter.org, but no access to the SOLPS source code on there.

vsnever commented 3 years ago

> Yuck. This feels like fort.44 is an internal file. There are so many variations that I wouldn't be surprised if each SOLPS user produced a slightly different format!

Despite the great number of versions of fort.44, SOLPS and Eirene formats, there are only 18 if-else statements in b2yt_ngread related to version checking. Update: unfortunately, there are many more than 18 if-else statements in b2yt_ngread related to version checking...

I support the idea that the fort.44 parser in cherab-solps should work only with the most popular configurations. In any case, everything except the neutral radiated power can be read with a general parser using a single version check (>=960511 or not).

> Are there tools within SOLPS itself to convert simulation outputs to other formats (balance.nc, MDSPlus)?

There is a special tool, "b2md", which can save the output to an MDSPlus server. I didn't find any dedicated tool for producing balance.nc files from an existing run; however, according to the manual there is a SOLPS option to produce this file at the end of the run. By the way, do you have a balance.nc file for testing? I've only made minimal changes to balance.py in #35, but chances are good that I broke it because I didn't have any input for testing.

> I have an account to access git.iter.org, but no access to the SOLPS source code on there.

If you want access to the SOLPS-ITER repository, you should write to Xavier Bonnin and ask him to add you to the SOLPS-ITER group. The fact that you are working on the SOLPS module for Cherab should be enough for him.

jacklovell commented 3 years ago

> Despite the great number of versions of fort.44, SOLPS and Eirene formats, there are only 18 if-else statements in b2yt_ngread related to version checking.

If b2yt_ngread changes with different SOLPS versions, those 18 if-else statements could end up as a lot more actual branches we'd need to handle. It's a mess, and I'd rather limit the use of raw files to cases where it's impossible to obtain access to the simulation results in a sane format.

On a similar note, do the b2fgmtry and b2fstate files vary this much between SOLPS versions (even SOLPS 5 or -ITER versions)? Or are we just unlucky with the Eirene part of the code? We don't seem to have run into the same problems with B2 outputs.

vsnever commented 3 years ago

> If b2yt_ngread changes with different SOLPS versions, those 18 if-else statements could end up as a lot more actual branches we'd need to handle.

Actually, it is not as awful as it seemed at first glance. For example, no version of fort.44 before 20071209 has an eneutrad dataset at all. In all other cases, the parser needs to skip a certain number of data points to reach the eneutrad dataset, and this number can be worked out in each case.

> On a similar note, do the b2fgmtry and b2fstate files vary this much between SOLPS versions (even SOLPS 5 or -ITER versions)? Or are we just unlucky with the Eirene part of the code? We don't seem to have run into the same problems with B2 outputs.

Unlike fort.44, the files b2fgmtry and b2fstate are structured. The datasets stored in these files may depend on SOLPS version and on the options set by the user, but each dataset in these files has a key, so we can always check the presence of a certain dataset by its key.

jacklovell commented 3 years ago

One format we should definitely support is 20081111, as this is the format used for the standalone demo.

vsnever commented 3 years ago

As a temporary solution I can add a parser that parses anything except the eneutrad just based on the fort.44 file version. For now we can use it to parse the files older than 20130210, until the universal parser is ready.

vsnever commented 3 years ago

@jacklovell please see this commit. This parser is based on load_fort44_2013 but skips parsing eneutrad. As eneutrad was introduced in the version 20071209, it can read any older version without missing anything. I propose to use it also for the versions 20071209, 20080706 and 20081111 as a temporary solution. If you agree, I'll make a PR.

jacklovell commented 3 years ago

This looks good, thanks. Though I would call it something like read_fort44_pre2007 as this is a bit more precise than just "old" (after all, at this point a 2013 file format is pretty old and even 2017 will be "old" in several years' time). We can avoid the temporary confusion of using a "pre-2007" reader to read 2008 and 2011 data by commenting in the code that we're using this parser until full support is added for the 2008 and 2011 formats.
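The arrangement discussed here could look roughly like the sketch below. The reader labels, the `choose_reader` name and the dictionary layout are assumptions for illustration only; the version numbers are the ones listed earlier in this thread.

```python
# Exact versions handled by dedicated readers (labels are hypothetical).
FORT44_READERS = {
    20130210: "read_fort44_2013",
    20170328: "read_fort44_2017",
}

# Versions that already contain eneutrad but are not fully supported yet.
# Until full support is added, they are read with the pre-2007 reader,
# which skips eneutrad.
TEMPORARY_PRE2007 = {20071209, 20080706, 20081111}


def choose_reader(version):
    """Return the reader label for a fort.44 version, or fail loudly."""
    if version in FORT44_READERS:
        return FORT44_READERS[version]
    # Versions before 20071209 have no eneutrad, so nothing is missed.
    if version < 20071209 or version in TEMPORARY_PRE2007:
        return "read_fort44_pre2007"
    raise ValueError("Unsupported fort.44 version: {}".format(version))
```

Keeping the temporary versions in their own set makes it easy to remove them once dedicated readers exist.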

jacklovell commented 3 years ago

I notice too that 20170328 files which I have contain a title line before each data block, such as the following:

*eirene data field dab2 with size  3456

The field names match up with those in the SOLPS manual https://portal.iter.org/departments/POP/CM/IMAS/SOLPS-ITER/Manuals%20and%20Documentation/SOLPS-ITER_User_Manual.pdf (section 5.2), though some quantities in the latest version of the manual - such as emolrad and eionrad - are missing in the 20170328 fort.44 version and are presumably only available in later versions. I don't have a 20170228 format file to test if the labelling was present there too, though I think it's likely. So this means we have a rough split of <=2013 is unlabelled and >=2017 is labelled.

This brings the file a bit closer to being self-describing. We could adopt a technique similar to that used for the B2 files, where the data is read by the parser into a dictionary keyed by the field name. Then the data could be conditionally assigned to the Eirene object, depending on whether or not it exists in the dictionary. This should give good compatibility with a range of the newer fort.44 formats, and more importantly allow us to warn or fail loudly if the data isn't in a format we expect, rather than silently reading garbage.

One drawback is that the title line only gives the size of the flattened data, and contains no information about its shape. So we would still need to independently provide the dimensions of the data being read so it can be reshaped properly.
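A minimal sketch of reading labelled blocks into a dictionary, assuming every title line has the form shown above and that the values of a block follow as whitespace-separated numbers until the next title line. The function name `read_labelled_blocks` is hypothetical, and any header lines before the first title are simply skipped here.

```python
import re

# Matches title lines such as "*eirene data field dab2 with size  3456".
TITLE_RE = re.compile(r"^\*\s*eirene data field\s+(\S+)\s+with size\s+(\d+)$")


def read_labelled_blocks(lines):
    """Collect flat data blocks keyed by field name.

    Raises ValueError if a block does not contain the number of values
    declared in its title line, instead of silently reading garbage.
    """
    blocks = {}
    declared = {}
    current = None
    for line in lines:
        match = TITLE_RE.match(line.strip())
        if match:
            current = match.group(1)
            declared[current] = int(match.group(2))
            blocks[current] = []
        elif current is not None:
            blocks[current].extend(float(token) for token in line.split())
    for name, values in blocks.items():
        if len(values) != declared[name]:
            raise ValueError(
                "Field {!r}: expected {} values, found {}".format(
                    name, declared[name], len(values)
                )
            )
    return blocks
```

A caller could then test `"eneutrad" in blocks` before assigning anything to the Eirene object, and would still have to reshape each flat list using independently known dimensions.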

jacklovell commented 3 years ago

I've also found some fort.44 files on one of our ORNL machines with version number 20160829. This also has a title line and so could be read by this unified parser.

vsnever commented 3 years ago

> This looks good, thanks. Though I would call it something like read_fort44_pre2007 as this is a bit more precise than just "old" (after all, at this point a 2013 file format is pretty old and even 2017 will be "old" in several years' time). We can avoid the temporary confusion of using a "pre-2007" reader to read 2008 and 2011 data by commenting in the code that we're using this parser until full support is added for the 2008 and 2011 formats.

Yes, you are right. I renamed it and added the comments that using this parser for pre-2013 versions is a temporary solution. I made a PR.

vsnever commented 3 years ago

> One drawback is that the title line only gives the size of the flattened data, and contains no information about its shape. So we would still need to independently provide the dimensions of the data being read so it can be reshaped properly.

Yes, and for the total power radiated by neutral atoms ('eneutrad') we cannot tell whether it's resolved over atoms or not until we know the SOLPS version, the Eirene format (in addition to the fort.44 version) and the value of the NLWRMSH variable (whatever this variable means).

vsnever commented 3 years ago

> Yes, and for the total power radiated by neutral atoms ('eneutrad') we cannot tell whether it's resolved over atoms or not until we know the SOLPS version, the Eirene format (in addition to the fort.44 version) and the value of the NLWRMSH variable (whatever this variable means).

However, in case of labeled files we can count the number of data points before the next label, and thus determine whether the 'eneutrad' is resolved over neutrals or not. So, maybe, yes, in case of labeled files we do not need to ask the user about SOLPS version, Eirene format, etc.

jacklovell commented 3 years ago

> Yes, and for the total power radiated by neutral atoms ('eneutrad') we cannot tell whether it's resolved over atoms or not until we know the SOLPS version, the Eirene format (in addition to the fort.44 version) and the value of the NLWRMSH variable (whatever this variable means).
>
> However, in case of labeled files we can count the number of data points before the next label, and thus determine whether the 'eneutrad' is resolved over neutrals or not. So, maybe, yes, in case of labeled files we do not need to ask the user about SOLPS version, Eirene format, etc.

The label line includes the number of elements in that block. We need to know the dimensions to reshape the block anyway, so with those dimensions and the number of elements we can uniquely determine whether the radiation is resolved over atoms or not. The same will apply to emolrad and eionrad in >=2018.
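A sketch of that check, assuming the block holds one value per grid cell, with an optional factor for the number of atom species. The names `nx`, `ny`, `n_atoms` and the function itself are hypothetical.

```python
def eneutrad_is_resolved(n_values, nx, ny, n_atoms):
    """Decide from the declared block size whether eneutrad is per-atom.

    Returns True if the block holds one value per cell per atom species,
    False if it holds one value per cell (summed over species). With a
    single atom species the two layouts coincide and True is returned.
    Any other size raises ValueError so that bad input fails loudly.
    """
    cells = nx * ny
    if n_values == cells * n_atoms:
        return True
    if n_values == cells:
        return False
    raise ValueError(
        "Block size {} matches neither {} (summed) nor {} (resolved)".format(
            n_values, cells, cells * n_atoms
        )
    )
```

The same size test could in principle be reused for emolrad and eionrad by passing the corresponding species count.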

jacklovell commented 3 years ago

I'm currently working on a routine to read in labelled formats (>=2016), which I'll shortly submit as a PR for testing.

vsnever commented 3 years ago

> I'm currently working on a routine to read in labelled formats (>=2016), which I'll shortly submit as a PR for testing.

That's great. Since I more or less understand how b2plot reads this file, I will write a parser for the unlabelled pre-2016 format. Unfortunately, due to my workload, I will not be able to do this until November.

cherab / solps

Make fort44 parser to support more versions of EIRENE fort.44 file #32