JeffersonLab / chroma

The Chroma Software System for Lattice QCD
http://jeffersonlab.github.io/chroma

XML files are not properly closed when running into a timeout #20

Closed martin-ueding closed 7 years ago

martin-ueding commented 7 years ago

I occasionally specify too many updates for a single invocation of hmc and therefore run into the six-hour walltime limit on JURECA. The last block started in the output XML might then look like this:

      <elem>
        <Update>
          <update_no>193</update_no>
          <WarmUpP>false</WarmUpP>
          <HMCTrajectory>
            <WarmUpP>false</WarmUpP>
            <H_old>
              <KE_old>5010.4276207191</KE_old>
              <PE_old>-43007097.8619257</PE_old>
            </H_old>
            <H_new>
              <KE_new>5791.19429249987</KE_new>
              <PE_new>-43007878.5434836</PE_new>
            </H_new>
            <deltaKE>780.766671780766</deltaKE>
            <deltaPE>-780.681557863951</deltaPE>
            <deltaH>0.0851139168153168</deltaH>
            <AccProb>0.918407656366748</AccProb>
            <AcceptP>true</AcceptP>
          </HMCTrajectory>
          <seconds_for_trajectory>721.725699</seconds_for_trajectory>
          <InlineObservables>
            <elem>
              <Plaquette>
                <update_no>193</update_no>
                <w_plaq>0.441936165724557</w_plaq>
                <s_plaq>0.441920568668605</s_plaq>
                <t_plaq>0.441951762780509</t_plaq>
                <plane_01_plaq>0.441707477639388</plane_01_plaq>
                <plane_02_plaq>0.441705823111355</plane_02_plaq>
                <plane_12_plaq>0.442348405255071</plane_12_plaq>
                <plane_03_plaq>0.442113163975531</plane_03_plaq>
                <plane_13_plaq>0.441967064555441</plane_13_plaq>
                <plane_23_plaq>0.441775059810555</plane_23_plaq>
                <link>-3.00385949850094e-05</link>
              </Plaquette>
            </elem>
            <elem>
              <PolyakovLoop>
                <update_no>193</update_no>

The file is simply truncated there, so reading it with a strict XML parser fails. My XML library (lxml, which uses libxml2 somewhere deep down) has options to parse such a file anyway.

I thought it would be a better user experience if the XML files were valid even when the job scheduler sends a termination signal. This could be achieved by letting push create an object which calls pop in its destructor. That would of course mean that the whole code would have to be converted from push(...) to XmlPush foo(...), or that push would have to become a macro that generates a uniquely named object each time (using the TOKENPASTE trick from http://stackoverflow.com/a/1597129):

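// Two levels of pasting are needed so that __LINE__ is expanded to a number first.
#define TOKENPASTE(x, y) x ## y
#define TOKENPASTE2(x, y) TOKENPASTE(x, y)
// Declare a uniquely named XmlPush object; its destructor is meant to call pop.
#define push(...) XmlPush TOKENPASTE2(xml_push_, __LINE__)(__VA_ARGS__)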

I'm just going to write a little script that will close those XML files for now.

This seems like a rather minor feature that would touch virtually the whole codebase. Is there any interest in this? Is that even sensible?

bjoo commented 7 years ago

Hi Martin, This is my biggest gripe against XML for logging. I hardly ever use XML parsing tools for this reason. If I were to rewrite, I would not use XML for logging purposes. Usually I do most of my analysis on the stdout.

Best, B

On Feb 6, 2017, at 5:08 AM, Martin Ueding notifications@github.com wrote:



Dr Balint Joo
High Performance Computational Scientist
Jefferson Lab
12000 Jefferson Ave, Suite 3, MS 12B2, Room F217, Newport News, VA 23606, USA
Tel: +1-757-269-5339, Fax: +1-757-269-5427
email: bjoo@jlab.org

martin-ueding commented 7 years ago

At first I was sceptical about the XML, but with a library that supports XPath it is really nice to parse, way easier than the more-or-less unstructured flat text output. I had tried a full parsing library on the text output, but it was way too slow to be useful. Now I have a script with regular expressions which works, but since the XML really has structure in it, I prefer it by now.