BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0
1.97k stars 439 forks source link

Relax XML parser to handle global preferences gracefully #5682

Closed brevilo closed 3 weeks ago

brevilo commented 1 month ago

The global preferences XML gets processed by and transferred between all clients, projects and account managers used by a volunteer. Any part of this diverse software ecosystem could have introduced certain features into the XML structure over time that the current XML parser doesn't expect, resulting in broken parsing. This has been a known and recurring issue for years and previous attempts to fix the situation have apparently been incomplete. Thus we should relax the BOINC XML parser(s) as much as possible such that it handles incoming XML as gracefully as possible.

For instance, this line implies that any elements following the (current) closing venue element aren't parsed. But the reality is that some XML fragments have a different order, for whatever reason. These fragments are technically valid XML and they also contain valid preferences sets, default and one or more venues - just that the venues might sometimes be listed first, or the mod_time follows the venue elements. Please relax the parser to make it more robust for cases like these. For this case I suggest replacing the return 0 by a continue to just keep processing.

I haven't put this into a PR since I'm not familiar enough with this part of the codebase and maybe there are other places that need to be taken into account as well. I hope the general idea came across, though.

Thanks

brevilo commented 1 month ago

Ideally, BOINC would help correcting any flawed XML out there such that the situation will improve over time. For example, the scheduler could try and heal incoming XML by trying to make sure (again, gracefully) that the received XML is, for instance, reordered if needed. This way the fragment stored in the local DB and/or returned to the client (and in turn propagated to other projects) is in the expected format. Alongside this I strongly recommend to define a proper XML schema that codifies the canonical/expected XML structure and allows for its validation.

brevilo commented 1 month ago

For reference: validating the global preferences from all our users WHERE expavg_credit > 0.1 AND global_prefs IS NOT NULL AND global_prefs != '' resulted in:

That shows that most invalid XML fragments could be still be parsed if the suggested change above was implemented.

brevilo commented 1 month ago

Please note that the trivial one-line change (in my first post) isn't enough since one has to take into account repeated reads of the same tag. One should probably use the mask as part of the parsing condition.

This is yet another case where using a proper XML library would make things so much easier and less error-prone. Simply use XPath and be done with it.

brevilo commented 1 month ago

I added the PR above (still WIP!) to facilitate discussion...

AenBleidd commented 1 month ago

I'd personally switch to external xml parsing library (libxml2 for example) to avoid this unneeded complexity. Might be a good topic for the next release.

brevilo commented 1 month ago

Well, there have been multiple attempts from the community (see #1470 or #1633 for instance) over the years to get BOINC to switch to a proper XML library like libxml2 (or similar). Heck, we've been discussing this time and time again since 2008.

So far our opinions haven't been shared, so I'd be more than happy to finally see this happen!

bema-aei commented 1 month ago

What would happen if you just remove the in_venue return 0, i.e. change

diff --git a/lib/prefs.cpp b/lib/prefs.cpp
index 50c80c85eb..430d10e310 100644
--- a/lib/prefs.cpp
+++ b/lib/prefs.cpp
@@ -345,9 +345,7 @@ int GLOBAL_PREFS::parse_override(
         }
         if (in_venue) {
             if (!strcmp(tag, "/venue")) {
-                if (in_correct_venue) {
-                    return 0;
-                } else {
+                if (!in_correct_venue) {
                     in_venue = false;
                     continue;
                 }

If I understand the code correctly, it should just skip the remaining venues (with "wrong" names) as the "wrong" ones before and continue parsing global preferences after the venues.

The only requirement then would be that the preference that are possibly overridden by "venues" have to occur before the venues, but other global preferences shouldn't matter where these occur.

Would that help?

brevilo commented 1 month ago

The only requirement then would be that the preference that are possibly overridden by "venues" have to occur before the venues

Well, that's most of them. That requirement is the main issue here. We even see prefs sets where the venue(s) are listed first.

If you improve the existing parser, you can as well just make it venue-position-agnostic and thereby have it as robust as the real data needs it to be. Also, the proposed PR isn't that ugly, given the context, is it?

davidpanderson commented 3 weeks ago

Please send me an example of prefs XML that you think isn't being interpreted correctly, and a description of how you think it should be interpreted.

brevilo commented 3 weeks ago

See above, just put anything after the venue(s), like so:

<global_preferences>
  <venue name="home">
    <max_ncpus_pct>25</max_ncpus_pct>
    <cpu_usage_limit>100</cpu_usage_limit>
    <run_on_batteries>0</run_on_batteries>
    <run_if_user_active>1</run_if_user_active>
    <run_gpu_if_user_active>0</run_gpu_if_user_active>
    <idle_time_to_run>3</idle_time_to_run>
    <suspend_if_no_recent_input>0</suspend_if_no_recent_input>
    <suspend_cpu_usage>60</suspend_cpu_usage>
    <work_buf_min_days>0.01</work_buf_min_days>
    <work_buf_additional_days>0</work_buf_additional_days>
    <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
    <disk_interval>60</disk_interval>
    <disk_max_used_gb>0.6</disk_max_used_gb>
    <disk_min_free_gb>1</disk_min_free_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <ram_max_used_busy_pct>50</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
    <leave_apps_in_memory>1</leave_apps_in_memory>
    <vm_max_used_pct>75</vm_max_used_pct>
    <max_bytes_sec_down>0</max_bytes_sec_down>
    <max_bytes_sec_up>0</max_bytes_sec_up>
    <daily_xfer_limit_mb>0</daily_xfer_limit_mb>
    <daily_xfer_period_days>0</daily_xfer_period_days>
    <dont_verify_images>0</dont_verify_images>
    <confirm_before_connecting>0</confirm_before_connecting>
    <hangup_if_dialed>0</hangup_if_dialed>
  </venue>
  <venue name="work">
    <max_ncpus_pct>100</max_ncpus_pct>
    <cpu_usage_limit>100</cpu_usage_limit>
    <run_on_batteries>0</run_on_batteries>
    <run_if_user_active>1</run_if_user_active>
    <run_gpu_if_user_active>1</run_gpu_if_user_active>
    <idle_time_to_run>2</idle_time_to_run>
    <suspend_if_no_recent_input>0</suspend_if_no_recent_input>
    <suspend_cpu_usage>0</suspend_cpu_usage>
    <start_hour>0</start_hour>
    <end_hour>0</end_hour>
    <work_buf_min_days>1.5</work_buf_min_days>
    <work_buf_additional_days>0</work_buf_additional_days>
    <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
    <disk_interval>120</disk_interval>
    <disk_max_used_gb>20</disk_max_used_gb>
    <disk_min_free_gb>2</disk_min_free_gb>
    <disk_max_used_pct>60</disk_max_used_pct>
    <ram_max_used_busy_pct>60</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>60</ram_max_used_idle_pct>
    <leave_apps_in_memory>1</leave_apps_in_memory>
    <vm_max_used_pct>75</vm_max_used_pct>
    <max_bytes_sec_down>0</max_bytes_sec_down>
    <max_bytes_sec_up>0</max_bytes_sec_up>
    <daily_xfer_limit_mb>0</daily_xfer_limit_mb>
    <daily_xfer_period_days>0</daily_xfer_period_days>
    <dont_verify_images>0</dont_verify_images>
    <confirm_before_connecting>0</confirm_before_connecting>
    <hangup_if_dialed>0</hangup_if_dialed>
    <net_start_hour>11</net_start_hour>
    <net_end_hour>12</net_end_hour>
    <preset>custom</preset>
  </venue>
  <venue name="school">
    <max_ncpus_pct>50</max_ncpus_pct>
    <cpu_usage_limit>100</cpu_usage_limit>
    <run_on_batteries>0</run_on_batteries>
    <run_if_user_active>1</run_if_user_active>
    <run_gpu_if_user_active>0</run_gpu_if_user_active>
    <idle_time_to_run>3</idle_time_to_run>
    <suspend_if_no_recent_input>0</suspend_if_no_recent_input>
    <suspend_cpu_usage>70</suspend_cpu_usage>
    <work_buf_min_days>0.1</work_buf_min_days>
    <work_buf_additional_days>0.0</work_buf_additional_days>
    <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
    <disk_interval>60</disk_interval>
    <disk_max_used_gb>8</disk_max_used_gb>
    <disk_min_free_gb>4</disk_min_free_gb>
    <disk_max_used_pct>20</disk_max_used_pct>
    <ram_max_used_busy_pct>50</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>50</ram_max_used_idle_pct>
    <leave_apps_in_memory>1</leave_apps_in_memory>
    <vm_max_used_pct>75</vm_max_used_pct>
    <max_bytes_sec_down>0</max_bytes_sec_down>
    <max_bytes_sec_up>0</max_bytes_sec_up>
    <daily_xfer_limit_mb>0</daily_xfer_limit_mb>
    <daily_xfer_period_days>0</daily_xfer_period_days>
    <dont_verify_images>0</dont_verify_images>
    <confirm_before_connecting>0</confirm_before_connecting>
    <hangup_if_dialed>0</hangup_if_dialed>
    <start_hour>0</start_hour>
    <end_hour>0</end_hour>
    <net_start_hour>0</net_start_hour>
    <net_end_hour>0</net_end_hour>
    <preset>custom</preset>
  </venue>
  <max_ncpus_pct>100</max_ncpus_pct>
  <cpu_usage_limit>100</cpu_usage_limit>
  <run_on_batteries>0</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>1</run_gpu_if_user_active>
  <idle_time_to_run>3</idle_time_to_run>
  <suspend_if_no_recent_input>0</suspend_if_no_recent_input>
  <suspend_cpu_usage>25</suspend_cpu_usage>
  <work_buf_min_days>1</work_buf_min_days>
  <work_buf_additional_days>0</work_buf_additional_days>
  <cpu_scheduling_period_minutes>15</cpu_scheduling_period_minutes>
  <disk_interval>300</disk_interval>
  <disk_max_used_gb>1</disk_max_used_gb>
  <disk_min_free_gb>0.01</disk_min_free_gb>
  <disk_max_used_pct>50</disk_max_used_pct>
  <ram_max_used_busy_pct>50</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
  <leave_apps_in_memory>0</leave_apps_in_memory>
  <vm_max_used_pct>75</vm_max_used_pct>
  <max_bytes_sec_down>0</max_bytes_sec_down>
  <max_bytes_sec_up>0</max_bytes_sec_up>
  <daily_xfer_limit_mb>0</daily_xfer_limit_mb>
  <daily_xfer_period_days>0</daily_xfer_period_days>
  <net_start_hour>10</net_start_hour>
  <net_end_hour>23</net_end_hour>
  <dont_verify_images>0</dont_verify_images>
  <confirm_before_connecting>0</confirm_before_connecting>
  <hangup_if_dialed>0</hangup_if_dialed>
  <start_hour>0</start_hour>
  <end_hour>0</end_hour>
  <preset>green</preset>
  <mod_time>1709547315</mod_time>
</global_preferences>

Generally speaking, don't depend on the (per-level) order.

As mentioned above, you could really save yourself everyone these recurring troubles by processing XML using a tried and tested library. Here this would enable the use of XPath. You wouldn't do line-by-line stateful parsing (SAX-like) but rather query each value directly. For example, you want to retrieve the value of disk_interval for the work-venue? Easy, use this expression:

/global_preferences/venue[@name='work']/disk_interval/text()

You can try yourself using xmllint for instance:

xmllint --xpath "<XPATH-EXPRESSION>" <FILE>

In BOINC you could use libxml2 to implement that. And it's safer, too...

davidpanderson commented 3 weeks ago

So the problem is that mod_time (and source_project and source_scheduler) aren't parsed when they appear after a , and that is the selected (target) venue. Easy to fix.

In general it would work better if you simply tell me exactly what problem you're seeing.

brevilo commented 3 weeks ago

So the problem is that mod_time (and source_project and source_scheduler) aren't parsed when they appear after a , and that is the selected (target) venue.

That's probably right, if those three are the only ones among the default prefs that aren't overridden by venue-specific counterparts.

In general it would work better if you simply tell me exactly what problem you're seeing.

Seriously!? That's what I did ~~in the second paragraph of my OP. in all of my posts, in prose and in code.