clearlinux / swupd-server

Software update server (deprecated)
Other
13 stars 17 forks source link

Do not prepend empty items #22

Closed igor-stoppa closed 8 years ago

igor-stoppa commented 8 years ago

When the item is not found, do not add it to the list. It will cause segfaults later on, where the code dereferences the pointer under the assumption that it is not NULL.

Signed-off-by: Igor Stoppa igor.stoppa@intel.com

igor-stoppa commented 8 years ago

In Ostro-OS we are transitioning from an old version of swupd-server to the latest of the master branch. To avoid discontinuity, we are still using format 3. While testing the latest code, I got a segfault during the creation of the updates for a bundle that was not affected by the update. This seems to fix the issue, however I wasn't able to determine if it's the real rootcause.

Btw, is there regression testing for older formats? I understand (correctly?) that now CL is primarily on format 4.

bryteise commented 8 years ago

I'm not sure if this change could allow some assumptions about includes to be violated but it is safe by itself (though maybe should be an assert). The reason being that every bundle should have changed for the includes addition because os-core should autoinclude into all bundles (and os-core itself is updated each version).

The server side change is also a breaking change requiring a format bump unless you don't use includes at all when building your chroots otherwise client dependencies will not be installed if the client doesn't understand the manifest includes.

What do you mean by regression testing for older formats? The server code doesn't really care what format it is on, just the client and server must agree on format and must transition to new formats when a change to one will cause incorrect behavior to the other.

igor-stoppa commented 8 years ago

By regression testing for older formats, I mean that if something is not tested, it cannot be assumed to work. Here I'm just speculating that if we use a format that is not any more the preferred one, we might incur in regressions, if the format is not used for testing new submissions.

bryteise commented 8 years ago

Well the format number doesn't matter much you could increment faster for instance but when the server+client make a breaking change you will hit issues if you don't bump your format and make the server+client version change in lockstep. That's pretty much the point of format changes after all.

igor-stoppa commented 8 years ago

The change cannot be an assert, because we do meet such case. I do not know what is the cause, but we do get list elements without data. This is merely preventing the empty element from making it into the list. There would be also several other places to clean up, where these pointers are de-referenced without testing if they are actually non-zero. But I wasn't sure if that was by design or not.

igor-stoppa commented 8 years ago

we do follow the same format on both client and server, only we do not use the latest format

On 19 May 2016 at 19:58, William Douglas notifications@github.com wrote:

Well the format number doesn't matter much you could increment faster for instance but when the server+client make a breaking change you will hit issues if you don't bump your format and make the server+client version change in lockstep. That's pretty much the point of format changes after all.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/clearlinux/swupd-server/pull/22#issuecomment-220387099

igor

bryteise commented 8 years ago

@igor-stoppa @incandescant I'd be curious to verify a few things with asserts because this has been hard for me to reproduce (I really need a pre-includes set of chroots and then try to run those against a new swupd-server in this case, which shows a testing gap of server format bump testing that we need to address @phmccarty).

Would you consider adding in asserts on src/create_update.c:318 and 319 for new_core and old_core ->includes being not NULL "assert(!new_core->includes)". Similarly on src/create_udpate.c:386 and 395, assert that the manifest->includes are not NULL "assert(!manifest->includes)" and probably a print saying which bundle it is. I was unable to see a code path in the current version where the includes pointer would be not NULL on the first run of create update after updating swupd server.

Based on this patch though sub_manifest_from_directory is doing something wrong but I couldn't figure out what yet based on my searching this morning. My best guess is in analyze_fs.c's get_sub_manifest_includes somehow is not returning NULL which makes absolutely zero sense for me but may lead you somewhere helpful in your own debugging.

incandescant commented 8 years ago

Apologies for the multiple edits, I shouldn't have tackled this one first thing before coffee :-/

Running swupd_create_update for v1 of the OS succeeds, with some changes made to the OS I try to run swupd_create_update for v2 and the assertion triggers when processing the first non os-core bundle.

The bundle has a single include, the os-core bundle.

I'm calling:

swupd_create_update -S /srv/builds/oecore/tmp-glibc/deploy/swupd/qemux86/myhouse-image-core --osversion 2 --format 3

and the output around the time of the assert is:

Entering phase 3: The bundles
Processing bundle developer
swupd_create_update: ../git/src/create_update.c:401: main: Assertion `!manifest->includes' failed.

Program received signal SIGABRT, Aborted.
incandescant commented 8 years ago

I've been digging into this some more this morning, the phase 3 processing loops we load all of the manifests listed in the groups.ini file into hash tables, new manifests from the directory and old manifests from the files on disk.

We then iterate the manifests and expand any named includes, this is where we run into the segfault. When processing the first bundle there's an include for the os-core bundle (the default behaviour), however the groups.ini file we generate in meta-swupd doesn't list os-core. Thus when we do the name_includes loop over the old_manifests the g_hash_table_lookup(old_manifests, "os-core") call returns NULL and we then insert that into the manifest->includes list.

At this point it's not clear to me whether swupd expects os-core to be explicitly listed in groups.ini? Or whether there should be some extra logic to do the right thing when the os-core named include is found?

FWIW: not listing os-core in groups.ini hasn't caused any issues up to this point.

incandescant commented 8 years ago

Just to follow through on my theory above I tried explicitly listing os-core in groups.ini and with that change my swupd_create_update -S /srv/builds/oecore/tmp-glibc/deploy/swupd/qemux86/myhouse-image-core --osversion 2 --format 3 invocation completed successfully.

Would be great to confirm whether that's required or whether it's papering over a bug?

bryteise commented 8 years ago

@incandescant That's actually really interesting not having os-core in the groups.ini file works as much as it does. It definitely isn't something I was aware of being a valid use and I have always worked under the assumption it exists in the groups.ini file.

incandescant commented 8 years ago

@bryteise thanks, I've updated meta-swupd to add os-core to groups.ini. I guess we can close this PR.

phmccarty commented 8 years ago

Thanks for root causing the issue!

After some discussion with @bryteise, in the near term, I think it would be better for swupd_create_update to enforce the requirement of an "os-core" bundle; if "os-core" does not exist in groups.ini, then the update should gracefully exit and print a useful error message.

Longer term, I would like to remove the "os-core" bundle requirement, at minimum make the name configurable and document the purpose of this special bundle, namely that it should contain files that all other bundles include.

incandescant commented 8 years ago

That all sounds good. Documentation especially, but the graceful error will certainly help folks trying to adopt swupd for their OS.

igor-stoppa commented 8 years ago

Closing this workaround PR.