ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Migration fails and leaves AWX install in a bad state #138

Closed allen00se closed 6 years ago

allen00se commented 6 years ago
ISSUE TYPE
COMPONENT NAME
SUMMARY

The AWX install seemed to complete without error; however, it appears to be hung in the migration phase. The web interface shows "AWX is Upgrading", and tailing the log files shows the same traceback repeatedly.

ENVIRONMENT
STEPS TO REPRODUCE

Install AWX per the install guide.

EXPECTED RESULTS
ACTUAL RESULTS
ADDITIONAL INFORMATION

Here is a snippet of the error that keeps repeating in the log file:

django.db.utils.ProgrammingError: relation "main_schedule" does not exist
LINE 1: ...le"."next_run", "main_schedule"."extra_data" FROM "main_sche...

wenottingham commented 6 years ago

Full error, please.

allen00se commented 6 years ago

https://paste.ee/p/A4zGi

matburt commented 6 years ago

I've been hearing about this for a while from various folks. Is there any way you can run this again and, if it happens again, grab more of the output that occurred above this? This isn't the whole awx_task container log.

Basically, in some situations migrations seem to be failing, but no one has been able to show me a log where the migrations actually fail at the top of the awx_task container log.

knechtionscoding commented 6 years ago

@matburt This is the same error I encountered yesterday. #116.

@allen00se The solution that worked for me and @phandolin was the following:

  1. Stop each container in this order:

    • awx_task
    • awx_web
    • memcached
    • rabbitmq
    • postgres
  2. Remove /tmp/pgdocker/

  3. Re-run install.yml with no other changes.

I'm not sure what the cause was; I couldn't grab the logs yesterday before they were overwritten.
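The three recovery steps above can be sketched as an Ansible play. This is a minimal, untested sketch. It assumes the container names listed above, the installer's default `/tmp/pgdocker` data directory, and the `community.docker` collection; older Ansible releases call the module plain `docker_container`. Removing the directory destroys the database, so the play only makes sense for a first install that never finished migrating.

```yaml
# Recovery sketch for a *first* install whose migrations failed.
# Deleting pgdocker_dir destroys all AWX data stored in it.
- hosts: localhost
  become: true
  vars:
    pgdocker_dir: /tmp/pgdocker   # assumed installer default; check your inventory
  tasks:
    - name: Stop the AWX containers in the order given above
      community.docker.docker_container:
        name: "{{ item }}"
        state: stopped
      loop:
        - awx_task
        - awx_web
        - memcached
        - rabbitmq
        - postgres

    - name: Remove the half-migrated Postgres data directory
      ansible.builtin.file:
        path: "{{ pgdocker_dir }}"
        state: absent
```

Afterwards, re-run the installer with no other changes, e.g. `ansible-playbook -i inventory install.yml` from the `installer` directory.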

allen00se commented 6 years ago

@matburt

You're in luck, because I have it from the start:

https://paste.ee/p/i1buQ

allen00se commented 6 years ago

@KnechtionsCoding

Thanks, man, that looks to be working; it's already further along than the first try.

matburt commented 6 years ago

The important thing from @KnechtionsCoding was this bit:

  1. Remove /tmp/pgdocker

Here's the relevant line from your log, @allen00se:

django.db.utils.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(django_migrations_id_seq, 2200) already exists.

Had you tried to start it up once before, had it fail, and then had it get further the next time?

Maybe two parallel installs?

matburt commented 6 years ago

We need to find a way to make sure this kind of error starting up awx_task can't happen.

knechtionscoding commented 6 years ago

@matburt When I had the issue, I had previously run the playbook and had to fix a different error. I didn't even think about that.

Could the playbook check for /tmp/pgdocker? Is that the permanent storage for Postgres? Could it be removed when the script runs?

If it can be removed when the script runs, then the Ansible can be as simple as something like this:

- stat:
    path: /tmp/pgdocker/
  register: pgdocker
# Remove the Postgres data directory left behind by a previous run
- file:
    path: /tmp/pgdocker/
    state: absent
  when: pgdocker.stat.isdir is defined and pgdocker.stat.isdir

If /tmp/pgdocker is permanent storage for the Postgres database, then this becomes an issue if someone runs the playbook against an already running AWX install (e.g. to upgrade to a newer version).
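One way to address this concern is to make the deletion opt-in, so that an ordinary re-run or upgrade never touches an existing database. Below is a minimal sketch. It assumes a hypothetical `wipe_pgdocker` variable that the AWX installer does not define.

```yaml
# Hypothetical opt-in guard: the data directory is removed only when the
# operator passes -e wipe_pgdocker=true; by default existing data is kept.
- name: Check whether a previous run left Postgres data behind
  ansible.builtin.stat:
    path: /tmp/pgdocker
  register: pgdocker

- name: Remove the Postgres data directory (explicit opt-in only)
  ansible.builtin.file:
    path: /tmp/pgdocker
    state: absent
  when:
    - pgdocker.stat.isdir | default(false)
    - wipe_pgdocker | default(false) | bool
```

With this guard, `ansible-playbook -i inventory install.yml -e wipe_pgdocker=true` would wipe the directory, and a plain run would leave it alone. The operator still has to know that wiping is only safe after a failed first install.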

allen00se commented 6 years ago

@matburt @KnechtionsCoding Same here. I had the flag set for the AWX branding logos but didn't have them in the correct location, which caused the install to fail. I put the logos where they belonged and then ran the install again. I'm sure that is what caused the /tmp/pgdocker folder to exist already.

phandolin commented 6 years ago

@matburt @KnechtionsCoding @allen00se Yes, I had also run the playbook once, had to stop it and update the server, then ran again and that's when it happened here as well.

matburt commented 6 years ago

We definitely don't want to remove the database directory, otherwise everyone would lose their data, but this is super helpful to know. Usually these things are protected by transactions to prevent partial migrations.

I'll look at this a little closer.

xjohnyknox commented 6 years ago

Thanks, it works!

switchboardOp commented 6 years ago

I'm having the same or a similar issue, but using an external Postgres host: https://paste.ee/p/TDdd1

AlanCoding commented 6 years ago

Is anyone still hitting this error? I see #116 is a related issue that seems to have resolved itself.

ricalenil commented 6 years ago

Greetings,

I'm having this error when upgrading from 1.0.2.0 to 1.0.2.289:

django.db.utils.ProgrammingError: column main_jobtemplate.credential_id does not exist
LINE 1: ...urvey_enabled", "main_jobtemplate"."survey_spec", "main_jobt... ^

I'm falling back to 1.0.2.0, but this needs to be fixed.

matburt commented 6 years ago

@ricalenil sorry, but there's no direct upgrade path from 1.0.2.0 to anything later. See: https://groups.google.com/d/msg/awx-project/PQLxKl5Rj9s/UGy-3VaCCQAJ

mkempster22 commented 6 years ago

I just got this issue again when upgrading today. I had it a while back and switched to an external Postgres to try to avoid it. Do we have a way to fix this yet? Completely removing the database is a bit drastic.

mkempster22 commented 6 years ago

For info, I'm currently running AWX 1.0.2.327. Running any sort of upgrade leaves AWX completely stuck at "AWX is Upgrading", with a log similar to switchboardOp's.

Unfortunately, I'm also having an issue with my /var/lib/docker/overlay folder filling up, and the only way to clean it is to remove the containers, which then causes the latest containers to be pulled. So either I run out of space and can't launch AWX, or I clean the files, upgrade, and then can't launch AWX.

thealexauer commented 6 years ago

I have the same issue, with the following log (latest version): https://paste.ee/p/7irxo

dfollereau commented 6 years ago

I have the same issue with OpenShift Origin 3.7.1 and ansible/latest (1.0). The symptom is that the web page constantly shows "AWX is Upgrading" and stays in this state forever. Address: http://awx-web-svc-awx.192.168.99.100.nip.io/. I don't know what information to give you to help me find the issue.

thealexauer commented 6 years ago

If I recall correctly, get the logs from awx-task/awx-celery. The log should be there.

FloThinksPi commented 6 years ago

@matburt I also got this when updating from 1.0.2.0 to 1.0.3.0. You said there is no (automatic) update path from 1.0.2 to anything later. Is there a manual way we can upgrade? Database export/import, or something like that?

joshuacherry commented 6 years ago

@matburt

there's no direct upgrade path from 1.0.2.0 to anything later

There seem to have been a few people in the past couple of days who have had problems upgrading to 1.0.3, which suggests that some people (myself included) assumed there would be an upgrade path between versions. I think it would be useful to clarify what AWX users can expect in terms of upgrading, so this confusion can be avoided.

As mentioned in #1133 by @jakemcdermott

There is no explicit expectation that one should be able to smoothly upgrade from one commit to the next on the devel branch.

akcrisp commented 6 years ago

Guys, on doing upgrades: we appreciate it's dev, but equally I think you have to apply some common sense and realise you've got a lot of people using this software. It would do the community a great service if you could supply some guidelines, even manual ones, for doing upgrades.

I still struggle with the thought that if you're breaking the dev branch, then you're inevitably going to break the Ansible Tower deployments, and therefore someone must surely be looking into what changes are taking place?

So can someone please document what steps (even manual ones) are required to allow updates? Would a DB dump and load onto a fresh new DB work? I have the added complexity of going from 1.0.1.173 at some point...

Andy
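For reference, a database dump from the installer's local Postgres container might look like the sketch below. It assumes the container is named `postgres` and that the database and user are both `awx`. Those are assumed installer defaults, so check your inventory. The output path is hypothetical. A dump preserves data, but it does not by itself create an upgrade path: restoring it into a newer AWX still depends on that version's migrations succeeding against the old schema.

```yaml
# Backup sketch: dump the AWX database as plain SQL from the postgres
# container. Container name, DB user/name and output path are assumptions.
- name: Dump the AWX database
  ansible.builtin.shell: docker exec postgres pg_dump -U awx awx > /root/awx-backup.sql
  args:
    creates: /root/awx-backup.sql   # skip if a backup already exists
```

The matching restore, into an empty database for the same AWX version, would be along the lines of `docker exec -i postgres psql -U awx awx < /root/awx-backup.sql`.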

johnjeffers commented 6 years ago

Agreed with @akcrisp

Please give us something here besides manually recreating everything by hand. Once you get more than a handful of job templates, it's an excruciating task, especially if the jobs include surveys.

FWIW I looked into migrating to a new version via the API, but many things refer to specific IDs in the database. For example, job templates refer to credentials by their ID, which probably won't be the same in the new DB. This makes export/import via the API rather difficult.
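One way around the ID problem is to look related objects up by name on the new server rather than reuse numeric IDs from the old database. Below is a minimal sketch using the AWX REST API's field filtering. The values `new_awx_url`, `awx_user`, `awx_password` and the credential name "Prod SSH key" are placeholders, not values from this thread.

```yaml
# Sketch: resolve a credential's ID on the *new* AWX by name instead of
# reusing the old database's numeric ID. All names/URLs are placeholders.
- name: Look up the credential by name on the new AWX
  ansible.builtin.uri:
    url: "{{ new_awx_url }}/api/v2/credentials/?name={{ 'Prod SSH key' | urlencode }}"
    url_username: "{{ awx_user }}"
    url_password: "{{ awx_password }}"
    force_basic_auth: true
  register: cred_lookup

- name: Fail unless exactly one credential matched
  ansible.builtin.assert:
    that: cred_lookup.json.count == 1

- name: Keep the new ID for use when recreating job templates
  ansible.builtin.set_fact:
    new_credential_id: "{{ cred_lookup.json.results[0].id }}"
```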

I understand this is not a commercial product like Tower, and there is no expectation of support, but other Red Hat open-source products have reasonable upgrade paths.

AlanCoding commented 6 years ago

There is a feature in the works for tower-cli (now also adopting the alias awx-cli) to copy and export data.

https://github.com/ansible/tower-cli/issues/197

Like everything with the CLI project, it is intended to be cross-version compatible. This should solve the problem for some users in some situations once it rolls out.

johnjeffers commented 6 years ago

Another (kind of obvious) benefit of giving your users an upgrade path is that it gets people testing your newer code. If I'm stuck on an old version because I can't upgrade, I'm not doing you much good as a tester.

bobobox commented 6 years ago

Totally agreed: some path to keep an AWX stack reasonably up to date without having to start over each time would be much appreciated and, frankly, seems like a basic requirement for Red Hat to be able to say that they're offering a good-faith, free, non-commercial version of their product.

I've created scripts that use tower-cli (it's a handy Python library in addition to being a cli tool) to export and import my Job Templates and Surveys, as recreating all of these would be a nightmare, but it still results in the loss of Job History, and just isn't a fun time.

I understand that the AWX devs are working hard and that supporting DB upgrades between arbitrary commits is not practical, but clearly the upgrade issue is solved for downstream Tower. Maybe all it would take is to tag certain commits in AWX which 'track' Tower?

matburt commented 6 years ago

We may do more to track upgrades in the future, but currently that's not the case. This is very much an upstream, development-focused branch for Tower and, as we've said before... there are no guarantees.

akcrisp commented 6 years ago

I've got to be honest, Matt: I think I, and I suspect everyone else, was hoping for a more positive message. Ultimately, if this continues to be a painful experience (even in the short term), people will go elsewhere and use other tools. That will inevitably have a direct impact, with people then not purchasing Ansible Tower.

As others have pointed out, you will also get fewer people moving rapidly to the next version, if at all, and this will directly impact development and bug fixing.

Regards

Andy

In reply to Matthew Jones's comment of 14 Feb 2018, 17:27.

ricalenil commented 6 years ago

@matburt this kind of answer takes credibility away from this project. I think this community is helping Tower development a lot with invaluable work. The least I expect from you is a little respect, because we are trusting you to do a quality job. Imagine if Fedora told the community that they were not going to support upgrading any more. What do you think would happen? I understand that there's no warranty, but upgrading is a very basic thing that any serious software should have!


gregdek commented 6 years ago

@ricalenil as a former Fedora Project Leader, I can tell you the exact answer: for the first several versions of Fedora, we did not recommend upgrades in place, because they would break users in unsupportable ways.

AWX is young now, just as Fedora was young then. We are not saying "we will never support upgrades in place for AWX." We are saying "we do not currently support upgrades in place for AWX because we have other issues that are a higher priority." For now, that answer is firm.

akcrisp commented 6 years ago

@gregdek I think, however, that although AWX might be young as an open-source project, Ansible Tower (the licensed version) is not. AWX is the upstream version of that product. So if you are breaking and preventing AWX upgrades, it is a logical assumption that you will also be breaking the downstream Ansible Tower upgrades. For that not to happen, seeing as people are paying for that support, someone must be tracking those changes and planning remediation? If so, why can't that be made available to the AWX community?

Regards

Andy

mkempster22 commented 6 years ago

Unfortunately, I don't think it would be as easy as that. I don't see any way it could be kept upgradeable without extra resources dedicated to that, which clearly is not going to happen. I do agree, though, that the current way AWX is developed is going to turn people away from using it, such as in December, when there was an issue where no one could run any job templates at all for weeks (surely this sort of thing shouldn't get pushed without any form of testing?).

The best way to use AWX at the moment is to download a version that is known to work and stick with it. Use it to see whether you would benefit from Tower. The only reason to keep upgrading is if you are using AWX to contribute to it and are not actually planning to use it as an alternative to Tower.

As for an open-source alternative to Tower, there doesn't seem to be a good option at the moment.

shanemcd commented 6 years ago

it is a logical assumption you will also be breaking the down stream ansible tower upgrades

This is an incorrect assumption. Tower upgrade paths are tested and supported. It should be completely understandable that there will be changes to the data models in the development branch between major releases that might cause issues when pulling new code. You should automate the provisioning of your AWX server rather than relying on a stable upgrade path.

akcrisp commented 6 years ago

Sorry, but if Tower is based on AWX, then it follows that those same data changes between AWX major releases will have an impact on Tower. How else would new features be implemented? So someone has to be tracking those changes in order to test and upgrade Tower.

I can't see any way around that.

Automating the provisioning of the AWX server is not the problem; getting existing data from the previous release into the new release is. I would merely like to see a detailed changelog, so we know what's changed or going to break, and ideally a method for data migration.

I have to be honest: from what I am hearing, this seems to be a frankly very cavalier approach to caring about the community and nurturing it in return for us helping to improve the product, which will inevitably provide profit to Red Hat through features and stability in the downstream release.

In reply to Shane McDonald's comment of 15 Feb 2018, 20:26.

FloThinksPi commented 6 years ago

So IMHO the issue can be closed as wontfix; the reason is understandable and answered above.

The focus at the moment is development, and once this project comes near a Tower release, there will be efforts to develop an upgrade path between the new and old Tower/AWX versions, as @shanemcd pointed out. This will repeat, etc.

It makes total sense not to waste time supplying update paths between tiny version jumps that are in rapid change and development anyway. So we will see over time which direction this goes, and whether update paths get released at some point, or at least rough migration instructions. 😉

matburt commented 6 years ago

I have closed this issue. The original problem might still occur if you interrupt the install while it's performing the first database migration.

AlanCoding commented 6 years ago

The tower-cli feature to receive/upload data has been submitted:

https://github.com/ansible/tower-cli/pull/479

I mentioned this on the mailing list, but I thought some people might get the notification from this issue.

MrMEEE commented 6 years ago

As far as I can see, tower-cli is not really an option (not on its own, anyway), as it's missing too many things (credentials, logs, groups, LDAP and more).

I have noticed that there is:

awx-manage dumpdata
awx-manage loaddata

I can get the dumpdata, to dump the data to a file, but I can't get loaddata to work...

Does anyone know how to load the data back in, and whether it can be used when upgrading?
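`awx-manage` exposes Django's management commands, so `dumpdata` and `loaddata` here are Django's standard fixture commands. A sketch of the invocation inside the `awx_task` container follows. The container name comes from earlier in this thread. The file path and the excluded apps are assumptions: excluding `contenttypes` and `auth.permission` is common Django advice, because a freshly migrated database already contains those rows. Nothing in this thread confirms that a dump loads into a newer AWX, since a fixture describes rows using the models of the version that produced it.

```yaml
# Sketch: Django fixture dump/load via awx-manage inside awx_task.
# /tmp/awx-data.json is a path inside the container's filesystem.
- name: Dump AWX data as a JSON fixture
  ansible.builtin.command: >-
    docker exec awx_task awx-manage dumpdata
    --natural-foreign --natural-primary
    --exclude contenttypes --exclude auth.permission
    --indent 2 --output /tmp/awx-data.json

- name: Load the fixture (same AWX version assumed)
  ansible.builtin.command: docker exec awx_task awx-manage loaddata /tmp/awx-data.json
```

To load the fixture into a different instance, it would first have to be copied out of the container, e.g. with `docker cp awx_task:/tmp/awx-data.json .`, and then into the target container.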

akcrisp commented 6 years ago

You can just use a curl command to get the LDAP config; I have had it work from 1.0.1 to 1.0.5.32. Just be very careful with the LDAP bind password, as I just found it wipes out what's in the DB with the $encrypted$ placeholder :-)
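The same curl-style copy of the LDAP configuration can be sketched with Ansible's `uri` module against the `/api/v2/settings/ldap/` endpoint. The URLs and credentials are placeholders. Because the API returns the bind password as the `$encrypted$` placeholder, the sketch drops `AUTH_LDAP_BIND_PASSWORD` before writing, which avoids the overwrite described above.

```yaml
# Sketch: copy LDAP settings between AWX instances via the settings API.
# old_awx_url, new_awx_url, awx_user and awx_password are placeholders.
- name: Read LDAP settings from the old AWX
  ansible.builtin.uri:
    url: "{{ old_awx_url }}/api/v2/settings/ldap/"
    url_username: "{{ awx_user }}"
    url_password: "{{ awx_password }}"
    force_basic_auth: true
  register: old_ldap

- name: Apply them to the new AWX, minus the masked bind password
  ansible.builtin.uri:
    url: "{{ new_awx_url }}/api/v2/settings/ldap/"
    method: PATCH
    url_username: "{{ awx_user }}"
    url_password: "{{ awx_password }}"
    force_basic_auth: true
    body_format: json
    # Secrets come back as "$encrypted$"; sending that string back would
    # overwrite the real password, so the key is dropped entirely.
    body: "{{ old_ldap.json | dict2items | rejectattr('key', 'equalto', 'AUTH_LDAP_BIND_PASSWORD') | list | items2dict }}"
```

The bind password then has to be set separately on the new instance, for example with a second PATCH whose body contains only `AUTH_LDAP_BIND_PASSWORD` with the real value.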

In reply to Martin Juhl's comment of 1 May 2018, 13:17.

akcrisp commented 6 years ago

I do think it would be extremely useful if the guys developing AWX, tower-cli and awx-manage could provide some proper guidance on how to extract the various elements. That would help the community help them, by enabling users to move to newer versions even when direct upgrades have been broken by changes in major releases.

Andy


gregdek commented 6 years ago

The tower-cli changes are still relatively new (just released a few days ago), so we're still working on documenting this procedure. It's coming.