Pinafore / qb

QANTA Quiz Bowl AI
MIT License

Make a Better Installation Guide #68

Closed widavies closed 6 years ago

widavies commented 6 years ago

I've spent a day trying to figure this out and haven't even succeeded in getting terraform to execute. Please make a better setup guide.

EntilZha commented 6 years ago

We would love to improve our readme/setup instructions! If you could write more about the issues you are running into then it would be easier to both help you get the system running as well as update/improve the setup guide for the future.

Could you post an explanation of what commands you have run and the error messages/program logs?

widavies commented 6 years ago

Yeah! Thanks. Essentially I'm just trying to set up QANTA using Ubuntu Linux locally and AWS, of course, as the cloud. The dependencies you have listed (Terraform and Packer) didn't install from the links, so I had to find command-line alternatives online. Then I cloned this repository locally and ran

"sudo terraform apply"

I get these errors: 1) s3_instance_type was not found 2) ami was not found

I didn't change anything from the repository you specified.

So if you could make a tutorial guide with "line by line" instructions for how to set up QANTA, that would make it so much easier. Thanks! Also, some of your links in the readme don't go to a section, like "Run AWS scripts" for example.

EntilZha commented 6 years ago

Thanks for more info, it helps a lot.

Terraform/packer are just binaries, so you can download them and put them on your PATH. Depending on your distro, they may also be available via your package manager (e.g. I know Arch Linux has them via pacman).

You shouldn't need to run terraform apply with sudo. This might be a source of problems since sudo wipes out your user environment variables.

Since I just did a clean install of Ubuntu on my laptop, I can check whether I run into the same errors you have, but let me know if running without sudo makes a difference.

For the tutorial guide, we/I had hoped that the readme could serve as both information and a tutorial guide.

widavies commented 6 years ago

Yeah! The readme isn't bad, but just assume less and assume the user is stupider (like me). Lay out the commands and process more step by step: add step numbers, commands to run, etc.

Just a little tweaking could make it so much more readable. Thanks!

EntilZha commented 6 years ago

After downloading the terraform/packer binaries and adding them to my PATH, I also set these environment variables and was able to get reasonable output from terraform plan:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
TF_VAR_key_pair
TF_VAR_access_key
TF_VAR_secret_key

Do you get something similar?

widavies commented 6 years ago

I was busy most of today. I'll try it out tomorrow.

widavies commented 6 years ago

Alright, here's the order I did things in:

I get the following output from my console: https://imgur.com/a/OplZ7

Any thoughts?

EntilZha commented 6 years ago

So first thing, you should not be using sudo on all your commands. The reason you got the first error message is that you ran clone/init/apply with sudo, which meant that root owns those files, so your regular user can't modify them. You should be able to use chown with the appropriate flag to make it recursive, so that your user owns the files in qb instead of root. sudo should only be necessary for things like apt install.
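A sketch of that ownership fix, demonstrated on a scratch directory (for the real root-owned clone you would need sudo, and the directory would be the qb checkout):

```shell
# Demo on a scratch directory; for a clone that root owns you would run:
#   sudo chown -R "$USER":"$USER" qb
mkdir -p qb-demo/sub
# -R applies the ownership change recursively to everything inside
chown -R "$(id -un)" qb-demo
```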

Second, you should put the environment variable exports in ~/.bashrc or a similar file so that they get automatically set on new terminals instead of having to type it in every time.
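A minimal sketch of what that ~/.bashrc addition could look like (all values shown are placeholders, not real credentials):

```shell
# Placeholders only; substitute your real AWS credentials and key pair name.
export AWS_ACCESS_KEY_ID="AKIA_EXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export TF_VAR_key_pair="my-keypair"            # the key pair NAME, not a .pem path
export TF_VAR_access_key="$AWS_ACCESS_KEY_ID"
export TF_VAR_secret_key="$AWS_SECRET_ACCESS_KEY"
```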

Onto the error on iam profile, that would be my fault. IAM profiles in EC2 are basically ways to say that anyone with a profile named X (in this case s3-full-access) will have all permissions assigned to that role. In this case, that refers to giving anything we launch full access to our private and public S3 storage. The easiest thing to do to fix that is simply remove these lines:

data "aws_iam_instance_profile" "s3_instance_profile" {
  name = "s3-full-access"
}
iam_instance_profile = "${data.aws_iam_instance_profile.s3_instance_profile.name}"

I'll look into whether there is an easy way for us to still use automatic profiles without breaking the terraform build for others.

For the AMI issue, I'm not certain why that is occurring, since I verified that our images are public. Could you run the following command and paste your output? This assumes you have the AWS CLI installed (the Python package awscli).

aws ec2 describe-images --filters Name=tag-key,Values=Image Name=tag-value,Values=qanta-cpu

EntilZha commented 6 years ago

Last thing, I would strongly recommend running terraform plan first: it outputs what terraform apply will do without executing anything. Once you are sure it looks correct, then run terraform apply.
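In shorthand, that workflow is:

```shell
# Dry run: prints the execution plan without creating anything
terraform plan

# Only after the plan looks correct, perform the changes
terraform apply
```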

EntilZha commented 6 years ago

Last thing (again): for the key pair, that should be the name of the key pair from the AWS console, not the path to the key pair. It's probably the file name without the .pem extension.
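If you're unsure of the name, the AWS CLI can list the key pairs registered in your account (a sketch, assuming the CLI is configured for the right region):

```shell
# Prints only the registered key pair names
aws ec2 describe-key-pairs --query 'KeyPairs[].KeyName'
```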

widavies commented 6 years ago

Thanks for the help.

I did all the things you requested and most of the errors disappeared. The AMI still isn't found, though. Here's the output: https://imgur.com/a/6FWpL

I'll get the environment variables into ~/.bashrc so I don't have to keep entering them.

Thanks!

EntilZha commented 6 years ago

Hmm, everything on my end says that it is public and accessible. Could you verify that you get output similar to this (checking aws region):

$ ls ~/.aws
config  credentials
entilzha on lothal at 02:08 in /home/entilzha/
$ cat ~/.aws/config 
[default]
region = us-west-2

Can you think of anything out of the ordinary that might be causing permissions issues, anything that would be different from a clean AWS account?

widavies commented 6 years ago

I don't get entilzha on lothal at 02:08 in /home/entilzha/ or anything similar in the output.

I'm not sure. My AWS account isn't actually fresh, I've had it for a while and use it to host some other projects. Maybe it's something to do with the AWS cli? Does that need to be configured?

widavies commented 6 years ago

I tried to do a search for your AMI on EC2 and couldn't find it https://imgur.com/a/n5mgS

EntilZha commented 6 years ago

The line you mentioned is just my custom bash prompt, which displays my username, hostname, and current directory. Could you verify that ~/.aws/config contains region = us-west-2?

The CLI should be representative of what terraform sees, and the fact that the image doesn't show up for you confirms that for some reason you can't see it. Assuming your region is set to us-west-2, I'm not sure what is wrong. I'll see what happens when I try to see the image from my personal AWS account, which isn't associated with our lab.

EntilZha commented 6 years ago

I think I found the problem. It is indeed a problem with the region configuration.

In the config file ~/.aws/config it needs to have region = us-west-2. In aws.tf did you change these lines?

provider "aws" {
  region = "us-west-2"
}

and/or

resource "aws_subnet" "qanta_zone_2c" {
  vpc_id                  = "${aws_vpc.qanta.id}"
  cidr_block              = "10.0.2.0/24"
  map_public_ip_on_launch = true
  availability_zone = "us-west-2c"
}

widavies commented 6 years ago

All those lines matched: https://imgur.com/a/y8wCE

Any other ideas?

EntilZha commented 6 years ago

Can you run this command and paste its output? The difference is that it manually overrides any region configuration.

aws --region us-west-2 ec2 describe-images --filters Name=tag-key,Values=Image Name=tag-value,Values=qanta-cpu

widavies commented 6 years ago

Output:

{
    "Images": []
}

EntilZha commented 6 years ago

I have a feeling that EC2 tags aren't public even if the AMI is.

On the web console, can you make sure that you are on us-west-2 (Oregon), click on Launch Instance, then under Community AMIs search for qanta-cpu? If you see the AMI there, then it is indeed a tag issue, which I'll look into fixing.

widavies commented 6 years ago

Yep, just found both of them

widavies commented 6 years ago

Is there a way I can hard code the AMI-ID into the aws.tf file?

EntilZha commented 6 years ago

You can, but I don't know how off the top of my head. Give me a few minutes to fiddle around, because I think I see a way I can fix that.
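Hard-coding would look roughly like the following sketch, where both the resource name and the AMI ID are placeholders (check aws.tf for the actual resource block and the EC2 console for the real qanta-cpu image ID):

```hcl
# Hypothetical snippet: "aws_instance" "master" and "ami-xxxxxxxx" are
# placeholders. The idea is to replace the data-source lookup with a
# literal image ID.
resource "aws_instance" "master" {
  ami = "ami-xxxxxxxx"
}
```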

EntilZha commented 6 years ago

I've updated the code; try pulling and then rerunning terraform plan.

widavies commented 6 years ago

It worked!

I did have to remove these lines again:

data "aws_iam_instance_profile" "s3_instance_profile" {
  name = "s3-full-access"
}
iam_instance_profile = "${data.aws_iam_instance_profile.s3_instance_profile.name}"

Not sure if you wanted to include that in instructions.

I'm going to work on the SSH now and I'll let you know how that goes. You can close this issue when you want.

EntilZha commented 6 years ago

I'll also look into fixing that. Most likely I'll include that in an override script and remove it from the main file.

widavies commented 6 years ago

I'm having an issue with terraform apply:

https://imgur.com/a/IMXCQ

widavies commented 6 years ago

A timeout occurs whenever I run terraform apply

EntilZha commented 6 years ago

I'll be busy the rest of today, but the best thing I can think of is that you haven't added your SSH key to the ssh agent (this is the one referred to by public-key, which you originally had as a hard-coded path). These docs on adding your key to the agent (and starting the agent) might be helpful, and the commands there are generally good to put in your ~/.bashrc: https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/#adding-your-ssh-key-to-the-ssh-agent
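A self-contained sketch of the agent setup, demonstrated with a freshly generated throwaway key (in the real case you would ssh-add the path to your own private key instead):

```shell
# Start the agent for the current shell
eval "$(ssh-agent -s)"

# Generate a throwaway demo key; substitute your real key's path below
ssh-keygen -q -t ed25519 -N "" -f /tmp/qanta_demo_key

# Register the key with the agent, then list loaded keys to confirm
ssh-add /tmp/qanta_demo_key
ssh-add -l
```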

widavies commented 6 years ago

Okay, I did the SSH thing and now I get this error: https://imgur.com/a/dyc4Y Any ideas?

Thanks for the help!

widavies commented 6 years ago

I was able to SSH into the server, but the elasticsearch command was not found and the luigi command wouldn't work.

widavies commented 6 years ago

I think what would be helpful is if you made a wiki page detailing EXACTLY what to do from the start: which commands to enter, in what order, what should be done on the AWS website, etc. The readme is really confusing, and even after several days I haven't been able to get it running. You mentioned you did a clean install of Ubuntu, so maybe you could record or write down the exact steps you took to get QANTA running. There's also nothing in the readme telling you what to do with the .pem file from AWS.

EntilZha commented 6 years ago

I believe that error means it attempted to download files from S3 but couldn't find AWS keys on the instance. This is actually the reason for having the s3-full-access IAM role (we used to manually copy keys over, but stopped doing that). The easiest way to fix this is to go into the IAM console and create a role named s3-full-access, then give it all the S3 permissions available. The real fix is for me to change our code to use https:// instead of s3:// for public file access, but at the moment I am busy with paper deadlines, so I won't get around to that.
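A sketch of that IAM setup using the AWS CLI instead of the console (assumptions: the CLI is configured, and trust-policy.json is a file you create containing the standard policy allowing ec2.amazonaws.com to assume the role):

```shell
# Create the role EC2 instances will assume
aws iam create-role --role-name s3-full-access \
    --assume-role-policy-document file://trust-policy.json

# Grant it full S3 access via the AWS-managed policy
aws iam attach-role-policy --role-name s3-full-access \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# EC2 attaches roles through instance profiles, so wrap the role in one
aws iam create-instance-profile --instance-profile-name s3-full-access
aws iam add-role-to-instance-profile --instance-profile-name s3-full-access \
    --role-name s3-full-access
```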

You were able to SSH because terraform won't shut down the instance once it's made, even if there is an error. From there you can install the remaining dependencies by looking at aws.tf, seeing where it crashed, and manually executing the remaining commands. Elasticsearch needs to be installed; there is a link in the readme, but make sure to install 5.X (this should be noted in the readme, since 6.X didn't work for me). I can't help fix "the luigi command wouldn't work" without more information.

As much as I would love to spend a good amount of time improving the readme, right now I just don't have the time, as multiple research paper deadlines are coming up for me. The readme at the moment is primarily the result of myself or others adding things as we find issues, in an attempt to make it easier for everyone (including, and especially, our future selves). So far the docs are probably lacking because only people in our group have used AWS; others have mainly used the install instructions to set up on their own machines.

I strongly encourage you to open a PR for the readme and document some of the things you had to do to get things working. I would happily review changes to make sure everything is accurate. You are actually in the best position to contribute useful documentation for this since you've been dealing with it.

If after doing the IAM profile you are still unable to get it running then of course feel free to post again, but make sure to include the full error log from the console.

widavies commented 6 years ago

Yeah, I'll provide more information on the luigi command for you. I was booted into Windows on my computer so I didn't have access to the error message. I'll write up a wiki page or edit the readme for you when I get everything working. I've been recording the steps along the way, so I can work on making a nice tutorial guide for you.

Thanks for all the help, good luck with those papers!

widavies commented 6 years ago

Alright I think I got it working. How do I ask it a question via command line?

widavies commented 6 years ago

I am getting this error when running the luigi command: https://imgur.com/a/gfciI

EntilZha commented 6 years ago

I don't recall for certain since it's been a while since I've run that code, but I think the expo pipeline is meant to be run after the training pipeline. Hence, the prerequisites are files that are output by All: https://github.com/Pinafore/qb/blob/master/qanta/pipeline/__init__.py#L63. That code hasn't been run for about a year at this point, and I am pretty sure some changes have broken that pipeline. We plan on fixing it soon, since we need it for our publication, but there is a decent chance it is broken right now.

My suggestion would be to aim for being able to interactively query a guesser. The shortest path to doing that would be:

  1. cp qanta-defaults.yaml qanta.yaml
  2. Look through qanta.yaml; you will see a section labeled guesser. This is the list of all the guessers we have implemented. If you have Elasticsearch installed and running you can leave it as is; otherwise set enabled: false and enable another guesser, like the TFIDF one (its only dependency is sklearn), by setting enabled: true.
  3. Run luigi --module qanta.pipeline.guesser AllSingleGuesserReports
  4. Then you should be able to load the guesser up via:

     from qanta.guesser.tfidf import TfidfGuesser

     guesser = TfidfGuesser.load('output/guesser/qanta.guesser.tfidf.TfidfGuesser')
     guesser.guess(['name the first president of the united states'])

widavies commented 6 years ago

I tried loading up the TfidfGuesser, but it says: FileNotFoundError: [Errno 2] No such file or directory: 'output/guesser/qanta/guesser.tfidf.TfidfGuesser/params.pickle'

widavies commented 6 years ago

Any updates on this?

EntilZha commented 6 years ago

It looks like you have that misspelled; it should be a . instead of a / between qanta and guesser.

widavies commented 6 years ago

It looks like TfidfGuesser isn't even located in the output/guesser directory.

widavies commented 6 years ago

How do I fix this?

EntilZha commented 6 years ago

Did you follow the steps I outlined and run luigi --module qanta.pipeline.guesser AllSingleGuesserReports without error?

widavies commented 6 years ago

Yeah, I get this error: https://imgur.com/a/qRtau

EntilZha commented 6 years ago

The image is too blurry. Can you paste the output as text in a GitHub code block instead?

widavies commented 6 years ago

If you click on the image it should zoom in. I'm able to see it just fine on my home pc.

widavies commented 6 years ago

Also, I'm not able to copy the output through SSH for some reason.

EntilZha commented 6 years ago

Try adding --local-scheduler
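The flag goes on the same luigi invocation from earlier, i.e.:

```shell
# --local-scheduler runs the pipeline without a separate luigid daemon
luigi --module qanta.pipeline.guesser AllSingleGuesserReports --local-scheduler
```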