widavies closed this issue 6 years ago
We would love to improve our readme/setup instructions! If you could write more about the issues you are running into then it would be easier to both help you get the system running as well as update/improve the setup guide for the future.
Could you post an explanation of what commands you have run and the error messages/program logs?
Yeah! Thanks. Essentially I'm just trying to set up QANTA on Ubuntu Linux locally, with AWS of course as the cloud. The dependencies you have listed (Terraform and Packer) didn't install from the links, so I had to find command-line alternatives online. Then I cloned this repository locally and ran `sudo terraform apply`.
I get these errors: 1) `s3_instance_type` was not found 2) `ami` was not found
I didn't change anything from the repository you specified.
So if you could make a tutorial guide with "line by line" instructions for how to set up QANTA, that would make it so much easier. Thanks! Also, some of your links in the readme don't go to a section, like "Run AWS scripts" for example.
Thanks for the extra info, it helps a lot.
Terraform/Packer are binaries, so you can download them and put them in your `PATH`. Depending on your distro, they may also be available via your package manager (e.g. I know Arch Linux has them via `pacman`).
You shouldn't need to run `terraform apply` with `sudo`. This might be a source of problems, since `sudo` wipes out your user environment variables.

Since I just clean-installed Ubuntu on my laptop, I can see if I run into the same errors you have, but let me know if running without `sudo` makes a difference.
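To see why this matters, here's a quick sketch using `env -i` to simulate the scrubbed environment that `sudo` hands to commands by default (the variable value is just a placeholder):

```shell
# The variable is visible in your own shell...
export TF_VAR_key_pair=qanta
echo "normal shell: TF_VAR_key_pair=[$TF_VAR_key_pair]"

# ...but a scrubbed environment (what sudo provides by default) drops it.
env -i sh -c 'echo "clean env: TF_VAR_key_pair=[$TF_VAR_key_pair]"'
```

The second line prints an empty value, which is exactly what terraform sees when run under `sudo`.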
For the tutorial guide, we/I had hoped that the readme could serve as both information and a tutorial guide.
Yeah! The readme isn't bad, but just assume less and assume the user is stupider (like me). Make the commands and process a bit more step-by-step: add step numbers, commands to run, etc. Just a little tweaking could make it so much more readable. Thanks!
After downloading the terraform/packer binaries and adding them to my path, I also set these environment variables and was able to get reasonable output from `terraform plan`:

```
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
TF_VAR_key_pair
TF_VAR_access_key
TF_VAR_secret_key
```

Do you get something similar?
I was busy most of today. I'll try it out tomorrow.
Alright, here's the order I did things in:
```
sudo git clone https://github.com/Pinafore/qb
sudo export AWS_ACCESS_KEY_ID=exampleaccessid
sudo export AWS_SECRET_ACCESS_KEY=exampleaccesskey
sudo export TF_VAR_KEY_pair=/home/will/Downloads/qanta.pem
sudo export TF_VAR_ACCESS_KEY=[same as AWS_ACCESS_KEY_ID]
sudo export TF_VAR_secret_key=[same as AWS_SECRET_ACCESS_KEY]
cd qb
sudo terraform init
sudo terraform apply
```
I get the following output from my console: https://imgur.com/a/OplZ7
Any thoughts?
So first thing, you should not be using `sudo` on all your commands. The reason you got the first error message is that you ran clone/init/apply with `sudo`, which means that `root` owns those files, so your regular user can't modify them. You should be able to use `chown` with the appropriate flag to make it recursive, so that your user owns the files in `qb` instead of root. `sudo` should only be necessary for things like `apt install`.
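For reference, a minimal sketch of the recursive `chown` (here `/tmp/qb-demo` stands in for your `qb` checkout; on the real, root-owned checkout you would prefix `chown` with `sudo`):

```shell
# Stand-in for the qb checkout (already owned by us, so no sudo needed here).
mkdir -p /tmp/qb-demo
touch /tmp/qb-demo/aws.tf

# -R applies the ownership change recursively to every file in the tree.
# On the real repo: sudo chown -R "$(id -un)" qb
chown -R "$(id -un)" /tmp/qb-demo

stat -c '%U' /tmp/qb-demo/aws.tf   # prints your username
```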
Second, you should put the environment variable exports in `~/.bashrc` or a similar file so that they get set automatically in new terminals, instead of you having to type them in every time.
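For example (all values here are placeholders; in practice these lines go at the end of `~/.bashrc`, but this sketch writes to a scratch file so nothing real is touched):

```shell
# Write the exports to a scratch file standing in for ~/.bashrc.
cat > /tmp/qanta_env.sh <<'EOF'
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
export TF_VAR_key_pair=qanta   # the key pair *name* from the AWS console, not a path
export TF_VAR_access_key=$AWS_ACCESS_KEY_ID
export TF_VAR_secret_key=$AWS_SECRET_ACCESS_KEY
EOF

# Sourcing the file (which bash does automatically for ~/.bashrc) sets the variables.
source /tmp/qanta_env.sh
echo "$TF_VAR_key_pair"
```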
Onto the error on the IAM profile: that would be my fault. IAM profiles in EC2 are basically a way to say that anything launched with a profile named X (in this case `s3-full-access`) will have all the permissions assigned to that role. In this case, that means giving anything we launch full access to our private and public S3 storage. The easiest fix is simply to remove these lines:
```
data "aws_iam_instance_profile" "s3_instance_profile" {
  name = "s3-full-access"
}

iam_instance_profile = "${data.aws_iam_instance_profile.s3_instance_profile.name}"
```
I'll look into whether there is an easy way for us to still use automatic profiles without breaking the terraform build for others.
For the AMI issue, I'm not certain why that is occurring, since I verified that our images are public. Could you run the following command and paste your output? This assumes you have the AWS CLI installed (the python package `awscli`):

```
aws ec2 describe-images --filters Name=tag-key,Values=Image Name=tag-value,Values=qanta-cpu
```
Last thing, I would strongly recommend running `terraform plan` first: it outputs what `terraform apply` will do, without executing anything. Once you are sure it looks correct, then do `terraform apply`.

Last thing (again), for the key pair: that should be the name of the key pair from the AWS console, not the path to the key file. It's probably the file name without `.pem`.
Thanks for the help.
I did all the things you requested and most of the errors disappeared. The AMI still isn't found, though. Here's the output: https://imgur.com/a/6FWpL
I'll put the environment variables in `~/.bashrc` so I don't have to keep entering them.
Thanks!
Hmm, everything on my end says that it is public and accessible. Could you verify that you get output similar to this (checking the aws region):
```
$ ls ~/.aws
config  credentials
entilzha on lothal at 02:08 in /home/entilzha/
$ cat ~/.aws/config
[default]
region = us-west-2
```
Is there anything else out of the ordinary that might be causing permissions issues, anything you can think of that would be different from using a clean aws account?
I don't get `entilzha on lothal at 02:08 in /home/entilzha/` or anything similar in the output.
I'm not sure. My AWS account isn't actually fresh; I've had it for a while and use it to host some other projects. Maybe it's something to do with the AWS CLI? Does that need to be configured?
I tried to do a search for your AMI on EC2 and couldn't find it https://imgur.com/a/n5mgS
The line you mentioned is there because I have a custom bash prompt that displays my username, hostname, and current directory. Could you verify that `~/.aws/config` contains `region = us-west-2`?

The CLI should be representative of what terraform sees, and the fact that it doesn't show up in the web-based AWS console confirms that for some reason you can't see that image. Assuming your region is set to `us-west-2`, I'm not sure what is wrong. I'll see what happens when I try to see the image from my personal AWS account not associated with our lab.
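For reference, a minimal sketch of the region config (written to a scratch path here; the real file is `~/.aws/config`, or you can run `aws configure set region us-west-2` if the CLI is installed):

```shell
# Scratch copy of what ~/.aws/config should contain.
mkdir -p /tmp/aws-demo
cat > /tmp/aws-demo/config <<'EOF'
[default]
region = us-west-2
EOF

# Verify the region line is present (on the real file: grep region ~/.aws/config).
grep 'region = us-west-2' /tmp/aws-demo/config
```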
I think I found the problem. It is indeed a problem with the region configuration.

The config file `~/.aws/config` needs to have `region = us-west-2`. In `aws.tf`, did you change these lines?
```
provider "aws" {
  region = "us-west-2"
}
```
and/or
```
resource "aws_subnet" "qanta_zone_2c" {
  vpc_id                  = "${aws_vpc.qanta.id}"
  cidr_block              = "10.0.2.0/24"
  map_public_ip_on_launch = true
  availability_zone       = "us-west-2c"
}
```
All those lines matched: https://imgur.com/a/y8wCE
Any other ideas?
Can you run this command and paste its output? The difference is that it manually overrides any region configuration:

```
aws --region us-west-2 ec2 describe-images --filters Name=tag-key,Values=Image Name=tag-value,Values=qanta-cpu
```
Output:

```
{
    "Images": []
}
```
I have a feeling that EC2 tags aren't public even if the AMI is.
On the web console, can you make sure that you are on us-west-2 (Oregon), click on launch instance, then under community AMIs search for `qanta-cpu`? If you see the AMI there, then it is indeed a tag issue, which I'll look into fixing.
Yep, just found both of them
Is there a way I can hard-code the AMI ID into the `aws.tf` file?
You can, but I don't know how off the top of my head. Give me a few minutes to fiddle around, because I think I see a way I can fix that.
I've updated the code; try pulling and then rerunning `terraform plan`.
It worked!
I did have to remove these lines again:
```
data "aws_iam_instance_profile" "s3_instance_profile" {
  name = "s3-full-access"
}

iam_instance_profile = "${data.aws_iam_instance_profile.s3_instance_profile.name}"
```
Not sure if you wanted to include that in instructions.
I'm going to work on the SSH now and I'll let you know how that goes. You can close this issue when you want.
I'll also look into fixing that. Most likely I'll include that in an override script and remove it from the main file.
I'm having an issue with `terraform apply`: a timeout occurs whenever I run it.
I'll be busy the rest of today, but the best thing I can think of is that you haven't added your SSH key to the ssh agent (this is the one referred to by public-key, which you originally had the hard-coded path to). These docs on adding your key to the agent (and starting the agent) might be helpful, and in general this is good to put in your `~/.bashrc`:
https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/#adding-your-ssh-key-to-the-ssh-agent
Okay, I did the SSH thing and now I get this error: https://imgur.com/a/dyc4Y Any ideas?
Thanks for the help!
I was able to SSH into the server, but the `elasticsearch` command was not found and the `luigi` command wouldn't work.
I think what would be helpful is if you made a wiki page detailing EXACTLY what to do from the start: commands to enter, in what order, what should be done on the AWS website, etc. The readme is really confusing, and even after several days I haven't been able to get it running. You mentioned you did a clean install of Ubuntu, so maybe you could just video or write down the exact steps you took to get QANTA running. There's also nothing in the readme telling you what to do with the .pem file from AWS.
I believe that error means it attempted to download files from S3 but couldn't find AWS keys on the instance. This is actually the reason for having the `s3-full-access` IAM role (we used to manually copy keys over, but stopped doing that). The easiest way to fix this is to go into the IAM console, create a role named `s3-full-access`, and give it all the S3 permissions available. The real fix is for me to change our code to use `https://` instead of `s3://` for public file access, but at the moment I am busy with paper deadlines so I won't get around to that.
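If you'd rather not click through the console, here is a hedged sketch of the same role in terraform itself (this is not code from the repo, and it assumes these resource names don't already exist in your account or in `aws.tf`):

```
resource "aws_iam_role" "s3_full_access" {
  name = "s3-full-access"

  # Let EC2 instances assume this role.
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF
}

# Attach the AWS-managed full-S3-access policy to the role.
resource "aws_iam_role_policy_attachment" "s3_full_access" {
  role       = "${aws_iam_role.s3_full_access.name}"
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}

# The instance profile is what the EC2 instance actually references.
resource "aws_iam_instance_profile" "s3_full_access" {
  name = "s3-full-access"
  role = "${aws_iam_role.s3_full_access.name}"
}
```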
You were able to SSH in since terraform won't shut down the instance once it's made, even if there is an error. From there you can install the remaining dependencies by looking at `aws.tf`, seeing where it crashed, and manually executing the remaining commands. Elasticsearch needs to be installed; there is a link in the readme, but make sure to install 5.X (this should be in the readme, since it didn't work on 6.X for me). I can't help fix "the luigi command wouldn't work" without more information.
As much as I would love to spend a good amount of time improving the readme, right now I just don't have the time, as there are multiple research paper deadlines coming up for me. The readme at the moment is primarily a result of either myself or others adding stuff as we find issues, in an attempt to make it easier for others (including and especially our future selves). So far the docs are probably lacking because only people in our group have used AWS; others have mainly taken the install instructions to set up on their own machines.
I strongly encourage you to open a PR for the readme and document some of the things you had to do to get things working. I would happily review changes to make sure everything is accurate. You are actually in the best position to contribute useful documentation for this since you've been dealing with it.
If after doing the IAM profile you are still unable to get it running then of course feel free to post again, but make sure to include the full error log from the console.
Yeah, I'll provide more information on the luigi command for you. I was booted into Windows on my computer so I didn't have access to the error message. I'll write up a wiki page or edit the readme for you when I get everything working. I've been recording the steps along the way, so I can work on making a nice tutorial guide for you.
Thanks for all the help, good luck with those papers!
Alright I think I got it working. How do I ask it a question via command line?
I am getting this error when running the luigi command: https://imgur.com/a/gfciI
I don't recall for certain since it's been a while since I've run that code; I think that the expo pipeline is meant to be run after the training pipeline. Hence, `Prerequisite` are files that are output by `All` (https://github.com/Pinafore/qb/blob/master/qanta/pipeline/__init__.py#L63). That code hasn't been run for about a year at this point, and I am pretty sure there are some changes that have broken that pipeline. We plan on fixing it soon, since we need to for our publication, but there is a decent chance it breaks right now.
My suggestion would be to aim for being able to interactively query a guesser. The shortest path to doing that would be:

1. Run `cp qanta-defaults.yaml qanta.yaml`.
2. In `qanta.yaml`, you will see a section labeled `guesser`. This is the list of all the guessers we have implemented. If you have elastic search installed and running you can leave it as is; otherwise set `enabled: false` there, and enable another guesser like the TFIDF one (its only dependency is sklearn) by setting `enabled: true`.
3. Run `luigi --module qanta.pipeline.guesser AllSingleGuesserReports`.
4. Then, in a Python shell:

```
from qanta.guesser.tfidf import TfidfGuesser

guesser = TfidfGuesser.load('output/guesser/qanta.guesser.tfidf.TfidfGuesser')
guesser.guess(['name the first president of the united states'])
```
I tried loading up the TfidfGuesser, but it says: `FileNotFoundError: [Errno 2] No such file or directory: 'output/guesser/qanta/guesser.tfidf.TfidfGuesser/params.pickle'`
Any updates on this?
It looks like you have that misspelled: it should be a `.` instead of a `/` between `qanta` and `guesser`.
It looks like TfidfGuesser isn't even located in the `output/guesser` directory.
How do I fix this?
Did you follow the steps I outlined and run `luigi --module qanta.pipeline.guesser AllSingleGuesserReports` without error?
Yeah, I get this error: https://imgur.com/a/qRtau
The image is too blurry. Can you paste the output in text form in a GitHub code block instead?
If you click on the image it should zoom in. I'm able to see it just fine on my home pc.
Also I'm not able to copy the output for some reason through ssh
Try adding `--local-scheduler`.
I've spent a day trying to figure this out and haven't even been able to succeed getting terraform to execute. Please make a better setup guide.