Open 000generic opened 7 years ago
You should run phing from inside the TBro main container. If you installed everything according to the documentation, you can enter that container using
docker exec -it TBro_official /bin/bash
and that container should already contain an installation of phing.
If it is still missing, you should be able to install it inside that container using
composer global require phing/phing
Running
phing database-initialize
should then be possible from /home/tbro.
In any case, the folder to run phing from is the folder where build.xml is located.
I tried to follow your installation directions closely when I installed TBro, and I have reinstalled repeatedly without success when I get to the phing command.
Phing appears to be installed but a 'command not found' error is given when I try to run phing in the TBro directory that has the build.xml file - or in any directory in TBro.
Following your directions above:
ubuntu@ip-172-26-13-108:~$ docker exec -it TBro_official /bin/bash
root@9b953c8ae04e:/# phing queue-install-db
bash: phing: command not found
root@9b953c8ae04e:/# composer global require phing/phing
Changed current directory to /root/.composer
Running composer as root/super user is highly discouraged as packages, plugins and scripts cannot always be trusted
Using version ^2.16 for phing/phing
./composer.json has been updated
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update
Generating autoload files
root@9b953c8ae04e:/# cd home/tbro/
root@9b953c8ae04e:/home/tbro# phing database-initialize
bash: phing: command not found
root@9b953c8ae04e:/home/tbro# ls
INSTALLATION build.xml doc src README.md build_installation.sh enable_AllowOverride_Apache2.sed test build.properties composer.json phpunit.xml update_config.sed build.properties.example composer.lock queue_config.example.sql update_installation.sh
root@9b953c8ae04e:/home/tbro# phing queue-install-db
bash: phing: command not found
...I just realized the file phing would generate is already in the directory: queue_config.example.sql. I'm not sure how it was generated, as I still haven't gotten phing to work, but I'll try working with it.
...It looks like queue_config.example.sql was generated when things were built by
docker exec -i -t TBro_official /home/tbro/build_installation.sh
screen output:
Buildfile: /home/tbro/build.xml
[property] Loading /home/tbro/./build.properties
tbro > queue-install-db:
[copy] Copying 1 file to /home/tbro
[echo] an example configuration has been copied to /home/tbro/queue_config.example.sql!
[echo] modify it to your needs and load it into your blast database
BUILD FINISHED
Total time: 0.4795 seconds
Two new questions:
1) To move zipped blast database files into my docker container using
curl --data-binary --ftp-pasv --user "$WORKERFTP_FTP_USER":"$WORKERFTP_FTP_PW" -T cannabis_sativa_transcriptome.zip ftp://$WORKERFTP_IP/
how can I determine what the values of the three variables
$WORKERFTP_FTP_USER $WORKERFTP_FTP_PW $WORKERFTP_IP
are for my Docker container?
When I run set in TBro, nothing shows up for the three variables:
root@dddabdc84640:/# set | grep WORKERFTP
WORKERFTP_ENV_FTP_PW=ftp
WORKERFTP_ENV_FTP_USER=tbro
WORKERFTP_NAME=/TBro_official/WORKERFTP
WORKERFTP_PORT=tcp://172.17.0.4:21
WORKERFTP_PORT_21_TCP=tcp://172.17.0.4:21
WORKERFTP_PORT_21_TCP_ADDR=172.17.0.4
WORKERFTP_PORT_21_TCP_PORT=21
WORKERFTP_PORT_21_TCP_PROTO=tcp
so I'm not sure where to find values for them.
I tried
$WORKERFTP_ENV_FTP_USER $WORKERFTP_ENV_FTP_PW $WORKERFTP_PORT_21_TCP_ADDR
in the curl command but it didn't seem to work:
curl --data-binary --ftp-pasv --user “tbro”:”ftp” -T blastdb-Harvard-AA.zip ftp://172.17.0.4/
curl: (67) Access denied: 530 (when run from Ubuntu)
curl: (6) Could not resolve host: tbro (when run from TBro)
2) How do I "run the queue_config.sql commands in your queue database." ?
Thank-you!
I think I'll ping @greatfireball or @iimog on this, this is getting too specialized with the setup for me now, as they created the docker containers.
Hi @000generic,
Sorry for the confusion. I think the documentation needs some serious improvements. First of all, "the main TBro directory" is indeed /home/tbro/, the directory containing the source code (I will clarify that in the docs). phing is installed via composer, so it is available in ~/.composer/vendor/bin. This directory is added to the path via ~/.bash_profile, which is apparently not loaded when entering the container. You can fix that by entering either:
source ~/.bash_profile
or
export PATH=~/.composer/vendor/bin:$PATH
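A minimal sketch of the same fix as a reusable snippet, assuming composer's global bin directory is ~/.composer/vendor/bin as stated above. The function name prepend_to_path is made up for illustration; it only avoids adding the directory twice and then reports whether the shell can resolve phing.

```shell
# Prepend a directory to PATH unless it is already listed there.
# (prepend_to_path is a hypothetical helper, not part of TBro.)
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Composer's global bin directory, as named in the reply above.
prepend_to_path "$HOME/.composer/vendor/bin"

# Report whether the shell can now resolve phing.
if command -v phing >/dev/null 2>&1; then
    echo "phing found at: $(command -v phing)"
else
    echo "phing is still not on PATH; check that composer installed it"
fi
```

This has to be repeated (or sourced) in every new shell opened with docker exec, since ~/.bash_profile is not loaded there.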
But anyway, you are right: phing queue-install-db is already executed by build_installation.sh when following the installation instructions.
You are also right regarding the environment variables (I will update them in the docs). However, the curl command should work from TBro. Can you please try this one again:
curl --data-binary --ftp-pasv --user $WORKERFTP_ENV_FTP_USER:$WORKERFTP_ENV_FTP_PW -T blastdb-Harvard-AA.zip ftp://"$WORKERFTP_PORT_21_TCP_ADDR"/
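The mapping from the Docker link variables to the upload command can be sketched as two small functions. This assumes it is run inside the TBro container, where the WORKERFTP_* variables listed by set | grep WORKERFTP exist. The function names worker_ftp_url and upload_blastdb are hypothetical, and the sketch passes only -T for the upload, which is an assumption about what the command above needs.

```shell
# Build the FTP target URL from the Docker link variable.
worker_ftp_url() {
    if [ -z "${WORKERFTP_PORT_21_TCP_ADDR:-}" ]; then
        echo "WORKERFTP_PORT_21_TCP_ADDR is not set; run this inside the TBro container" >&2
        return 1
    fi
    printf 'ftp://%s/\n' "$WORKERFTP_PORT_21_TCP_ADDR"
}

# Upload one zip file using the credentials from the link variables.
# Usage: upload_blastdb blastdb-Harvard-AA.zip
upload_blastdb() {
    url=$(worker_ftp_url) || return 1
    curl --ftp-pasv \
         --user "$WORKERFTP_ENV_FTP_USER:$WORKERFTP_ENV_FTP_PW" \
         -T "$1" "$url"
}
```

Quoting the --user argument as one string keeps the shell from splitting it if the password ever contains special characters.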
To import the content of the queue_config.sql file into the queue database, execute (from TBro):
PGPASSWORD=$WORKER_ENV_DB_PW psql -U $WORKER_ENV_DB_USER -h $WORKER_PORT_5432_TCP_ADDR -p $WORKER_PORT_5432_TCP_PORT <queue_config.sql
Getting closer....
I was able to run both the curl and PGPASSWORD commands successfully now - but nothing is showing up in TBro as a blast database to blast against. Specifically, I did the following:
cd /sono/peptides # this is where I placed my zipped blast databases
curl --data-binary --ftp-pasv --user tbro:ftp -T blastdb-barnacle-AA.zip ftp://172.17.0.4/
curl --data-binary --ftp-pasv --user tbro:ftp -T blastdb-barnacle-TR.zip ftp://172.17.0.4/
cd /home/tbro
mv queue_config.example.sql queue_config.sql
nano queue_config.sql
-- database files available. name is the name it will be referenced by, md5 is the zip file's sum, download_uri specifies where the file can be retrieved
INSERT INTO database_files (name, md5, download_uri) VALUES
('blastdb-barnacle-AA', '50e7cb5a77f37641a648edc59abcc11a', 'ftp://172.17.0.4/blastdb-barnacle-AA.zip'),
('blastdb-barnacle-TR', '7fc500cce7bb9ac925c39e5d1f986640', 'ftp://172.17.0.4/blastdb-barnacle-TR.zip');
...etc
-- contains information which database is available for which program.
-- additionally, 'availability_filter' can be used to e.g. restrict use for an organism-release combination
INSERT INTO program_database_relationships (programname, database_name, availability_filter) VALUES
('blastn','blastdb-barnacle-TR', 'barnacle-T1'),
('blastp','blastdb-barnacle-AA', 'barnacle-T1'),
('blastx','blastdb-barnacle-AA', 'barnacle-T1'),
('tblastn','blastdb-barnacle-TR', 'barnacle-T1'),
('tblastx','blastdb-barnacle-TR', 'barnacle-T1');
...etc
PGPASSWORD=worker psql -U worker -h 172.17.0.3 -p 5432 <queue_config.sql
I then tried to blast in TBro but no databases were offered as an option.
Sorry, another gap in the documentation.
Whether a database shows up in TBro depends only on queue_config.sql, specifically on the program_database_relationships section. Here the availability_filter column is key (and totally undocumented).
This column decides for which organism and release each blast database is shown. Its format is {organism_id}_{release}. For the demo data the organism_id is "13" and the release is "1.CasaPuKu", so for the blast db to show up the availability_filter has to be set to 13_1.CasaPuKu. If "barnacle-T1" is your release and 14 is your organism_id (you can check with tbro-db organism list), you have to change the availability filter in queue_config.sql to 14_barnacle-T1.
To import this file into the database again, you have to remove all sections except program_database_relationships (otherwise you get errors due to duplicate key values violating unique constraints).
I will add documentation for the availability_filter column both to the example sql file and to the documentation on readthedocs.
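As an illustration of the {organism_id}_{release} format, here is a small sketch that composes the filter and prints only a program_database_relationships section. The function names are made up for this example, the organism_id 14 and the database names are the ones used in this thread, and the program-to-database pairing simply mirrors the rows posted earlier.

```shell
# Compose the availability_filter value: {organism_id}_{release}.
availability_filter() {
    printf '%s_%s\n' "$1" "$2"
}

# Print only the program_database_relationships section.
# Arguments: organism_id release nucleotide_db protein_db
relationships_sql() {
    filter=$(availability_filter "$1" "$2")
    cat <<EOF
INSERT INTO program_database_relationships (programname, database_name, availability_filter) VALUES
('blastn', '$3', '$filter'),
('blastp', '$4', '$filter'),
('blastx', '$4', '$filter'),
('tblastn', '$3', '$filter'),
('tblastx', '$3', '$filter');
EOF
}

relationships_sql 14 barnacle-T1 blastdb-barnacle-TR blastdb-barnacle-AA
```

Redirecting the output to a separate file gives something that can be piped into psql without touching the database_files section again.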
Thank you very much for your endurance and for reporting all the problems. This helps a lot in improving the documentation.
Great! Now the blast databases are showing up in TBro - Thank-you :)
....but I think I have to correct the uri I am giving TBro, which I had guessed at after curling my zipped Blast database files into Docker.
curl --data-binary --ftp-pasv --user tbro:ftp -T blastdb-barnacle-AA.zip ftp://172.17.0.4/
curl --data-binary --ftp-pasv --user tbro:ftp -T blastdb-barnacle-TR.zip ftp://172.17.0.4/
When I then configure the queue_config.sql file with:
('barnacle-AA4', '50e7cb5a77f37641a648edc59abcc11a', 'ftp://172.17.0.4/blastdb-barnacle-AA.zip'), ('barnacle-TR4', '7fc500cce7bb9ac925c39e5d1f986640', 'ftp://172.17.0.4/blastdb-barnacle-TR.zip');
TBro throws an error:
There has been an error processing your job. Please review your job. If this keeps happening, notify the administrator.
These errors occured: BLAST Database error: No alias or index file found for protein database [/tmp/queue-worker//barnacle-AA4.50e7cb5a77f37641a648edc59abcc11a/barnacle-AA4] in search path [/tmp/queue-worker::]
and when I configure the queue_config.sql file with:
('barnacle-AA5', '50e7cb5a77f37641a648edc59abcc11a', 'http://172.17.0.4/blastdb-barnacle-AA.zip'), ('barnacle-TR5', '7fc500cce7bb9ac925c39e5d1f986640', 'http://172.17.0.4/blastdb-barnacle-TR.zip');
TBro seems to hang up:
Blast Results
Your job is currently being processed. Please wait a moment. This page will refresh in 2 seconds.
The page does an initial refresh saying it is one of one in queue - and then doesn't seem to refresh any more - and remains stalled after many minutes.
OK, now we are really closing in on this. The blastdb is visible in TBro and the download of the zip file seems to work as well. The ftp configuration is the correct one. The problem now is that the blastdb files are not found after unpacking the zip file. How are they named?
TBro expects the blastdb files in the zip to have the same name as the name column in the database_files table. So in your case (this is barnacle-AA3 and barnacle-TR3, right?) TBro will look for the files barnacle-AA3.phr, barnacle-AA3.pin and barnacle-AA3.psq in your zip folder. If they are named differently, they will not be found.
My suggestion: first clean up the old values from the database_files and program_database_relationships tables by executing this command:
PGPASSWORD=$WORKER_ENV_DB_PW psql -U $WORKER_ENV_DB_USER -h $WORKER_PORT_5432_TCP_ADDR -p $WORKER_PORT_5432_TCP_PORT -d $WORKER_ENV_DB_NAME -c 'TRUNCATE database_files CASCADE'
Use it with care, as it will also remove all past and present blast jobs from the database.
Then re-import an sql file containing the two sections for database_files and program_database_relationships, with the name column fixed so that it corresponds to the name of the blastdb (without the .p* or .n* ending). If you verify that it works, I will update the docs here as well.
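The naming rule can be checked before uploading. The sketch below reads a list of file names on stdin and verifies that the three index files for a given database name are present. The function name blastdb_files_present is hypothetical, and it assumes the files sit at the top level of the zip with the usual BLAST extensions (.phr/.pin/.psq for protein, .nhr/.nin/.nsq for nucleotide).

```shell
# Read file names (one per line) on stdin and check that the BLAST index
# files for database "$1" are all listed.
# $2 is "p" for a protein database or "n" for a nucleotide database.
blastdb_files_present() {
    name=$1
    kind=$2
    listing=$(cat)
    for ext in "${kind}hr" "${kind}in" "${kind}sq"; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        if ! printf '%s\n' "$listing" | grep -qxF "$name.$ext"; then
            echo "missing: $name.$ext" >&2
            return 1
        fi
    done
}

# Possible use with Info-ZIP's unzip, which lists one name per line with -Z1:
#   unzip -Z1 blastdb-barnacle-AA.zip | blastdb_files_present barnacle-AA3 p
```

The md5 column can be compared in the same pass by running md5sum on the zip file and checking it against the value in the sql file.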
Genius! That works great - now I am blasting against my blast databases :)
....however, while the blast hits show visual alignments with many good hits, they are not showing any isoform information (instead, under the Name column in the blast report, the hits all say 'No') and the link in 'No' just goes to the TBro landing page.
This is true even when blasting, for instance, a protein that was used in building the blast database. When I search for the same protein in TBro based on its id, the protein is returned as an isoform with a link that takes me to its TBro webpage. I checked, and the identifiers used in 1) the imported fasta files, 2) the imported identifiers, 3) the imported .tbl files, and 4) the fastas used to build the blast databases are all the same. For instance:
barnacle-ee100-aa
So, it seems like the blast job is successful but the hits generated are not linking back to the TBro databases. I'll try rebuilding TBro again from the ground up but most likely I need to modify something somewhere along the way.
Once we have all this worked out, I can generalize the steps and provide them to you, or post them on GitHub etc. I think the combination of free/cheap, easy up/easy down Amazon cloud + TBro is really great. Rather than a long-term repository, oftentimes it's useful to make things available to collaborators (or myself) with many updates for just a few days to months, and I think the Amazon/Docker/TBro combo is going to be a great way to do this. There is already growing interest from others here at the Marine Biological Laboratory.
Nice! Happy to hear that it is finally working. The issue with showing "No" in the Name column is indeed very strange. TBro tries to map the name of the blast hit to an internal ID but even if it fails it should still show the original ID of the hit. This ID is parsed out of the Blast result xml. Something seems to go wrong there. Would you mind sharing the xml result? You can get this by calling the webservice directly via:
http://<your-tbro-machine>/ajax/queue/job_results?jobid=<your-jobid>
replacing both your-tbro-machine and your-jobid with the respective values. The jobid is the one you get when starting a blast job.
If you do not want to share this file, you can have a look yourself. In the <Hit_def> tag the first word is assumed to be the ID. For an example blast job on the public instance, a Hit_def line might look like this:
<Hit_def>cds.comp234028_c1.1_seq4|m.808277 comp234028_c1.1_seq4|g.808277 ORF comp234028_c1.1_seq4|g.808277 comp234028_c1.1_seq4|m.808277 type:complete len:725 (+) comp234028_c1.1_seq4:254-2428(+)</Hit_def>
What does a Hit_def line look like in your blast result xml?
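To look at the IDs without reading the whole file, the "first word of Hit_def" rule can be approximated with sed. This is only a sketch of that rule, not TBro's actual parser. The function name hit_ids is made up, and it assumes the XML has already been extracted from the webservice response into plain text with one <Hit_def> element per line.

```shell
# Print the first word of every <Hit_def> element found on stdin.
# The capture stops at the first space or at the closing tag.
hit_ids() {
    sed -n 's/.*<Hit_def>\([^ <]*\).*/\1/p'
}

# Example with a shortened version of the Hit_def line shown above.
printf '%s\n' '<Hit_def>cds.comp234028_c1.1_seq4|m.808277 comp234028_c1.1_seq4|g.808277 ORF</Hit_def>' | hit_ids
# prints: cds.comp234028_c1.1_seq4|m.808277
```

If the printed values are not the identifiers that were imported into TBro, the mapping to internal IDs has nothing to match against.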
I hope to sort out this last problem as well. A step by step guide for TBro on AWS would be really cool. If you don't mind I'd suggest including it as a separate section in the official documentation. Your contribution in improving and disseminating TBro is very much appreciated.
Sure - here is the xml file. It looks like 'No' is short for 'No definition line' - I'll try rebuilding things and see if I can get lucky and solve anything.
{
"job_status": "PROCESSED",
"additional_data": {
"organism": "16",
"release": "barnacle-T1"
},
"processed_results": [
{
"query": ">barnacle-ee100\nTTAGGAGCAAATGAAAAGAAGAAAGCTGGAAAAAGAGGCAGATCTGCAGCGAATAATTTTCTTTTAAACACAAAATCCCGATAAAACCACACGATGGACAGGTTTGGGCCGTTTACAAAGCAGAACATCTCCCGAGGAAAACACTCCGCAACCGAGCGAAGGTCTTGGCATGGGGAATGACGAGCGCTTGGAATTTGCAAAATTTGCACAATGTGTCTGAGAAACAGACGTCCGACACATTCTGCTACCATATGATGCGAAAAATTTACTCCTGGCACTTCCCATTTGTTCAGGAATGGGGTTGTTTTGAAAAGGAAAATGGTGCTTGGGAGGTCGGCGCCAGTCATTCATGAAGGAGTTAGCGCCAGAGAAGACCTACAAAATACTCACGGATGGTGTCGATGCAAGCTGTCGGCTTTCAGGGAGAAGCGGATGTTGCCGGGGAGCTCGTCAAACCTAAATCCGAC",
"status": "PROCESSED",
"result": "<?xml version=\"1.0\"?>\n<!DOCTYPE BlastOutput PUBLIC \"-\/\/NCBI\/\/NCBI BlastOutput\/EN\" \"http:\/\/www.ncbi.nlm.nih.gov\/dtd\/NCBI_BlastOutput.dtd\">\n
Hi! Thanks for your help on the last two issues :)
I'm now having trouble running phing to set up the blast databases.
From the documentation, I'm not sure if I should be running phing at the TBro command line or at the Ubuntu command line. At the TBro command line, I cannot run or install phing. At the Ubuntu command line, phing can be installed but does not run successfully.
Also, I don't know how to locate TBro directories when I am not at the TBro command line. Is it possible to enter TBro directories when I am at the Ubuntu command line?
I'm also unsure what you mean by "main TBro directory" in the documentation. Is this the default directory when I start up TBro? Or the directory I created to store my data in?
Details follow:
In TBro:
First I should move to my "main TBro directory" - I am guessing this is the directory I created to store all my data in when I set up TBro...?
cd /squid
I then follow TBro documentation instructions but the phing command is not found when run at the TBro command line:
root@9b953c8ae04e:/# phing queue-install-db
bash: phing: command not found
When I try to install phing I get the following error:
sudo apt-get update
sudo apt-get install phing
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package phing
so I'm not sure how to install phing at the TBro command line.
Outside TBro
I am able to install phing at the Ubuntu command line but I don't know how to locate my main TBro directory inside of Docker - is it possible to enter TBro directories from outside Docker/TBro? When I run the required phing command in different folders I get the following error:
ubuntu@ip-172-26-13-108:/$ phing queue-install-db
Buildfile: build.xml does not exist!
I'm not sure what this error refers to, but maybe it's related to being in the wrong directory?
So I'm not sure 1) what directory I should be running phing in, 2) if I should run phing at the Ubuntu or the TBro command line, and 3) how to install phing if I should run it in TBro.
Any suggestions would be greatly appreciated.
Thank-you