flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

Task 4: InterproScan was not found on this machine #30

Closed: megaptera-helvetiae closed this issue 4 years ago

megaptera-helvetiae commented 4 years ago

Hi Florent,

The optional task 4 did not work on our machine, because InterproScan is not installed.

Could you get me and other users started on how we could run InterProScan on our own?

Or should I just follow its own Wiki? https://github.com/ebi-pf-team/interproscan/wiki

I was just wondering whether there is anything in particular we should know about the relationship between Pantagruel and InterProScan, and whether you wanted to provide some more info about that on your GitHub page.

Thank you!

flass commented 4 years ago

Interproscan is easy to install, in the sense that it is just a file you download and then execute with Java (having the right Java version, JDK/JRE 11, installed can be an issue, though). The only real problem with Interproscan is the size of that file, ~10 GB compressed, which can be tricky to download, but nothing crazy, let alone impossible. To install it, you should indeed refer to InterProScan's own wiki.

About using Interproscan with the aim of integrating the results into the Pantagruel database, I would strongly recommend executing Pantagruel task 4, as it expects intermediary files created during that task and creates a specific output file that is later loaded into the SQLite database.

If for any reason (no Java JRE 11, the Interproscan executable file is too big, ...) you can't run Pantagruel task 4 on your usual server, remember that a Pantagruel database is self-contained in its main folder and thus can be copied, transferred somewhere else, and continued from there. The only requirement is to modify the ptgroot definition in the config file, as well as the corresponding value after the -r option in the recorded pantagruel ... init command at the top of that file (for later refresh commands). This way, you can execute different tasks on different servers depending on the convenience of their respective CPU/memory/storage capacities. I recommend doing that for tasks 6 and 7, for instance, which are best executed on a cluster.

I will try to document that flexibility on the main page (I should probably make it a wiki, really). I hope this helps.
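
The path update described above can be sketched as a small shell function. This is a minimal sketch under stated assumptions: the function name relocate_ptg_config is hypothetical (not part of Pantagruel), and it assumes the old root path appears verbatim in the copied config file, both in the ptgroot definition and after -r in the recorded init command. The exact layout of the config file is not shown here, so check the result by eye.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pantagruel): after copying a Pantagruel
# folder to a new location, replace every literal occurrence of the old root
# path with the new one in the copied config file.
# Assumption: the old path appears verbatim both in the ptgroot definition
# and after -r in the recorded "pantagruel ... init" command.
relocate_ptg_config() {
  local old_root="$1" new_root="$2" config="$3"
  local tmp line
  if [[ -z "$old_root" || ! -f "$config" ]]; then
    echo "usage: relocate_ptg_config OLD_ROOT NEW_ROOT CONFIG_FILE" >&2
    return 1
  fi
  cp "$config" "${config}.bak"   # keep a backup of the unmodified file
  tmp=$(mktemp) || return 1
  while IFS= read -r line || [[ -n "$line" ]]; do
    # quoted pattern and replacement: the paths are matched and substituted literally
    line=${line//"$old_root"/"$new_root"}
    printf '%s\n' "$line"
  done < "$config" > "$tmp"
  cat "$tmp" > "$config"         # overwrite in place, keeping the file's permissions
  rm -f "$tmp"
}
```

For example, after copying the folder with cp -a or rsync -a, one would call relocate_ptg_config /old/root /new/root path/to/config_file, where all three arguments are placeholders. Because the substitution is literal, subpaths such as /old/root/04.functional are rewritten too.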

megaptera-helvetiae commented 4 years ago

Hi Florent,

I successfully installed interproscan and I can run it with a test file.

Where in task 4 do I tell pantagruel where it can find interproscan?

At the moment it is just a directory with the database and the shell script inside my pantagruel working directory.

Pantagruel does not see it:

# will run tasks: 4
[2020-01-08 04:12:00] Pantagruel pipeline task 4: use InterProScan to functionally annotate proteins in the database.
Task folder '/scratch/clamchatka/Panta/test9/04.functional' already exists; FORCE mode is on: ERASE and recreate the folder to write new result in its place
InterproScan was not found on this machine: cannot run this (facultative) task of Pantagruel pipeline; exit now
ERROR: Pantagruel pipeline task 4: failed.

flass commented 4 years ago

Hi Laetitia,

The interproscan command has to be available in your $PATH. It should point to the interproscan.sh script provided in the InterProScan archive. One way to do this is as in the Pantagruel install_dependencies.sh script:

# here you've got to define where your InterProScan is installed
SOFTWARE=/home/you/yoursoftware
currIPversion=5.39-77.0 # to update as the EBI releases new versions
BINS=/home/you/bin # any place really
mkdir -p "${BINS}" # make sure the link directory exists
ln -s "${SOFTWARE}/interproscan-${currIPversion}/interproscan.sh" "${BINS}/interproscan"
# make this location available in your PATH
export PATH="${BINS}:${PATH}"
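
To confirm that the link is picked up before re-running task 4, a quick check along these lines may help. It is a hypothetical helper, not Pantagruel's own test; the only assumption carried over from the reply above is that Pantagruel looks for a command named interproscan in the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pantagruel): report whether an
# `interproscan` command is visible in the current PATH and what it resolves to.
check_interproscan() {
  local ips
  if ips=$(command -v interproscan); then
    echo "interproscan found: ${ips}"
    # a link created with ln -s should point at .../interproscan.sh
    if [[ -L "$ips" ]]; then
      echo "  -> links to: $(readlink "$ips")"
    fi
  else
    echo "interproscan not found in PATH; check the symlink and the PATH export" >&2
    return 1
  fi
}
```

Running check_interproscan in the same shell session (or job script) that will launch task 4 matters, since a PATH exported in one terminal is not visible in another.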

megaptera-helvetiae commented 4 years ago

Where would I add these lines to add interproscan to my path?

Into pantagruel_pipeline_04_functional_annotation.sh?

I have not used the install_dependencies.sh script much. I installed by hand everything that the admins had not already installed system-wide.

Thanks.

flass commented 4 years ago

OK. The link creation command ln -s ... only needs to be run once. For a permanent effect, the PATH-editing command export PATH=... can be added to your ~/.bashrc (or ~/.profile or ~/.bash_profile, depending on which one is appropriate to edit on your server).
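
Adding that export line by hand works fine; the sketch below only illustrates one way to do it so that re-running it does not pile up duplicate lines. The function name persist_path_dir is hypothetical, and ~/.bashrc as the default target is an assumption; pass ~/.profile or ~/.bash_profile instead if that is the right file on your server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an `export PATH=...` line for BIN_DIR to a shell
# startup file, unless that exact line is already present (safe to re-run).
persist_path_dir() {
  local bin_dir="$1" rcfile="${2:-$HOME/.bashrc}"
  # literal ${PATH} is written to the file so it expands at login, not now
  local line="export PATH=\"${bin_dir}:\${PATH}\""
  touch "$rcfile" || return 1
  if ! grep -qxF -- "$line" "$rcfile"; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}
```

After persist_path_dir /home/you/bin, the change applies to new login shells; run source ~/.bashrc (or the file you chose) to apply it to the current one.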